Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
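The thread grouping and load balancing mentioned in this abstract can be illustrated with a small sketch. It is not the NanosCompiler's mechanism or the authors' code: the function name `thread_groups`, the per-zone work estimates, and the largest-remainder rule are all assumptions chosen only to show how a fixed pool of threads might be divided into one group per zone in rough proportion to zone size.

```python
def thread_groups(zone_work, total_threads):
    """Split total_threads into one group per zone, roughly proportional to work.

    Hypothetical helper: zone_work holds an assumed cost estimate per zone.
    """
    zones = len(zone_work)
    total_work = sum(zone_work)
    if total_threads < zones:
        raise ValueError("need at least one thread per zone")
    if total_work <= 0:
        raise ValueError("zone work estimates must sum to a positive value")

    # Every zone starts with one thread so that no group is empty.
    spare = total_threads - zones
    # Ideal (fractional) share of the spare threads for each zone.
    shares = [w / total_work * spare for w in zone_work]
    floors = [int(s) for s in shares]
    sizes = [1 + f for f in floors]

    # Hand out the remaining threads by largest fractional remainder.
    leftover = spare - sum(floors)
    order = sorted(range(zones), key=lambda i: shares[i] - floors[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Four zones of unequal size sharing eight threads (made-up numbers).
    print(thread_groups([4, 2, 1, 1], 8))
```

In a nested OpenMP setting, the outer level would iterate over zones and each inner parallel region would run with the group size computed for its zone; that mapping is described here only as an assumption about how such sizes could be used.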
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
2011-01-01
either the CTA group (n = 12) or the control group (n = 14). The CTA group learned the open cricothyrotomy procedure using the CTA curriculum. The...completed a 6-item pretest that posed open-ended questions regarding actions and decisions required to conduct the procedure given a specific... posttest assessing their knowledge of the procedure. Parallel forms of the pretest and posttest instruments were developed using different case scenar
Kokki, H; Salonvaara, M; Herrgård, E; Onen, P
1999-01-01
Many reports have shown a low incidence of postdural puncture headache (PDPH) and other complaints in young children. The objective of this open-randomized, prospective, parallel group study was to compare the use of a cutting point spinal needle (22-G Quincke) with a pencil point spinal needle (22-G Whitacre) in children. We studied the puncture characteristics, success rate and incidence of postpuncture complaints in 57 children, aged 8 months to 15 years, following 98 lumbar punctures (LP). The patient/parents completed a diary at 3 and 7 days after LP. The response rate was 97%. The incidence of PDPH was similar, 15% in the Quincke group and 9% in the Whitacre group (P=0.42). The risk of developing a PDPH was not dependent on the age (r < 0.00, P=0.67). Eight of the 11 PDPHs developed in children younger than 10 years, the youngest being 23 months old.
ERIC Educational Resources Information Center
Hesselmark, Eva; Plenty, Stephanie; Bejerot, Susanne
2014-01-01
Although adults with autism spectrum disorder are an increasingly identified patient population, few treatment options are available. This "preliminary" randomized controlled open trial with a parallel design developed two group interventions for adults with autism spectrum disorders and intelligence within the normal range: cognitive…
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. 
Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, though with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.
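For readers unfamiliar with the Griewank benchmark function named in this abstract, the following is a minimal sketch of its standard definition, f(x) = 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i))). It is not the authors' SCE-UA code; the helper `evaluate_population` and the sample points are hypothetical and only illustrate that each candidate is scored independently, which is the property a parallel implementation can exploit.

```python
import math


def griewank(x):
    """Standard Griewank function; its global minimum is 0 at the origin."""
    sum_term = sum(xi * xi for xi in x) / 4000.0
    # The cosine argument is scaled by the square root of the 1-based index.
    prod_term = math.prod(
        math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1)
    )
    return 1.0 + sum_term - prod_term


def evaluate_population(population):
    """Score each candidate point; the evaluations do not depend on each other."""
    return [griewank(point) for point in population]


if __name__ == "__main__":
    # Made-up sample points, not data from the paper.
    print(evaluate_population([[0.0, 0.0], [1.0, -2.0], [10.0, 10.0]]))
```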
NASA Astrophysics Data System (ADS)
Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav
2017-10-01
In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
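The speed-up and parallelization efficiency discussed in this abstract are conventionally defined as S = T_serial / T_parallel and E = S / p for p threads. The sketch below applies those textbook definitions; the function name and the timings in the example are hypothetical and are not measurements from the study.

```python
def speedup_and_efficiency(serial_time, parallel_time, threads):
    """Return (speedup, efficiency) for a run on the given number of threads."""
    if parallel_time <= 0 or threads <= 0:
        raise ValueError("parallel_time and threads must be positive")
    speedup = serial_time / parallel_time
    # Efficiency is the fraction of ideal linear speedup actually achieved.
    efficiency = speedup / threads
    return speedup, efficiency


if __name__ == "__main__":
    # Made-up timings in seconds: 120 s serial, 40 s on 4 threads.
    s, e = speedup_and_efficiency(120.0, 40.0, 4)
    print(f"speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency well below 1 at higher thread counts usually points to serial sections, synchronization, or memory bandwidth limits, which is why both numbers are typically reported together.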
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores
NASA Astrophysics Data System (ADS)
Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei
We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Effective Vectorization with OpenMP 4.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
Reconfigurable Model Execution in the OpenMDAO Framework
NASA Technical Reports Server (NTRS)
Hwang, John T.
2017-01-01
NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables, i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems.
Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Kordasiewicz, Bartłomiej; Kicinski, Maciej; Małachowski, Konrad; Wieczorek, Janusz; Chaberek, Sławomir; Pomianowski, Stanisław
2018-05-01
The aim of this study was to evaluate and to compare the radiological parameters after arthroscopic and open Latarjet technique via evaluation of computed tomography (CT) scans. Our hypothesis was that the radiological results after arthroscopic stabilisation remained in the proximity of those results achieved after open stabilisation. CT scan evaluation results of patients after primary Latarjet procedure were analysed. Patients operated on between 2006 and 2011 using an open technique composed the OPEN group and patients operated on arthroscopically between 2011 and 2013 composed the ARTHRO group. Forty-three out of 55 shoulders (78.2%) in OPEN and 62 out of 64 shoulders (95.3%) in ARTHRO were available for CT scan evaluation. The average age at surgery was 28 years in OPEN and 26 years in ARTHRO. The mean follow-up was 54.2 months in OPEN and 23.4 months in ARTHRO. CT scan evaluation was used to assess graft fusion and osteolysis. Bone block position and screw orientation were assessed in the axial and the sagittal views. The subscapularis muscle fatty infiltration was evaluated according to Goutallier classification. The non-union rate was significantly higher in OPEN than in ARTHRO: 5 (11.9%) versus 1 (1.7%) (p < 0.05). The total graft osteolysis was significantly higher in the OPEN group: five cases (11.9%) versus zero in ARTHRO (p < 0.05). Graft fracture incidence was comparable in both groups: in two patients in ARTHRO (3.3%) and one case (2.4%) in the OPEN group (p > 0.05). These results should be evaluated very carefully due to significant difference in the follow-up of both groups. A significantly higher rate of partial graft osteolysis at the level of the superior screw was reported in ARTHRO with 32 patients (53.3%) versus 10 (23.8%) in OPEN (p < 0.05). In the axial view, 78.4% of patients in ARTHRO and 80.5% in OPEN had the coracoid bone block in an acceptable position (between 4 mm medially and 2 mm laterally). 
In the sagittal plane, the bone block was in an acceptable position between 2 and 5 o'clock in 86.7% of patients in ARTHRO and 90.2% in OPEN (p > 0.05). However, in the position between 3 and 5 o'clock there were 56.7% of the grafts in ARTHRO versus 87.8% in OPEN (p < 0.05). The screws were more parallel to the glenoid surface in ARTHRO: the angles were 12.3° for the inferior screw and 12.6° for the superior one. These angles in the OPEN group were respectively 15° and 17° (p < 0.05 and for the superior screw). There was no significant difference in the presence of fatty infiltration of the subscapularis muscle. Arthroscopic Latarjet stabilisation showed satisfactory radiographic results, comparable to the open procedure; however, the short-term follow-up can bias this evaluation. Graft healing rate was very high in the arthroscopic technique, but osteolysis of the superior part of the graft and more superior graft position in the sagittal view were significantly different when compared to the open technique. The screw position was slightly more parallel to the glenoid via the arthroscopic technique. We recommend both further investigation and development of the arthroscopic technique. III.
Reference datasets for bioequivalence trials in a two-group parallel design.
Fuglsang, Anders; Schütz, Helmut; Labes, Detlew
2015-03-01
In order to help companies qualify and validate the software used to evaluate bioequivalence trials with two parallel treatment groups, this work aims to define datasets with known results. This paper puts a total of 11 datasets into the public domain along with proposed consensus obtained via evaluations from six different software packages (R, SAS, WinNonlin, OpenOffice Calc, Kinetica, EquivTest). Insofar as possible, datasets were evaluated with and without the assumption of equal variances for the construction of a 90% confidence interval. Not all software packages provide functionality for the assumption of unequal variances (EquivTest, Kinetica), and not all packages can handle datasets with more than 1000 subjects per group (WinNonlin). Where results could be obtained across all packages, one showed questionable results when datasets contained unequal group sizes (Kinetica). A proposal is made for the results that should be used as validation targets.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to maximize performance, we then describe selected relevant details of the code, and finally measure the level of performance and scaling performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performance for this application, but also compares with results measured on other processors.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
ERIC Educational Resources Information Center
Cream, Angela; O'Brian, Sue; Jones, Mark; Block, Susan; Harrison, Elisabeth; Lincoln, Michelle; Hewat, Sally; Packman, Ann; Menzies, Ross; Onslow, Mark
2010-01-01
Purpose: In this study, the authors investigated the efficacy of video self-modeling (VSM) following speech restructuring treatment to improve the maintenance of treatment effects. Method: The design was an open-plan, parallel-group, randomized controlled trial. Participants were 89 adults and adolescents who undertook intensive speech…
Nipanikar, Sanjay U; Gajare, Kamalakar V; Vaidya, Vidyadhar G; Kamthe, Amol B; Upasani, Sachin A; Kumbhar, Vidyadhar S
2017-01-01
The main objective of the present study was to assess efficacy and safety of AHPL/AYTOP/0113 cream, a polyherbal formulation in comparison with Framycetin sulphate cream in acute wounds. It was an open label, randomized, comparative, parallel group and multi-center clinical study. A total of 47 subjects were randomly assigned to Group-A (AHPL/AYTOP/0113 cream) and 42 subjects were randomly assigned to Group-B (Framycetin sulphate cream). All the subjects were advised to apply study drug, thrice daily for 21 days or up to complete wound healing (whichever was earlier). All the subjects were called for follow up on days 2, 4, 7, 10, 14, 17 and 21 or up to the day of complete wound healing. Data describing quantitative measures are expressed as mean ± SD. Comparison of variables representing categorical data was performed using Chi-square test. Group-A subjects took significantly less time (P < 0.05) for wound healing, i.e., a mean of 7.77 days versus a mean of 9.87 days for Group-B subjects. At the end of the study, statistically significant better (P < 0.05) results were observed in Group-A than Group-B in mean wound surface area, wound healing parameters and pain associated with wound. Excellent overall efficacy and tolerability was observed in subjects of both the groups. No adverse event or adverse drug reaction was noted in any subject of both the groups. AHPL/AYTOP/0113 cream proved to be superior to Framycetin sulphate cream in healing of acute wounds.
Accuracy of impressions with different impression materials in angulated implants.
Reddy, S; Prasad, K; Vakil, H; Jain, A; Chowdhary, R
2013-01-01
To evaluate the dimensional accuracy of the resultant (duplicative) casts made from two different impression materials (polyvinyl siloxane and polyether) in parallel and angulated implants. Three definitive master casts (control groups) were fabricated in dental stone with three implants, placed equidistantly. In the first group (control), all three implants were placed parallel to each other and perpendicular to the plane of the cast. In the second and third groups (control), all three implants were placed at 10° and 15° angulation respectively to the long axis of the cast, tilting towards the centre. Impressions were made with polyvinyl siloxane and polyether impression materials in a special tray, using an open tray impression technique from the master casts. These impressions were poured to obtain test casts. Three reference distances were evaluated on each test cast by using a profile projector and compared with control groups to determine the effect of combined interaction of implant angulation and impression materials on the accuracy of implant resultant cast. Statistical analysis revealed no significant difference in dimensional accuracy of the resultant casts made from two different impression materials (polyvinyl siloxane and polyether) by closed tray impression technique in parallel and angulated implants. On the basis of the results of this study, the use of both the impression materials i.e., polyether and polyvinyl siloxane impression is recommended for impression making in parallel as well as angulated implants.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...
2013-01-01
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
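The definition of work time inflation given in this abstract can be written as a one-line calculation: the time all threads spend doing work, minus the time the same work takes sequentially. The sketch below follows that definition only; the function names and the per-thread timings are hypothetical, and it does not reproduce the authors' measurement method, which would also need to separate idle time and scheduling overhead.

```python
def work_time_inflation(per_thread_work, sequential_work):
    """Extra work time across all threads relative to the sequential run."""
    return sum(per_thread_work) - sequential_work


def inflation_factor(per_thread_work, sequential_work):
    """Ratio of total multithreaded work time to sequential work time."""
    if sequential_work <= 0:
        raise ValueError("sequential_work must be positive")
    return sum(per_thread_work) / sequential_work


if __name__ == "__main__":
    # Made-up timings in seconds: four threads, 10 s of sequential work.
    busy = [3.0, 3.0, 3.5, 2.5]
    print(work_time_inflation(busy, 10.0), inflation_factor(busy, 10.0))
```

A factor above 1 means the threads collectively spent longer on the work itself than a single thread would, for example because of slower remote memory accesses on a NUMA system.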
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of HPF (High Performance Fortran, based on data parallel model) and OpenMP (based on shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to language like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and pros and cons of different approaches will be discussed along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, potentials of applying some of the techniques to realistic aerospace applications will be presented.
Mu, Juwei; Gao, Shugeng; Mao, Yousheng; Xue, Qi; Yuan, Zuyang; Li, Ning; Su, Kai; Yang, Kun; Lv, Fang; Qiu, Bin; Liu, Deruo; Chen, Keneng; Li, Hui; Yan, Tiansheng; Han, Yongtao; Du, Ming; Xu, Rongyu; Wen, Zhaoke; Wang, Wenxiang; Shi, Mingxin; Xu, Quan; Xu, Shun; He, Jie
2015-11-17
Oesophageal cancer is the eighth most common cause of cancer worldwide. In 2009 in China, the incidence and death rate of oesophageal cancer was 22.14 per 100 000 person-years and 16.77 per 100 000 person-years, respectively, the highest in the world. Minimally invasive oesophagectomy (MIO) was introduced into clinical practice with the aim of reducing the morbidity rate. The mechanisms of MIO may lie in minimising the reaction to surgical injury and inflammation. There are some randomised trials regarding minimally invasive versus open oesophagectomy, with 100-850 subjects enrolled. To date, no large randomised controlled trial comparing minimally invasive versus open oesophagectomy has been reported in China, where squamous cell carcinoma predominated over adenocarcinoma of the oesophagus. This is a 3 year multicentre, prospective, randomised, open and parallel controlled trial, which aims to compare the effectiveness of minimally invasive thoraco-laparoscopic oesophagectomy to open three-stage transthoracic oesophagectomy for resectable oesophageal cancer. Group A patients receive MIO which involves thoracoscopic oesophagectomy and laparoscopic gastric mobilisation with cervical anastomosis. Group B patients receive the open three-stage transthoracic oesophagectomy which involves a right thoracotomy and laparotomy with cervical anastomosis. Primary endpoints include respiratory complications within 30 days after operation. The secondary endpoints include other postoperative complications, influences on pulmonary function, intraoperative data including blood loss, operative time, the number and location of lymph nodes dissected, and mortality in hospital, the length of hospital stay, total expenses in hospital, mortality within 30 days, survival rate after 2 years, postoperative pain, and health-related quality of life (HRQoL). Three hundred and twenty-four patients in each group will be needed and a total of 648 patients will finally be enrolled into the study. 
The study protocol has been approved by the Institutional Ethics Committees of all participating institutions. The findings of this trial will be disseminated to patients and through peer-reviewed publications and international presentations. NCT02355249.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.
Stone, John E; Gohara, David; Shi, Guochun
2010-05-01
We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Multilevel Parallelization of AutoDock 4.2.
Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P
2011-04-28
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
The Research of the Parallel Computing Development from the Angle of Cloud Computing
NASA Astrophysics Data System (ADS)
Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun
2017-10-01
Cloud computing is a development of parallel computing, distributed computing and grid computing. The development of cloud computing brings parallel computing into people's lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it compares the MPI and OpenMP models with Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
Stone, John E.; Gohara, David; Shi, Guochun
2010-01-01
We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Tay, Lee; Leon, Francisco; Vratsanos, George; Raymond, Ralph; Corbo, Michael
2007-01-01
The effect of abatacept, a selective T-cell co-stimulation modulator, on vaccination has not been previously investigated. In this open-label, single-dose, randomized, parallel-group, controlled study, the effect of a single 750 mg infusion of abatacept on the antibody response to the intramuscular tetanus toxoid vaccine (primarily a memory response to a T-cell-dependent peptide antigen) and the intramuscular 23-valent pneumococcal vaccine (a less T-cell-dependent response to a polysaccharide antigen) was measured in 80 normal healthy volunteers. Subjects were uniformly randomized to receive one of four treatments: Group A (control group), subjects received vaccines on day 1 only; Group B, subjects received vaccines 2 weeks before abatacept; Group C, subjects received vaccines 2 weeks after abatacept; and Group D, subjects received vaccines 8 weeks after abatacept. Anti-tetanus and anti-pneumococcal (Danish serotypes 2, 6B, 8, 9V, 14, 19F and 23F) antibody titers were measured 14 and 28 days after vaccination. While there were no statistically significant differences between the dosing groups, geometric mean titers following tetanus or pneumococcal vaccination were generally lower in subjects who were vaccinated 2 weeks after receiving abatacept, compared with control subjects. A positive response (defined as a twofold increase in antibody titer from baseline) to tetanus vaccination at 28 days was seen, however, in ≥ 60% of subjects across all treatment groups versus 75% of control subjects. Similarly, over 70% of abatacept-treated subjects versus all control subjects (100%) responded to at least three pneumococcal serotypes, and approximately 25–30% of abatacept-treated subjects versus 45% of control subjects responded to at least six serotypes. PMID:17425783
Parallel Execution of Functional Mock-up Units in Buildings Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan
2016-06-30
A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
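The pattern this abstract describes, an Euler solver that advances several sub-systems in parallel, can be sketched as follows. This is only a structural illustration under stated assumptions: the report uses C++ with OpenMP and real FMUs, whereas the `Tank` class, its drain model, and the `simulate` helper below are hypothetical stand-ins with no coupling between tanks. Python threads do not speed up CPU-bound work, so the sketch shows the lockstep structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


class Tank:
    """Hypothetical stand-in for one FMU: a tank draining in proportion to its level."""

    def __init__(self, level, drain_rate):
        self.level = level
        self.drain_rate = drain_rate

    def derivative(self):
        return -self.drain_rate * self.level

    def euler_step(self, dt):
        # Explicit Euler update: x(t + dt) = x(t) + dt * dx/dt
        self.level += dt * self.derivative()
        return self.level


def simulate(tanks, dt, steps, workers=4):
    """Advance all tanks in lockstep, stepping each tank as a separate task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # list() waits for every tank to finish this step before the next begins.
            list(pool.map(lambda tank: tank.euler_step(dt), tanks))
    return [tank.level for tank in tanks]


if __name__ == "__main__":
    print(simulate([Tank(1.0, 1.0), Tank(2.0, 0.5)], dt=0.1, steps=10))
```

Because every step ends with a barrier, the slowest sub-system sets the pace of each step, which is one way to see why the abstract singles out load balancing.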
The openGL visualization of the 2D parallel FDTD algorithm
NASA Astrophysics Data System (ADS)
Walendziuk, Wojciech
2005-02-01
This paper presents a way of visualization of a two-dimensional version of a parallel algorithm of the FDTD method. The visualization module was created on the basis of the OpenGL graphic standard with the use of the GLUT interface. In addition, the work includes the results of the efficiency of the parallel algorithm in the form of speedup charts.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland
2003-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Open ear hearing aids in tinnitus therapy: An efficacy comparison with sound generators.
Parazzini, Marta; Del Bo, Luca; Jastreboff, Margaret; Tognola, Gabriella; Ravazzani, Paolo
2011-08-01
This study aimed to compare the effectiveness of tinnitus retraining therapy (TRT) with sound generators or with open ear hearing aids in the rehabilitation of tinnitus for a group of subjects who, according to Jastreboff categories, can be treated with both approaches to sound therapy (borderline of Category 1 and 2). This study was a prospective data collection with a parallel-group design in which each subject was randomly assigned to one of the two treatment groups: half of the subjects were fitted binaurally with sound generators, and the other half with open ear hearing aids. Both groups received the same educational counselling sessions. Ninety-one subjects passed the screening criteria and were enrolled into the study. Structured interviews, with a variety of measures evaluated through the use of visual-analog scales and the tinnitus handicap inventory self-administered questionnaire, were performed before the therapy and at 3, 6, and 12 months during the therapy. Data showed a highly significant improvement in both tinnitus treatments starting from the first three months and up to one year of therapy, with a progressive and statistically significant decrease in the disability every three months. TRT was equally effective with sound generator or open ear hearing aids: they gave basically identical, statistically indistinguishable results.
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
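Domain decomposition as defined in this abstract (split the image into subimages, process each independently, reassemble) can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation: it uses a pointwise threshold in place of Canny edge detection, so no overlap between bands is needed, and the helper names are hypothetical. A real Canny stage applies neighbourhood filters and would need extra border rows shared between adjacent bands.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold_rows(rows, cutoff):
    """Process one subimage: mark pixels at or above the cutoff."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in rows]


def split_rows(image, parts):
    """Break the image into at most `parts` contiguous bands of rows."""
    if not image:
        return []
    band = -(-len(image) // parts)  # ceiling division
    return [image[i:i + band] for i in range(0, len(image), band)]


def process_decomposed(image, cutoff, parts=2):
    """Process each band independently, then stitch the bands back together."""
    bands = split_rows(image, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(lambda band: threshold_rows(band, cutoff), bands))
    return [row for band in results for row in band]


if __name__ == "__main__":
    image = [[10, 200, 30], [250, 5, 128], [0, 127, 255]]
    print(process_decomposed(image, cutoff=128, parts=2))
```

The decomposed result matches processing the whole image in one call, which is the property that makes the bands safe to run in parallel.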
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Chang, Kuo-Tsai
2007-01-01
This paper investigates electrical transient characteristics of a Rosen-type piezoelectric transformer (PT), including maximum voltages, time constants, energy losses and average powers, and their improvements immediately after turning OFF. A parallel resistor connected to both input terminals of the PT is needed to improve the transient characteristics. An equivalent circuit for the PT is first given. Then, an open-circuit voltage, involving a direct current (DC) component and an alternating current (AC) component, and its related energy losses are derived from the equivalent circuit with initial conditions. Moreover, an AC power control system, including a DC-to-AC resonant inverter, a control switch and electronic instruments, is constructed to determine the electrical characteristics of the OFF transient state. Furthermore, the effects of the parallel resistor on the transient characteristics at different parallel resistances are measured. The advantages of adding the parallel resistor also are discussed. From the measured results, the DC time constant is greatly decreased from 9 to 0.04 ms by a 10 kΩ parallel resistance under open output.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.
Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B
2011-08-15
Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453.
Parallel NGO networks for HIV control: risks and opportunities for NGO contracting.
Zaidi, Shehla; Gul, Xaher; Nishtar, Noureen Aleem
2012-12-27
Policy measures for preventive and promotive services are increasingly reliant on contracting of NGOs. Contracting is a neo-liberal response relying on open market competition for service delivery tenders. In contracting of health services a common assumption is a monolithic NGO market. A case study of HIV control in Pakistan shows that in reality the NGO market comprises parallel NGO networks having widely different service packages, approaches and agendas. These parallel networks had evolved over time due to vertical policy agendas. Contracting of NGOs for provision of HIV services was faced with uneven capacities and turf rivalries across both NGO networks. At the same time contracting helped NGO providers belonging to different clusters to move towards standardized service delivery for HIV prevention. Market based measures such as contracting need to be accompanied by wider policy measures that help bring NGO groups to a shared understanding of health issues and responses.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Research of influence of open-winding faults on properties of brushless permanent magnets motor
NASA Astrophysics Data System (ADS)
Bogusz, Piotr; Korkosz, Mariusz; Powrózek, Adam; Prokop, Jan; Wygonik, Piotr
2017-12-01
The paper presents an analysis of influence of selected fault states on properties of brushless DC motor with permanent magnets. The subject of study was a BLDC motor designed by the authors for unmanned aerial vehicle hybrid drive. Four parallel branches per each phase were provided in the discussed 3-phase motor. After open-winding fault in single or few parallel branches, a further operation of the motor can be continued. Waveforms of currents, voltages and electromagnetic torque were determined in discussed fault states based on the developed mathematical and simulation models. Laboratory test results concerning an influence of open-windings faults in parallel branches on properties of BLDC motor were presented.
RANS Simulations using OpenFOAM Software
2016-01-01
Averaged Navier- Stokes (RANS) simulations is described and illustrated by applying the simpleFoam solver to two case studies; two dimensional flow...to run in parallel over large processor arrays. The purpose of this report is to illustrate and test the use of the steady-state Reynolds Averaged ...Group in the Maritime Platforms Division he has been simulating fluid flow around ships and submarines using finite element codes, Lagrangian vortex
Biomechanical Comparison of Parallel and Crossed Suture Repair for Longitudinal Meniscus Tears.
Milchteim, Charles; Branch, Eric A; Maughon, Ty; Hughey, Jay; Anz, Adam W
2016-04-01
Longitudinal meniscus tears are commonly encountered in clinical practice. Meniscus repair devices have been previously tested and presented; however, prior studies have not evaluated repair construct designs head to head. This study compared a new-generation meniscus repair device, SpeedCinch, with a similar established device, Fast-Fix 360, and a parallel repair construct to a crossed construct. Both devices utilize self-adjusting No. 2-0 ultra-high molecular weight polyethylene (UHMWPE) and 2 polyether ether ketone (PEEK) anchors. Crossed suture repair constructs have higher failure loads and stiffness compared with simple parallel constructs. The newer repair device would exhibit similar performance to an established device. Controlled laboratory study. Sutures were placed in an open fashion into the body and posterior horn regions of the medial and lateral menisci in 16 cadaveric knees. Evaluation of 2 repair devices and 2 repair constructs created 4 groups: 2 parallel vertical sutures created with the Fast-Fix 360 (2PFF), 2 crossed vertical sutures created with the Fast-Fix 360 (2XFF), 2 parallel vertical sutures created with the SpeedCinch (2PSC), and 2 crossed vertical sutures created with the SpeedCinch (2XSC). After open placement of the repair construct, each meniscus was explanted and tested to failure on a uniaxial material testing machine. All data were checked for normality of distribution, and 1-way analysis of variance by ranks was chosen to evaluate for statistical significance of maximum failure load and stiffness between groups. Statistical significance was defined as P < .05. The mean maximum failure loads ± 95% CI (range) were 89.6 ± 16.3 N (125.7-47.8 N) (2PFF), 72.1 ± 11.7 N (103.4-47.6 N) (2XFF), 71.9 ± 15.5 N (109.4-41.3 N) (2PSC), and 79.5 ± 25.4 N (119.1-30.9 N) (2XSC). Interconstruct comparison revealed no statistical difference between all 4 constructs regarding maximum failure loads (P = .49). 
Stiffness values were also similar, with no statistical difference on comparison (P = .28). Both devices in the current study had similar failure load and stiffness when 2 vertical or 2 crossed sutures were tested in cadaveric human menisci. Simple parallel vertical sutures perform similarly to crossed suture patterns at the time of implantation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application, and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects cache miss rates of over 90%. With this loop rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most of the existing groundwater model codes for many applications.
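The reported figures for the reactive transport case (a loop covering over 97% of run time, a speedup of about 10 on 16 cores) can be sanity-checked against Amdahl's law. The abstract does not mention Amdahl's law; it is brought in here only as a rough upper-bound estimate, and the function name is illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only `parallel_fraction` of the run time scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 97% parallelizable work on a 16-core node.
    print(round(amdahl_speedup(0.97, 16), 2))  # prints 11.03
```

With exactly 97% of the time parallelized, the bound on 16 cores is about 11, so the observed speedup of about 10 sits close to what this simple model allows.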
Implementation of highly parallel and large scale GW calculations within the OpenAtom software
NASA Astrophysics Data System (ADS)
Ismail-Beigi, Sohrab
The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.
NASA Astrophysics Data System (ADS)
Sun, Rui; Xiao, Heng
2016-04-01
With the growth of available computational resource, CFD-DEM (computational fluid dynamics-discrete element method) becomes an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(10^7) particles.
Lähteenmäki, Pekka; Haukkamaa, Maija; Puolakka, Jukka; Riikonen, Ulla; Sainio, Susanna; Suvisaari, Janne; Nilsson, Carl Gustaf
1998-01-01
Objectives: To assess whether the levonorgestrel intrauterine system could provide a conservative alternative to hysterectomy in the treatment of excessive uterine bleeding. Design: Open randomised multicentre study with two parallel groups: a levonorgestrel intrauterine system group and a control group. Setting: Gynaecology departments of three hospitals in Finland. Subjects: Fifty six women aged 33-49 years scheduled to undergo hysterectomy for treatment of excessive uterine bleeding. Interventions: Women were randomised either to continue with their current medical treatment or to have a levonorgestrel intrauterine system inserted. Main outcome measure: Proportion of women cancelling their decision to undergo hysterectomy. Results: At 6 months, 64.3% (95% confidence interval 44.1 to 81.4%) of the women in the levonorgestrel intrauterine system group and 14.3% (4.0 to 32.7%) in the control group had cancelled their decision to undergo hysterectomy (P<0.001). Conclusions: The use of the levonorgestrel intrauterine system is a good conservative alternative to hysterectomy in the treatment of menorrhagia and should be considered before hysterectomy or other invasive treatments. PMID:9552948
Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP
NASA Technical Reports Server (NTRS)
Norton, C.; Balsara, D.
1999-01-01
Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programming can be performed in an object-oriented way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
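The update-then-exchange pattern described in this abstract can be sketched in a few lines. The following is a minimal illustration in Python using a thread pool, not the benchmark code itself: the 1-D zones, the three-point averaging stencil, and the names `update_zone`, `exchange_boundaries` and `run_multizone` are all hypothetical stand-ins chosen for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def update_zone(zone):
    """Advance one zone by one step (hypothetical stencil).

    The first and last entries are ghost cells holding boundary values;
    only interior cells are updated, each from its old neighbours.
    """
    new = zone[:]
    for i in range(1, len(zone) - 1):
        new[i] = (zone[i - 1] + zone[i] + zone[i + 1]) / 3.0
    return new

def exchange_boundaries(zones):
    """Copy interior edge values into the ghost cells of adjacent zones."""
    for left, right in zip(zones, zones[1:]):
        left[-1] = right[1]   # right neighbour's first interior cell
        right[0] = left[-2]   # left neighbour's last interior cell

def run_multizone(zones, steps, workers=1):
    """Update zones independently each step, then exchange boundaries."""
    zones = [z[:] for z in zones]          # leave the caller's data untouched
    exchange_boundaries(zones)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Coarse-grain parallelism: one independent task per zone.
            zones = list(pool.map(update_zone, zones))
            # Synchronisation point: boundary values move between zones.
            exchange_boundaries(zones)
    return zones
```

Because each zone reads only its own cells during the update, the result does not depend on the number of workers; a second, finer level of parallelism would sit inside `update_zone`.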
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Parallel processing implementation for the coupled transport of photons and electrons using OpenMP
NASA Astrophysics Data System (ADS)
Doerner, Edgardo
2016-05-01
In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport for photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the developing tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out in a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability and parallelization efficiency.
Demi, Libertario; Viti, Jacopo; Kusters, Lieneke; Guidi, Francesco; Tortoli, Piero; Mischi, Massimo
2013-11-01
The speed of sound in the human body limits the achievable data acquisition rate of pulsed ultrasound scanners. To overcome this limitation, parallel beamforming techniques are used in ultrasound 2-D and 3-D imaging systems. Different parallel beamforming approaches have been proposed. They may be grouped into two major categories: parallel beamforming in reception and parallel beamforming in transmission. The first category is not optimal for harmonic imaging; the second category may be more easily applied to harmonic imaging. However, inter-beam interference represents an issue. To overcome these shortcomings and exploit the benefit of combining harmonic imaging and high data acquisition rate, a new approach has been recently presented which relies on orthogonal frequency division multiplexing (OFDM) to perform parallel beamforming in transmission. In this paper, parallel transmit beamforming using OFDM is implemented for the first time on an ultrasound scanner. An advanced open platform for ultrasound research is used to investigate the axial resolution and interbeam interference achievable with parallel transmit beamforming using OFDM. Both fundamental and second-harmonic imaging modalities have been considered. Results show that, for fundamental imaging, axial resolution in the order of 2 mm can be achieved in combination with interbeam interference in the order of -30 dB. For second-harmonic imaging, axial resolution in the order of 1 mm can be achieved in combination with interbeam interference in the order of -35 dB.
Hosomi, Naohisa; Nagai, Yoji; Kohriyama, Tatsuo; Ohtsuki, Toshiho; Aoki, Shiro; Nezu, Tomohisa; Maruyama, Hirofumi; Sunami, Norio; Yokota, Chiaki; Kitagawa, Kazuo; Terayama, Yasuo; Takagi, Makoto; Ibayashi, Setsuro; Nakamura, Masakazu; Origasa, Hideki; Fukushima, Masanori; Mori, Etsuro; Minematsu, Kazuo; Uchiyama, Shinichiro; Shinohara, Yukito; Yamaguchi, Takenori; Matsumoto, Masayasu
2015-09-01
Although statin therapy is beneficial for the prevention of initial stroke, the benefit for recurrent stroke and its subtypes remains to be determined in Asians, in whom stroke profiles differ from those in Caucasians. This study examined whether treatment with low-dose pravastatin prevents stroke recurrence in ischemic stroke patients. This is a multicenter, randomized, open-label, blinded-endpoint, parallel-group study of patients who experienced non-cardioembolic ischemic stroke. All patients had a total cholesterol level between 4.65 and 6.21 mmol/L at enrollment, without the use of statins. The pravastatin group patients received 10 mg of pravastatin/day; the control group patients received no statins. The primary endpoint was the occurrence of stroke and transient ischemic attack (TIA), with the onset of each stroke subtype set to be one of the secondary endpoints. Although 3000 patients were targeted, 1578 patients (491 female, age 66.2 years) were recruited and randomly assigned to pravastatin group or control group. During the follow-up of 4.9 ± 1.4 years, although total stroke and TIA similarly occurred in both groups (2.56 vs. 2.65%/year), onset of atherothrombotic infarction was less frequent in pravastatin group (0.21 vs. 0.64%/year, p = 0.0047, adjusted hazard ratio 0.33 [95%CI 0.15 to 0.74]). No significant intergroup difference was found for the onset of other stroke subtypes, and for the occurrence of adverse events. Although whether low-dose pravastatin prevents recurrence of total stroke or TIA still needs to be examined in Asians, this study has generated a hypothesis that it may reduce occurrence of stroke due to larger artery atherosclerosis. This study was initially supported by a grant from the Ministry of Health, Labour and Welfare, Japan. After the governmental support expired, it was conducted in collaboration between Hiroshima University and the Foundation for Biomedical Research and Innovation.
Methodology Series Module 4: Clinical Trials.
Setia, Maninder Singh
2016-01-01
In a clinical trial, study participants are (usually) divided into two groups. One group is then given the intervention and the other group is not given the intervention (or may be given some existing standard of care). We compare the outcomes in these groups and assess the role of intervention. Some of the trial designs are (1) parallel study design, (2) cross-over design, (3) factorial design, and (4) withdrawal group design. The trials can also be classified according to the stage of the trial (Phase I, II, III, and IV) or the nature of the trial (efficacy vs. effectiveness trials, superiority vs. equivalence trials). Randomization is one of the procedures by which we allocate different interventions to the groups. It ensures that all the included participants have a specified probability of being allocated to either of the groups in the intervention study. If participants and the investigator know about the allocation of the intervention, then it is called an "open trial." However, many of the trials are not open - they are blinded. Blinding is useful to minimize bias in clinical trials. The researcher should familiarize themselves with the CONSORT statement and the appropriate Clinical Trials Registry of India. PMID:27512184
NASA Astrophysics Data System (ADS)
Ruby, Michael
In the last decades scanning probe microscopy and spectroscopy have become well-established tools in nanotechnology and surface science. This opened the market for many commercial manufacturers, each with different hardware and software standards. Besides the advantage of a wide variety of available hardware, this diversity may complicate, on the software side, the exchange of data between scientists and the data analysis for groups working with hardware developed by different manufacturers. Not only does the file format differ between manufacturers, but the data also often require further numerical treatment before publication. SpectraFox is an open-source and independent tool which manages, processes, and evaluates scanning probe spectroscopy and microscopy data. It aims at simplifying the documentation in parallel to measurement, and it provides solid evaluation tools for large amounts of data.
Support of Multidimensional Parallelism in the OpenMP Programming Model
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele
2003-01-01
OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
Gnessi, Lucio; Bacarea, Vladimir; Marusteri, Marius; Piqué, Núria
2015-10-30
There is a strong rationale for the use of agents with film-forming protective properties, like xyloglucan, for the treatment of acute diarrhea. However, few data from clinical trials are available. A randomized, controlled, open-label, parallel group, multicentre, clinical trial was performed to evaluate the efficacy and safety of xyloglucan, in comparison with diosmectite and Saccharomyces in adult patients with acute diarrhea due to different causes. Patients were randomized to receive a 3-day treatment. Symptoms (stools type, nausea, vomiting, abdominal pain and flatulence) were assessed by a self-administered ad-hoc questionnaire 1, 3, 6, 12, 24, 48 and 72 h following the first dose administration. Adverse events were also recorded. A total of 150 patients (69.3 % women and 30.7 % men, mean age 47.3 ± 14.7 years) were included (n = 50 in each group). A faster onset of action was observed in the xyloglucan group compared with the diosmectite and S. boulardii groups. At 6 h xyloglucan produced a statistically significantly greater decrease in the mean number of type 6 and 7 stools compared with diosmectite (p = 0.031). Xyloglucan was the most efficient treatment in reducing the percentage of patients with nausea throughout the study period, particularly during the first hours (from 26 % at baseline to 4 % after 6 and 12 h). An important improvement of vomiting was observed in all three treatment groups. Xyloglucan was more effective than diosmectite and S. boulardii in reducing abdominal pain, with a constant improvement observed throughout the study. The clinical evolution of flatulence followed similar patterns in the three groups, with continuous improvement of the symptom. All treatments were well tolerated, without reported adverse events. Xyloglucan is a fast, efficacious and safe option for the treatment of acute diarrhea. EudraCT number 2014-001814-24 (date: 2014-04-28) ISRCTN number: 90311828.
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems
NASA Astrophysics Data System (ADS)
Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.
2017-07-01
Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the maximum likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Moreover, these calculations are processed sequentially, even though modern computers support hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is then implemented with a standard shared-memory application programming interface, namely Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
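The difference between the two strategies named in this abstract can be illustrated with a small sketch. It is written in Python with a thread pool rather than OpenMP, and it does not reproduce the C-GMM computation: the function `cell` is a hypothetical placeholder for the work at one (outer, inner) index pair, and the function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def cell(i, j):
    """Hypothetical stand-in for the work at one (outer, inner) index pair."""
    return (i + 1) * (j + 1)

def serial(n_outer, n_inner):
    """Reference nested loop: one row total per outer iteration."""
    return [sum(cell(i, j) for j in range(n_inner)) for i in range(n_outer)]

def outer_parallel(n_outer, n_inner, workers=4):
    """One task per outer iteration; each task runs its inner loop serially."""
    def row(i):
        return sum(cell(i, j) for j in range(n_inner))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, range(n_outer)))

def inner_parallel(n_outer, n_inner, workers=4):
    """Outer loop stays serial; the inner loop is split across workers
    once per outer iteration, so tasks are smaller and more numerous."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n_outer):
            results.append(sum(pool.map(lambda j: cell(i, j), range(n_inner))))
    return results
```

Both variants return the same values as the serial loop. The outer-loop version creates `n_outer` coarse tasks, while the inner-loop version creates `n_outer * n_inner` fine ones with a synchronisation after every outer iteration, which is one plausible reason a coarser decomposition can win; the abstract itself reports only the outcome.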
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Katakami, Naoto; Mita, Tomoya; Yoshii, Hidenori; Shiraiwa, Toshihiko; Yasuda, Tetsuyuki; Okada, Yosuke; Umayahara, Yutaka; Kaneto, Hideaki; Osonoi, Takeshi; Yamamoto, Tsunehiko; Kuribayashi, Nobuichi; Maeda, Kazuhisa; Yokoyama, Hiroki; Kosugi, Keisuke; Ohtoshi, Kentaro; Hayashi, Isao; Sumitani, Satoru; Tsugawa, Mamiko; Ohashi, Makoto; Taki, Hideki; Nakamura, Tadashi; Kawashima, Satoshi; Sato, Yasunori; Watada, Hirotaka; Shimomura, Iichiro
2017-10-01
Sodium-glucose co-transporter-2 (SGLT2) inhibitors are anti-diabetic agents that improve glycemic control with a low risk of hypoglycemia and ameliorate a variety of cardiovascular risk factors. The aim of the ongoing study described herein is to investigate the preventive effects of tofogliflozin, a potent and selective SGLT2 inhibitor, on the progression of atherosclerosis in subjects with type 2 diabetes (T2DM) using carotid intima-media thickness (IMT), an established marker of cardiovascular disease (CVD), as a marker. The Study of Using Tofogliflozin for Possible better Intervention against Atherosclerosis for type 2 diabetes patients (UTOPIA) trial is a prospective, randomized, open-label, blinded-endpoint, multicenter, and parallel-group comparative study. The aim was to recruit a total of 340 subjects with T2DM but no history of apparent CVD at 24 clinical sites and randomly allocate these to a tofogliflozin treatment group or a conventional treatment group using drugs other than SGLT2 inhibitors. As primary outcomes, changes in mean and maximum IMT of the common carotid artery during a 104-week treatment period will be measured by carotid echography. Secondary outcomes include changes in glycemic control, parameters related to β-cell function and diabetic nephropathy, the occurrence of CVD and adverse events, and biochemical measurements reflecting vascular function. This is the first study to address the effects of SGLT2 inhibitors on the progression of carotid IMT in subjects with T2DM without a history of CVD. The results will be available in the very near future, and these findings are expected to provide clinical data that will be helpful in the prevention of diabetic atherosclerosis and subsequent CVD. Kowa Co., Ltd. UMIN000017607.
Shrimankar, D D; Sathe, S R
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node. Programs written in MPI can span more than one SMP node, but this programming paradigm incurs an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Scalable Unix commands for parallel processors : a high-performance implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ong, E.; Lusk, E.; Gropp, W.
2001-06-22
We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2014-05-01
Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g., the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions - enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
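The averaging-filter example mentioned in this abstract corresponds to the following kind of per-pixel loop. The sketch is plain Python, not MIST syntax (which the abstract does not show); the function name `mean_filter_3x3` and the choice to average only over in-bounds neighbours at the image edges are assumptions made for illustration.

```python
def mean_filter_3x3(image):
    """Average each pixel with its 3x3 neighbourhood.

    `image` is a list of equal-length rows. At the edges, only the
    neighbours that lie inside the image are averaged (an assumed
    boundary rule).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):                 # simple loop over all pixels
        for c in range(cols):
            total, count = 0.0, 0
            for dr in (-1, 0, 1):         # visit the 3x3 neighbourhood
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total / count
    return out
```

Every output pixel depends only on the input image, so the two outer loops carry no dependence between iterations; that independence is what allows a loop of this shape to be turned into whole-array operations or split among workers.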
Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy
2017-10-06
The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, and (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and in 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
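The two timings quoted in the abstract above imply a speedup and a parallel efficiency that can be worked out directly. The short Python sketch below does that arithmetic; the helper names are hypothetical, and the only inputs taken from the abstract are the two run times and the core count.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' string to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def speedup_and_efficiency(serial_time, parallel_time, cores):
    """Speedup is serial/parallel time; efficiency is speedup per core."""
    speedup = to_minutes(serial_time) / to_minutes(parallel_time)
    return speedup, speedup / cores


# 2:12 on one core versus 1:04 on four cores, as reported in the abstract.
speedup, efficiency = speedup_and_efficiency("2:12", "1:04", 4)
print(f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A speedup of roughly 2 on four cores is consistent with the abstract's remark that the parallelization levels involve trade-offs rather than ideal scaling.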
Morales-Fernandez, Angeles; Morales-Asencio, Jose Miguel; Canca-Sanchez, Jose Carlos; Moreno-Martin, Gabriel; Vergara-Romero, Manuel
2016-05-01
To determine the effect of a nurse-led intervention programme for patients with chronic non-cancer pain. Chronic non-cancer pain is a widespread health problem and one that is insufficiently controlled. Nurses can play a vital role in pain management, using best practices in the assessment and management of pain under a holistic approach where the patient plays a proactive role in addressing the disease process. Improving the quality of life, reducing disability, achieving acceptance of health status, coping and breaking the vicious circle of pain should be the prime objectives of our care management programme. Open randomized parallel controlled study. The experimental group will undertake one single initial session, followed by six group sessions led by nurses, aimed at empowering patients for the self-management of pain. Healthy behaviours will be encouraged, such as sleep and postural hygiene, promotion of physical activity and healthy eating. Educational interventions on self-esteem, pain-awareness, communication and relaxing techniques will be carried out. As primary end points, quality of life, perceived level of pain, anxiety and depression will be evaluated. Secondary end points will be coping and satisfaction. Follow-up will be performed at 12 and 24 weeks. The study was approved by the Ethics and Research Committee Costa del Sol. If significant effects were detected, impact on quality of life through a nurse-led programme would offer a complementary service to existing pain clinics for a group of patients with frequent unmet needs. © 2016 John Wiley & Sons Ltd.
Characterizing Task-Based OpenMP Programs
Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats
2015-01-01
Programmers struggle to understand performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. Programmers can tune performance faster and understand performance tradeoffs more effectively than existing tools by using our method to characterize task-based performance. PMID:25860023
MacKenzie, K.R.
1958-09-01
An ion source is described for use in a calutron and more particularly deals with an improved filament arrangement for a calutron. According to the invention, the ion source block has a gas ionizing passage open along two adjoining sides of the block. A filament is disposed in overlying relation to one of the passage openings and has a greater width than the passage width, so that both the filament and opening lengths are parallel and extend in a transverse relation to the magnetic field. The other passage opening is parallel to the length of the magnetic field. This arrangement is effective in assisting in the production of a stable, long-lived arc for the general improvement of calutron operation.
Analysis of emotionality and locomotion in radio-frequency electromagnetic radiation exposed rats.
Narayanan, Sareesh Naduvil; Kumar, Raju Suresh; Paval, Jaijesh; Kedage, Vivekananda; Bhat, M Shankaranarayana; Nayak, Satheesha; Bhat, P Gopalakrishna
2013-07-01
In the current study the modulatory role of mobile phone radio-frequency electromagnetic radiation (RF-EMR) on emotionality and locomotion was evaluated in adolescent rats. Male albino Wistar rats (6-8 weeks old) were randomly assigned to the following groups, with 12 animals in each group. Group I (Control): they remained in the home cage throughout the experimental period. Group II (Sham exposed): they were exposed to a mobile phone in switch-off mode for 28 days. Group III (RF-EMR exposed): they were exposed to RF-EMR (900 MHz) from an active GSM (Global System for Mobile Communications) mobile phone with a peak power density of 146.60 μW/cm² for 28 days. On the 29th day, the animals were tested for emotionality and locomotion. The elevated plus maze (EPM) test revealed that the percentage of entries into the open arm, the percentage of time spent on the open arm and the distance travelled on the open arm were significantly reduced in the RF-EMR exposed rats. Rearing frequency and grooming frequency were also decreased in the RF-EMR exposed rats. Defecation boli count during the EPM test was higher in the RF-EMR group. No statistically significant difference was found in total distance travelled, total arm entries, percentage of closed arm entries and parallelism index in the RF-EMR exposed rats compared to controls. Results indicate that mobile phone radiation could affect the emotionality of rats without affecting general locomotion.
Parallel protein secondary structure prediction based on neural networks.
Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi
2004-01-01
Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are tested separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. The new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared-memory architecture. The speedups of both OpenMP and Pthread are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical applications, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented with MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes the neighbor list, Voronoi density, Voronoi cell volume and density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
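The idea of a domain decomposition with consistent boundaries and periodic conditions, as described in the abstract above, can be illustrated in one dimension. The Python sketch below is not PARAVT's implementation (which uses MPI and Qhull in three dimensions); the function name, the slab layout and the `buffer` width are all assumptions. Each task owns the particles in its slab and additionally receives "ghost" copies of nearby particles, including periodic images, so that cells near a slab edge could be computed consistently.

```python
def decompose_periodic(xs, box, ntasks, buffer):
    """Split positions in [0, box) into slabs with periodic ghost copies."""
    width = box / ntasks
    domains = []
    for t in range(ntasks):
        lo, hi = t * width, (t + 1) * width
        owned = [x for x in xs if lo <= x < hi]
        ghosts = []
        for x in xs:
            # Test the particle itself and its two nearest periodic images.
            for shift in (-box, 0.0, box):
                y = x + shift
                inside_slab = lo <= y < hi
                inside_buffer = lo - buffer <= y < hi + buffer
                if inside_buffer and not inside_slab:
                    ghosts.append(y)
        domains.append({"owned": owned, "ghosts": ghosts})
    return domains


# Four particles in a periodic box of length 10, split between two tasks.
domains = decompose_periodic([0.5, 2.5, 5.0, 9.7], box=10.0, ntasks=2, buffer=1.0)
```

In this example the particle at 9.7 appears in the first task's ghost list at about -0.3, which is the periodic wrap-around the abstract refers to.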
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of the code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. For a matrix multiplication benchmark, an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism
Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of the code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. For a matrix multiplication benchmark, an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples.
Lin, Jake; Kramna, Lenka; Autio, Reija; Hyöty, Heikki; Nykter, Matti; Cinek, Ondrej
2017-05-15
Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols. We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table. The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped ('dark matter') reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.
A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. At first, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and a field scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about tenfold on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, a number of compute nodes equal to the number of adjustable parameters (when the forward difference is used for Jacobian approximation), or twice that number (if the center difference is used), is used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be efficiently utilized.
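The figures in the abstract above (a loop consuming over 97% of the run time, and roughly a tenfold reduction on 16 cores) can be checked against Amdahl's law. The law itself is not mentioned in the abstract; it is a standard model brought in here as an assumption, and the parallel fraction is taken as exactly 0.97 even though the abstract says "over 97%". A minimal Python sketch:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the run time scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# 97% of the time in one parallelizable loop, run on a 16-core node.
bound_16 = amdahl_speedup(0.97, 16)

# Upper limit as the number of workers grows without bound.
limit = 1.0 / (1.0 - 0.97)

print(f"16 cores: {bound_16:.2f}x, limit: {limit:.1f}x")
```

Under these assumptions the model gives about 11x on 16 cores, which is in line with the reported "about ten times", and it suggests why parallelizing a few more loops (shrinking the serial fraction) improves performance further.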
Status and future plans for open source QuickPIC
NASA Astrophysics Data System (ADS)
An, Weiming; Decyk, Viktor; Mori, Warren
2017-10-01
QuickPIC is a three-dimensional (3D) quasi-static particle-in-cell (PIC) code developed based on the UPIC framework. It can be used for efficiently modeling plasma-based accelerator (PBA) problems. With the quasi-static approximation, QuickPIC can use different time scales for calculating the beam (or laser) evolution and the plasma response, and a 3D plasma wake field can be simulated using a two-dimensional (2D) PIC code where the time variable is ξ = ct - z and z is the beam propagation direction. QuickPIC can be a thousand times faster than a normal PIC code when simulating a PBA. It uses an MPI/OpenMP hybrid parallel algorithm, which can be run on either a laptop or the largest supercomputer. The open source QuickPIC is an object-oriented program with high-level classes written in Fortran 2003. It can be found at https://github.com/UCLA-Plasma-Simulation-Group/QuickPIC-OpenSource.git
Yang, Wenying; Zhu, Lvyun; Meng, Bangzhu; Liu, Yu; Wang, Wenhui; Ye, Shandong; Sun, Li; Miao, Heng; Guo, Lian; Wang, Zhanjian; Lv, Xiaofeng; Li, Quanmin; Ji, Qiuhe; Zhao, Weigang; Yang, Gangyi
2016-01-01
The present study aimed to compare the efficacy and safety of subject-driven and investigator-driven titration of biphasic insulin aspart 30 (BIAsp 30) twice daily (BID). In this 20-week, randomized, open-label, two-group parallel, multicenter trial, Chinese patients with type 2 diabetes inadequately controlled by premixed/self-mixed human insulin were randomized 1:1 to subject-driven or investigator-driven titration of BIAsp 30 BID, in combination with metformin and/or α-glucosidase inhibitors. Dose adjustment was decided by patients in the subject-driven group after training, and by investigators in the investigator-driven group. Eligible adults (n = 344) were randomized in the study. The estimated glycated hemoglobin (HbA1c) reduction was 14.5 mmol/mol (1.33%) in the subject-driven group and 14.3 mmol/mol (1.31%) in the investigator-driven group. Non-inferiority of subject-titration vs investigator-titration in reducing HbA1c was confirmed, with estimated treatment difference -0.26 mmol/mol (95% confidence interval -2.05, 1.53) (-0.02%, 95% confidence interval -0.19, 0.14). Fasting plasma glucose, postprandial glucose increment and self-measured plasma glucose were improved in both groups without statistically significant differences. One severe hypoglycemic event was experienced by one subject in each group. A similar rate of nocturnal hypoglycemia (events/patient-year) was reported in the subject-driven (1.10) and investigator-driven (1.32) groups. In the subject-driven and investigator-driven groups, respectively, 64.5 and 58.1% of patients achieved HbA1c <53.0 mmol/mol (7.0%), and 51.2 and 45.9% of patients achieved the HbA1c target without confirmed hypoglycemia throughout the trial. Subject-titration of BIAsp 30 BID was as efficacious and well-tolerated as investigator-titration. The present study supports patient self-titration of BIAsp 30 BID under physicians' supervision.
Parallel inhomogeneity and the Alfven resonance. 1: Open field lines
NASA Technical Reports Server (NTRS)
Hansen, P. J.; Harrold, B. G.
1994-01-01
In light of a recent demonstration of the general nonexistence of a singularity at the Alfven resonance in cold, ideal, linearized magnetohydrodynamics, we examine the effect of a small density gradient parallel to uniform, open ambient magnetic field lines. To lowest order, energy deposition is quantitatively unaffected but occurs continuously over a thickened layer. This effect is illustrated in a numerical analysis of a plasma sheet boundary layer model with perfectly absorbing boundary conditions. Consequences of the results are discussed, both for the open field line approximation and for the ensuing closed field line analysis.
Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School
NASA Astrophysics Data System (ADS)
Lafage, Vincent
2017-11-01
Parallel and hybrid Monte Carlo computation. The Monte Carlo method is the main workhorse for computation of particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX), manycore (OpenCL). The rewrite of a twenty-year-old Fortran 77 Monte Carlo program illustrates the various programming paradigms in use beyond language implementation. The problem of parallel random number generation is addressed. We give a short report on the one-week school dedicated to these recent approaches, which took place at École Polytechnique in May 2016.
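The parallel random number problem raised in the abstract above comes down to giving each worker its own stream so that results do not depend on scheduling. The Python sketch below illustrates one simple approach, a private generator per task seeded from a (base seed, stream id) pair; it is not the method used at the school, the function names and seed are hypothetical, and distinct seeds alone do not guarantee statistically independent streams. Threads are used only to show the structure; in CPython they will not speed up this loop.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(stream_id, samples, base_seed=2016):
    """Count random points inside the unit quarter circle for one stream."""
    # Each task owns a private generator, so no state is shared across threads.
    rng = random.Random(f"{base_seed}-{stream_id}")
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(streams=8, samples_per_stream=20000, workers=4):
    """Monte Carlo estimate of pi, summed over independent per-task streams."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(count_hits, range(streams),
                               [samples_per_stream] * streams))
    return 4.0 * sum(counts) / (streams * samples_per_stream)


estimate = estimate_pi()
```

Because every stream is tied to a task rather than to a thread, the estimate is identical for any worker count, which makes parallel runs reproducible and easier to debug.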
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Parallel implementation of approximate atomistic models of the AMOEBA polarizable model
NASA Astrophysics Data System (ADS)
Demerdash, Omar; Head-Gordon, Teresa
2016-11-01
In this work we present a replicated data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared to the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will prove useful for those who are interested in how molecular properties converge in the condensed phase with respect to the MBE; it provides a fruitful test bed for exploring different electrostatic embedding schemes and offers an interesting possibility for future exascale computing paradigms.
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...
2010-01-01
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also presents such problems. The long run times limit applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km² watershed with one CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement, either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference: Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
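The default even distribution described in the abstract above (processors divided among tasks when tasks are fewer, tasks divided among processors when tasks are more numerous) can be sketched as follows. This is plain Python, not PM code; the function names are hypothetical, non-empty inputs are assumed, and treating the equal-count case as a one-to-one mapping is an assumption, since the abstract does not cover it.

```python
def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assign(processors, tasks):
    """Map each task to the list of processors it may run on."""
    if len(tasks) <= len(processors):
        # Fewer tasks than processors: each task owns a group of processors.
        groups = split_evenly(processors, len(tasks))
        return dict(zip(tasks, groups))
    # More tasks than processors: each processor receives a batch of tasks.
    batches = split_evenly(tasks, len(processors))
    return {task: [proc]
            for proc, batch in zip(processors, batches)
            for task in batch}


few_tasks = assign([0, 1, 2, 3, 4, 5], ["a", "b"])
many_tasks = assign([0, 1], ["a", "b", "c", "d", "e"])
```

A nested parallel statement would correspond to calling `assign` again with the processor group owned by one task, which mirrors the subdivision of processor sets the abstract describes.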
Cantarella, Daniele; Dominguez-Mompell, Ramon; Mallya, Sanjay M; Moschik, Christoph; Pan, Hsin Chuan; Miller, Joseph; Moon, Won
2017-11-01
Mini-implant-assisted rapid palatal expansion (MARPE) appliances have been developed with the aim to enhance the orthopedic effect induced by rapid maxillary expansion (RME). Maxillary Skeletal Expander (MSE) is a particular type of MARPE appliance characterized by the presence of four mini-implants positioned in the posterior part of the palate with bi-cortical engagement. The aim of the present study is to evaluate the MSE effects on the midpalatal and pterygopalatine sutures in late adolescents, using high-resolution CBCT. Specific aims are to define the magnitude and sagittal parallelism of midpalatal suture opening, to measure the extent of transverse asymmetry of split, and to illustrate the possibility of splitting the pterygopalatine suture. Fifteen subjects (mean age of 17.2 years; range, 13.9-26.2 years) were treated with MSE. Pre- and post-treatment CBCT exams were taken and superimposed. A novel methodology based on three new reference planes was utilized to analyze the sutural changes. Parameters were compared from pre- to post-treatment and between genders non-parametrically using the Wilcoxon signed-rank test. For the frequency of openings in the lower part of the pterygopalatine suture, the Fisher's exact test was used. Regarding the magnitude of midpalatal suture opening, the split at anterior nasal spine (ANS) and at posterior nasal spine (PNS) was 4.8 and 4.3 mm, respectively. The amount of split at PNS was 90% of that at ANS, showing that the opening of the midpalatal suture was almost perfectly parallel antero-posteriorly. On average, one half of the anterior nasal spine (ANS) moved more than the contralateral one by 1.1 mm. Openings between the lateral and medial plates of the pterygoid process were detectable in 53% of the sutures (P < 0.05). No significant differences were found in the magnitude and frequency of suture opening between males and females. Correlation between age and suture opening was negligible (R² range, 0.3-4.2%). 
Midpalatal suture was successfully split by MSE in late adolescents, and the opening was almost perfectly parallel in a sagittal direction. Regarding the extent of transverse asymmetry of the split, on average one half of ANS moved more than the contralateral one by 1.1 mm. Pterygopalatine suture was split in its lower region by MSE, as the pyramidal process was pulled out from the pterygoid process. Patient gender and age had a negligible influence on suture opening for the age group considered in the study.
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, C; Quinlan, D J; Willcock, J J
2008-12-12
Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.
Procacci, Piero
2016-06-27
We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with a hybrid OpenMP/MPI (open multiprocessing message passing interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology aimed at evaluating the binding free energy in drug-receptor system on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm, project the code as a possible effective tool for a second generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
Teodorescu, C; Young, W C; Swan, G W S; Ellis, R F; Hassam, A B; Romero-Talamas, C A
2010-08-20
Interferometric density measurements in plasmas rotating in shaped, open magnetic fields demonstrate strong confinement of plasma parallel to the magnetic field, with density drops of more than a factor of 10. Taken together with spectroscopic measurements of supersonic E × B rotation of sonic Mach 2, these measurements are in agreement with ideal MHD theory which predicts large parallel pressure drops balanced by centrifugal forces in supersonically rotating plasmas.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Deuse, Tobias; Bara, Christoph; Barten, Markus J; Hirt, Stephan W; Doesch, Andreas O; Knosalla, Christoph; Grinninger, Carola; Stypmann, Jörg; Garbade, Jens; Wimmer, Peter; May, Christoph; Porstner, Martina; Schulz, Uwe
2015-11-01
In recent years a series of trials has sought to define the optimal protocol for everolimus-based immunosuppression in heart transplantation, with the goal of minimizing exposure to calcineurin inhibitors (CNIs) and harnessing the non-immunosuppressive benefits of everolimus. Randomized studies have demonstrated that immunosuppressive potency can be maintained in heart transplant patients receiving everolimus despite marked CNI reduction, although very early CNI withdrawal may be inadvisable. A potential renal advantage has been shown for everolimus, but the optimal time for conversion and the adequate reduction in CNI exposure remain to be defined. Other reasons for use of everolimus include a substantial reduction in the risk of cytomegalovirus infection, and evidence for inhibition of cardiac allograft vasculopathy, a major cause of graft loss. The ongoing MANDELA study is a 12-month multicenter, randomized, open-label, parallel-group study in which efficacy, renal function and safety are compared in approximately 200 heart transplant patients. Patients receive CNI therapy, steroids and everolimus or mycophenolic acid during months 3 to 6 post-transplant, and are then randomized at month 6 post-transplant (i) to convert to CNI-free immunosuppression with everolimus and mycophenolic acid or (ii) to continue reduced-exposure CNI, with concomitant everolimus. Patients are then followed to month 18 post-transplant. The rationale and expectations for the trial and its methodology are described herein.
Velmurugan, N; Sooriaprakas, C; Jain, Preetham
2014-01-01
Objective: Immature teeth have a large apical opening and thin divergent or parallel dentinal walls; hence, with conventional needle irrigation there is a very high possibility of extrusion. This study was done to compare the apical extrusion of NaOCl in an immature root delivered using EndoVac and needle irrigation. Materials and Methods: Eighty freshly extracted maxillary central incisors were decoronated followed by access cavity preparation. Modified organotypic protocol was performed to create an open apex; then, the samples were divided into four groups (n=20): EndoVac Microcannula (group I), EndoVac Macrocannula (group II), NaviTip irrigation needle (group III) and Max-i-Probe Irrigating needle (group IV); 9.0 ml of 3% sodium hypochlorite was delivered slowly over a period of 60 seconds. Extruded irrigants were collected in a vial and analysed statistically. Results: Group I, group III and group IV showed 100% extrusion (20/20) but group II showed only 40% extrusion (8/20). The difference in this respect between group II and other groups was statistically significant (P<0.001). With regards to the volume of extrusion, group II had only 0.23 ml of extruded irrigant. Group I extruded 7.53ml of the irrigant. Group III and group IV extruded the entire volume of irrigant delivered. Conclusion: EndoVac Macrocannula resulted in the least extrusion of irrigant in immature teeth when compared to EndoVac Microcannula and conventional needle irrigation. PMID:25584055
Simulation of partially coherent light propagation using parallel computing devices
NASA Astrophysics Data System (ADS)
Magalhães, Tiago C.; Rebordão, José M.
2017-08-01
Light acquires or loses coherence, and coherence is one of the few optical observables. Spectra can be derived from coherence functions, and understanding any interferometric experiment also relies upon coherence functions. Beyond the two limiting cases (full coherence or incoherence), the coherence of light is always partial and it changes with propagation. We have implemented a code to compute the propagation of partially coherent light from the source plane to the observation plane using parallel computing devices (PCDs). In this paper, we restrict the propagation to free space only. To this end, we used the Open Computing Language (OpenCL) and the open-source toolkit PyOpenCL, which gives access to OpenCL parallel computation through Python. To test our code, we chose two coherence source models: an incoherent source and a Gaussian Schell-model source. In the former case, we considered two different source shapes: circular and rectangular. The results were compared to the theoretical values. Our implemented code allows one to choose between the PyOpenCL implementation and a standard one, i.e., using the CPU only. To test the computation time for each implementation (PyOpenCL and standard), we used several computer systems with different CPUs and GPUs. We used powers of two for the dimensions of the cross-spectral density matrix (e.g. 32^4, 64^4) and a significant speed increase is observed in the PyOpenCL implementation when compared to the standard one. This can be an important tool for studying new source models.
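A rough sense of why the matrix dimensions matter can be had from a storage estimate. The sketch below is not taken from the paper: it assumes that a plane sampled on an N x N grid gives a cross-spectral density with one complex entry per pair of grid points (N^4 entries in total) and that each entry is stored as a 16-byte double-precision complex number. The function names are hypothetical.

```python
# Assumption: each cross-spectral density entry is a complex128 value.
BYTES_PER_COMPLEX128 = 16


def csd_entries(n):
    """Number of entries when every pair of points on an n x n grid is stored."""
    return (n * n) ** 2


def csd_mebibytes(n):
    """Storage for the full matrix in MiB under the complex128 assumption."""
    return csd_entries(n) * BYTES_PER_COMPLEX128 / 2**20


for n in (32, 64, 128):
    print(f"N = {n:3d}: {csd_entries(n):>11d} entries, {csd_mebibytes(n):7.0f} MiB")
```

Under these assumptions, doubling N multiplies both the storage and the number of entries to propagate by 16, which is one reason offloading the computation to a parallel device becomes attractive as the grid grows.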
Optimising the Parallelisation of OpenFOAM Simulations
2014-06-01
Optimising the Parallelisation of OpenFOAM Simulations. Shannon Keough, Maritime Division, Defence...Science and Technology Organisation, DSTO-TR-2987. ABSTRACT: The OpenFOAM computational fluid dynamics toolbox allows parallel computation of...performance of a given high performance computing cluster with several OpenFOAM cases, running using a combination of MPI libraries and corresponding MPI
Mantle flow through a tear in the Nazca slab inferred from shear wave splitting
NASA Astrophysics Data System (ADS)
Lynner, Colton; Anderson, Megan L.; Portner, Daniel E.; Beck, Susan L.; Gilbert, Hersh
2017-07-01
A tear in the subducting Nazca slab is located between the end of the Pampean flat slab and normally subducting oceanic lithosphere. Tomographic studies suggest mantle material flows through this opening. The best way to probe this hypothesis is through observations of seismic anisotropy, such as shear wave splitting. We examine patterns of shear wave splitting using data from two seismic deployments in Argentina that lay updip of the slab tear. We observe a simple pattern of plate-motion-parallel fast splitting directions, indicative of plate-motion-parallel mantle flow, beneath the majority of the stations. Our observed splitting contrasts with previous observations to the north and south of the flat slab region. Since plate-motion-parallel splitting occurs only coincidentally with the slab tear, we propose that mantle material flows through the opening, resulting in Nazca plate-motion-parallel flow in both the subslab mantle and mantle wedge.
OpenACC performance for simulating 2D radial dambreak using FVM HLLE flux
NASA Astrophysics Data System (ADS)
Gunawan, P. H.; Pahlevi, M. R.
2018-03-01
The aim of this paper is to investigate the performance of the OpenACC platform for computing a 2D radial dambreak. Here, the shallow water equations are used to describe and simulate the 2D radial dambreak with the finite volume method (FVM) using the HLLE flux. OpenACC is a parallel computing platform based on GPU cores. In this research, the platform is used to minimize the computational time of the numerical scheme. The results show that using OpenACC reduces the computational time. For the dry and wet radial dambreak simulations using 2048 grids, the parallel computational times are 575.984 s and 584.830 s, respectively. These results demonstrate the success of OpenACC when compared with the serial times of the dry and wet radial dambreak simulations, which are 28047.500 s and 29269.40 s, respectively.
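The abstract reports raw timings but not the resulting speedup. As a small worked example, the sketch below divides each serial time by the corresponding parallel time; the timings are the ones quoted above, while the variable and function names are illustrative only.

```python
# Timings quoted in the abstract, in seconds, for the 2048-grid runs.
serial_s = {"dry": 28047.500, "wet": 29269.40}
parallel_s = {"dry": 575.984, "wet": 584.830}


def speedup(t_serial, t_parallel):
    """Speedup is the serial run time divided by the parallel run time."""
    return t_serial / t_parallel


speedups = {case: speedup(serial_s[case], parallel_s[case]) for case in serial_s}

for case, value in speedups.items():
    print(f"{case} dambreak: speedup = {value:.1f}x")
```

With the quoted numbers this gives a speedup of roughly 49x for the dry case and roughly 50x for the wet case.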
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo Zehua; Tang Xianzhu
Parallel transport of long mean-free-path plasma along an open magnetic field line is characterized by strong temperature anisotropy, which is driven by two effects. The first is magnetic moment conservation in a non-uniform magnetic field, which can transfer energy between parallel and perpendicular degrees of freedom. The second is decompressional cooling of the parallel temperature due to parallel flow acceleration by the conventional presheath electric field, which is associated with the sheath condition near the wall surface where the open magnetic field line intercepts the discharge chamber. To the leading order in gyroradius to system gradient length scale expansion, the parallel transport can be understood via the Chew-Goldberger-Low (CGL) model, which retains two components of the parallel heat flux, i.e., q_n associated with the parallel thermal energy and q_s related to perpendicular thermal energy. It is shown that in addition to the effect of magnetic field strength (B) modulation, the two components (q_n and q_s) of the parallel heat flux play decisive roles in the parallel variation of the plasma profile, which includes the plasma density (n), parallel flow (u), parallel and perpendicular temperatures (T_∥ and T_⊥), and the ambipolar potential (φ). Both their profile (q_n/B and q_s/B^2) and the upstream values of the ratio of the conductive and convective thermal flux (q_n/(nuT_∥) and q_s/(nuT_⊥)) provide the controlling physics, in addition to B modulation. The physics described by the CGL model are contrasted with those of the double-adiabatic laws and further elucidated by comparison with the first-principles kinetic simulation for a specific but representative flux expander case.
Neoclassical transport fluxes inside transport barriers in tokamaks
NASA Astrophysics Data System (ADS)
Shaing, K. C.
2011-10-01
Inside the transport barriers in tokamaks, ion energy losses are sometimes smaller than the value predicted by the standard neoclassical theory. This improvement can be understood in terms of the orbit squeezing theory in addition to the sonic poloidal E × B Mach number U_{p,m} that pushes the tips of the trapped particles to higher energy. In general, U_{p,m} also includes the poloidal component of the parallel mass flow speed. These physics mechanisms are the cornerstones of the transition theory of the low confinement mode (L-mode) to the high confinement mode (H-mode) in tokamaks. Here, detailed transport fluxes in the banana regime are presented using the parallel viscous forces calculated earlier. It is found, as expected, that effects of orbit squeezing and the sonic U_{p,m} reduce the ion heat conductivity. The former reduces it by a factor of |S|^{3/2} and the latter by a factor of R
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Laser Safety Method For Duplex Open Loop Parallel Optical Link
Baumgartner, Steven John; Hedin, Daniel Scott; Paschal, Matthew James
2003-12-02
A method and apparatus are provided to ensure that laser optical power does not exceed a "safe" level in an open loop parallel optical link in the event that a fiber optic ribbon cable is broken or otherwise severed. A duplex parallel optical link includes a transmitter and receiver pair and a fiber optic ribbon that includes a designated number of channels that cannot be split. The duplex transceiver includes a corresponding transmitter and receiver that are physically attached to each other and cannot be detached therefrom, so as to ensure safe, laser optical power in the event that the fiber optic ribbon cable is broken or severed. Safe optical power is ensured by redundant current and voltage safety checks.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of Re_τ = 180 has shown good agreement with existing data.
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...
2016-09-21
In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalue/eigenvectors of the graph Laplacian and this is a self-contained module that can be used in conjunction with other graph-Laplacian based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Ojeda-May, Pedro; Nam, Kwangho
2017-08-08
The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passing interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model in which both the task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an S_N2 symmetric reaction in water. The MPI version exceeded existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach which displayed a better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).
NDL-v2.0: A new version of the numerical differentiation library for parallel architectures
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.
2014-07-01
We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N^2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 
180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. 
In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. 
It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. 
Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
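The saving that comes from caching function evaluations in a finite-difference Hessian can be sketched in a few lines. This is a minimal illustration, not the PNDL code: the function name `hessian_fd`, the step size `h`, and the specific difference formulas (central for the diagonal, forward for the off-diagonal) are assumptions chosen for brevity. Each distinct evaluation point is computed once and then reused, and each uncached evaluation is independent of the others, which is what would make it a natural unit of work for a task-based runtime.

```python
from itertools import combinations


def hessian_fd(f, x, h=1e-4):
    """Finite-difference Hessian of f at x, with memoised evaluations.

    Hypothetical sketch: central differences on the diagonal and a forward
    formula off the diagonal. Returns (H, number_of_function_calls).
    """
    n = len(x)
    cache = {}   # maps integer shift pattern -> function value
    calls = 0

    def eval_at(shifts):
        # shifts[i] is the integer multiple of h added to coordinate i
        nonlocal calls
        if shifts not in cache:
            calls += 1
            cache[shifts] = f([xi + s * h for xi, s in zip(x, shifts)])
        return cache[shifts]

    def shifted(*pairs):
        s = [0] * n
        for idx, k in pairs:
            s[idx] += k
        return tuple(s)

    H = [[0.0] * n for _ in range(n)]
    f0 = eval_at(shifted())

    # Diagonal elements: (f(x + h e_i) - 2 f(x) + f(x - h e_i)) / h^2
    for i in range(n):
        fp = eval_at(shifted((i, 1)))
        fm = eval_at(shifted((i, -1)))
        H[i][i] = (fp - 2.0 * f0 + fm) / h ** 2

    # Off-diagonal elements reuse f(x + h e_i), already cached above
    for i, j in combinations(range(n), 2):
        fij = eval_at(shifted((i, 1), (j, 1)))
        fi = eval_at(shifted((i, 1)))
        fj = eval_at(shifted((j, 1)))
        H[i][j] = H[j][i] = (fij - fi - fj + f0) / h ** 2

    return H, calls
```

With these formulas an n-dimensional problem needs 1 + 2n + n(n-1)/2 distinct evaluations; recomputing every point for every element would need 3n + 2n(n-1).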
Gregarious Data Re-structuring in a Many Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
In this paper, we have developed a new methodology that takes into consideration the access patterns from a single parallel actor (e.g., a thread), as well as the access patterns of “grouped” parallel actors that share a resource (e.g., a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contributions of this paper include (a) collaborative data restructuring for group reuse and (b) a low-overhead transformation technique to improve access patterns and bring closely connected data elements together. Preliminary results on a many-core architecture, the Tilera TileGX, show promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine-grained runtimes (up to 16%) for selected kernels.
NASA Astrophysics Data System (ADS)
Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi
2017-07-01
We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method at local-to-regional scales. The code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer as the absorbing boundary condition. Hybrid-style programming using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability, allowing excellent performance from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC), are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.
Yazici Yilmaz, Fatma; Aydogan Mathyk, Begum; Yildiz, Serhat; Yenigul, Nefise Nazli; Saglam, Ceren
2018-03-21
The purpose of this study was to compare postoperative pain and neuropathy after primary caesarean sections with either blunt or sharp fascial expansions. A total of 123 women undergoing primary caesarean sections were included in the study. The sharp group had 61 patients, and the blunt group had 62. In the sharp group, the fascia was incised sharply and extended using scissors. In the blunt group, the fascia was bluntly opened by lateral finger-pulling. The primary outcome was postoperative pain. The long-term chronic pain scores were significantly lower in the blunt group during mobilisation (p = .012 and p = .022). Neuropathy was significantly more prevalent in the sharp group at both 1 and 3 months postoperatively (p = .043 and p = .016, respectively). The odds ratio (OR) and 95%CI for postoperative neuropathy at 1 and 3 months were as follows: OR 3.71, 95%CI 0.97-14.24 and OR 5.67, 95%CI 1.18-27.08, respectively. The OR for postoperative pain after 3 months was 3.26 (95%CI 1.09-9.73). The prevalence of postsurgical neuropathy and chronic pain at 3 months was significantly lower in the blunt group. Blunt fascial opening reduces the complication rate of postoperative pain and neuropathy after caesarean sections. Impact statement What is already known on this subject? The anatomic relationship of the abdominal fascia and the anterior abdominal wall nerves is a known fact. The fascia during caesarean sections can be opened by either a sharp or blunt extension. Data on the isolated impact of different fascial incisions on postoperative pain is limited. What do the results of this study add? The postoperative pain scores on the incision area are lower in the bluntly opened group compared to the sharp fascial incision group. By extending the fascia bluntly, a decrease in trauma and damage to nerves was observed. What are the implications of these findings for clinical practice and/or future research? 
The lateral extension of the fascia during caesarean sections must be done cautiously to prevent temporary damage to nerves and vessels. The blunt opening of the fascia by lateral finger pulling might be a preferred method over the sharp approach that uses scissors. We included only primary caesarean cases, however, comparisons of blunt and sharp fascial incisions in patients with more than one abdominal surgery should be explored in future studies.
Nyström, Thomas; Padro Santos, Irene; Hedberg, Fredric; Wardell, Johan; Witt, Nils; Cao, Yang; Bojö, Leif; Nilsson, Bo; Jendle, Johan
2017-01-01
We aimed to investigate the effect of liraglutide treatment on heart function in type 2 diabetes (T2D) patients with subclinical heart failure. Randomized open parallel-group trial. 62 T2D patients (45 male) with subclinical heart failure were randomized to either once daily liraglutide 1.8 mg, or glimepiride 4 mg, both added on to metformin 1 g twice a day. Mitral annular systolic (s') and early diastolic (e') velocities were measured at rest and during bicycle ergometer exercise, using tissue Doppler echocardiography. The primary endpoint was 18-week treatment changes in longitudinal functional reserve index (LFRI diastolic/systolic). Clinical characteristics between groups (liraglutide = 33 vs. glimepiride = 29) were well matched. At baseline left ventricle ejection fraction (53.7 vs. 53.6%) and global longitudinal strain (-15.3 vs. -16.5%) did not differ between groups. There were no significant differences in mitral flow velocities between groups. For the primary endpoint, there was no treatment change [95% confidence interval] for: LFRI diastolic (-0.18 vs. -0.53 [-0.28, 2.59; p = 0.19]), or LFRI systolic (-0.10 vs. -0.18 [-1.0, 1.7; p = 0.54]); for the secondary endpoints, there was a significant treatment change in respect of body weight (-3.7 vs. -0.2 kg [-5.5, -1.4; p = 0.001]), waist circumference (-3.1 vs. -0.8 cm [-4.2, -0.4; p = 0.019]), and heart rate (HR) (6.3 vs. -2.3 bpm [-3.0, 14.2; p = 0.003]), with no such treatment change in hemoglobin A1c levels (-11.0 vs. -9.2 mmol/mol [-7.0, 2.6; p = 0.37]), between groups. 18-week treatment with liraglutide compared with glimepiride did not improve LFRI diastolic/systolic, but did increase HR. There was a significant treatment change in body weight reduction in favor of liraglutide treatment.
OPAL: An Open-Source MPI-IO Library over Cray XT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Weikuan; Vetter, Jeffrey S; Canon, Richard Shane
Parallel IO over Cray XT is supported by a vendor-supplied MPI-IO package. This package contains a proprietary ADIO implementation built on top of the sysio library. While it is reasonable to maintain a stable code base for application scientists' convenience, it is also very important for system developers and researchers to analyze and assess the effectiveness of parallel IO software and, accordingly, tune and optimize the MPI-IO implementation. A proprietary parallel IO code base relinquishes such flexibility. On the other hand, a generic UFS-based MPI-IO implementation is typically used on many Linux-based platforms. We have developed an open-source MPI-IO package over Lustre, referred to as OPAL (OPportunistic and Adaptive MPI-IO Library over Lustre). OPAL provides a single source-code base for MPI-IO over Lustre on Cray XT and Linux platforms. Compared to the Cray implementation, OPAL provides a number of good features, including arbitrary specification of striping patterns and Lustre-stripe aligned file domain partitioning. This paper presents performance comparisons between OPAL and Cray's proprietary implementation. Our evaluation demonstrates that OPAL achieves performance comparable to the Cray implementation. We also exemplify the benefits of an open source package in revealing the underpinnings of parallel IO performance.
Ferrando, Carlos; Suarez-Sipmann, Fernando; Tusman, Gerardo; León, Irene; Romero, Esther; Gracia, Estefania; Mugarra, Ana; Arocas, Blanca; Pozo, Natividad; Soro, Marina; Belda, Francisco J
2017-01-01
Low tidal volume (VT) during anesthesia minimizes lung injury but may be associated with a decrease in functional lung volume, impairing lung mechanics and efficiency. Lung recruitment (RM) can restore lung volume but this may critically depend on the post-RM selected PEEP. This study was a randomized, two parallel arm, open study whose primary outcome was to compare the effects on driving pressure of adding a RM to low-VT ventilation, with or without an individualized post-RM PEEP in patients without known previous lung disease during anesthesia. Consecutive patients scheduled for major abdominal surgery were submitted to low-VT ventilation (6 ml/kg) and standard PEEP of 5 cmH2O (pre-RM, n = 36). After 30 min of stabilization all patients received a RM and were randomly allocated to either continue with the same PEEP (RM-5 group, n = 18) or to an individualized open-lung PEEP (OL-PEEP) (Open Lung Approach, OLA group, n = 18) defined as the level resulting in maximal Cdyn during a decremental PEEP trial. We compared the effects on driving pressure and lung efficiency measured by volumetric capnography. OL-PEEP was found at 8±2 cmH2O. 36 patients were included in the final analysis. When compared with pre-RM, OLA resulted in a 22% increase in compliance and a 28% decrease in driving pressure. These parameters did not improve in the RM-5 group. The trend of the driving pressure was significantly different between the OLA and RM-5 groups (p = 0.002). VDalv/VTalv was significantly lower in the OLA group after the RM (p = 0.035). Lung recruitment applied during low-VT ventilation improves driving pressure and lung efficiency only when applied as an open-lung strategy with an individualized PEEP in patients without lung diseases undergoing major abdominal surgery. ClinicalTrials.gov NCT02798133.
Tycho 2: A Proxy Application for Kinetic Transport Sweeps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, Charles Kristopher; Warsa, James S.
2016-09-14
Tycho 2 is a proxy application that implements discrete ordinates (SN) kinetic transport sweeps on unstructured, 3D, tetrahedral meshes. It has been designed to be small and require minimal dependencies to make collaboration and experimentation as easy as possible. Tycho 2 has been released as open source software. The software is currently in a beta release with plans for a stable release (version 1.0) before the end of the year. The code is parallelized via MPI across spatial cells and OpenMP across angles. Currently, several parallelization algorithms are implemented.
JETSPIN: A specific-purpose open-source software for simulations of nanofiber electrospinning
NASA Astrophysics Data System (ADS)
Lauricella, Marco; Pontrelli, Giuseppe; Coluzza, Ivan; Pisignano, Dario; Succi, Sauro
2015-12-01
We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.
NASA Astrophysics Data System (ADS)
Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.
2017-11-01
Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on a finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an 8192^3 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
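The sliding-window idea behind DFC analysis can be illustrated with a short serial sketch. It is an assumption-laden illustration rather than the authors' implementation: the function names `pearson` and `sliding_window_correlation`, the use of Pearson correlation as the connectivity measure, and the `window`/`step` parameters are all hypothetical. Each window is computed independently of the others, which is the property a thread-based or block-based parallel version would rely on.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def sliding_window_correlation(x, y, window, step=1):
    """Correlation of two time courses inside each sliding window.

    Every window is independent, so the list comprehension below could be
    replaced by a parallel map without changing the result.
    """
    if len(x) != len(y):
        raise ValueError("time courses must have the same length")
    starts = range(0, len(x) - window + 1, step)
    return [pearson(x[s:s + window], y[s:s + window]) for s in starts]
```

For two time courses of length T, a window of length w and a step of 1 give T - w + 1 correlation values per pair of regions.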
Heron, Stuart R; Woby, Steve R; Thompson, Dave P
2017-06-01
To assess the efficacy of three different exercise programmes in treating rotator cuff tendinopathy/shoulder impingement syndrome. Parallel group randomised clinical trial. Two out-patient NHS physiotherapy departments in Manchester, United Kingdom. 120 patients with shoulder pain of at least three months duration. Pain was reproduced on stressing the rotator cuff and participants had full passive range of movement at the shoulder. Three dynamic rotator cuff loading programmes; open chain resisted band exercises (OC) closed chain exercises (CC) and minimally loaded range of movement exercises (ROM). Change in Shoulder Pain and Disability Index (SPADI) score and the proportion of patients making a Minimally Clinically Important Change (MCIC) in symptoms 6 weeks after commencing treatment. All three programmes resulted in significant decreases in SPADI score, however there were no significant differences between the groups. Participants making a MCIC in symptoms were similar across all groups, however more participants deteriorated in the ROM group. Dropout rate was higher in the CC group, but when only patients completing treatment were considered more patients in the CC group made a meaningful reduction in pain and disability. Open chain, closed chain and range of movement exercises all seem to be effective in bringing about short term changes in pain and disability in patients with rotator cuff tendinopathy. ISRCTN76701121. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
Butow, Phyllis; Beeney, Linda; Juraskova, Ilona; Ussher, Jane; Zordan, Rachel
2009-01-01
Rewards derived from leading a cancer support group are poorly understood yet may be crucial to offset the challenges and difficulties of this role. This study sought to obtain the views of a representative sample of Australian cancer support group leaders (CSGLs) concerning the perceived rewards and challenges of their role. All CSGLs identified by the state-based Cancer Councils were invited to participate by postal questionnaire. Qualitative methods were used to analyze responses to open-ended questions concerning rewards and challenges. A total of 300 CSGLs returned the questionnaire (response rate = 66%) with 272 providing qualitative comments. Four parallel themes emerged from the qualitative analysis: (i) Personal, (ii) Relationship, (iii) Group, and (iv) Community rewards and challenges. These were integrated into a model depicting key positive and negative aspects of the CSGL's role, to provide direction for future training and ongoing support of CSGLs.
Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj
2012-07-03
We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.
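For orientation, the sequential Cyclic CD baseline mentioned above can be sketched for the lasso objective (1/2)·||Ax − y||² + λ·||x||₁. This is a textbook formulation under stated assumptions, not the framework from the paper: the names `soft_threshold` and `cyclic_cd_lasso`, the dense list-of-rows matrix, and the fixed number of sweeps are illustrative choices. The parallel variants named in the abstract differ in how they select which coordinates to update together; that selection logic is not shown here.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def cyclic_cd_lasso(A, y, lam, sweeps=100):
    """Cyclic coordinate descent for 0.5*||A x - y||^2 + lam*||x||_1.

    A is a list of rows; the residual r = y - A x is kept up to date so
    that each coordinate update costs one pass over a single column.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(y)  # residual for x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]

    for _ in range(sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Correlation of column j with the residual that excludes x[j]
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x
```

Because every coordinate update reads and writes the shared residual, running updates concurrently needs care; that dependency is what makes parallel coordinate descent non-trivial.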
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Kim, Bong Hyun; Kim, Kyuseok; Nam, Hae Jeong
2017-01-31
Many previous studies of electroacupuncture used combined therapy of electroacupuncture and systemic manual acupuncture, so it was uncertain which treatment was effective. This study evaluated and compared the effects of systemic manual acupuncture, periauricular electroacupuncture and distal electroacupuncture for treating patients with tinnitus. A randomized, parallel, open-labeled exploratory trial was conducted. Subjects aged 20-75 years who had suffered from idiopathic tinnitus for > 2 weeks were recruited from May 2013 to April 2014. The subjects were divided into three groups by systemic manual acupuncture group (MA), periauricular electroacupuncture group (PE), and distal electroacupuncture group (DE). The groups were selected by random drawing. Nine acupoints (TE 17, TE21, SI19, GB2, GB8, ST36, ST37, TE3 and TE9), two periauricular acupoints (TE17 and TE21), and four distal acupoints (TE3, TE9, ST36, and ST37) were selected. The treatment sessions were performed twice weekly for a total of eight sessions over 4 weeks. Outcomes were the tinnitus handicap inventory (THI) score and the loud and uncomfortable visual analogue scales (VAS). Demographic and clinical characteristics of all participants were compared between the groups upon admission using one-way analysis of variance (ANOVA). One-way ANOVA was used to evaluate the THI, VAS loud , and VAS uncomfortable scores. The least significant difference test was used as a post-hoc test. Thirty-nine subjects were eligible and their data were analyzed. No difference in THI and VAS loudness scores was observed in between groups. The VAS uncomfortable scores decreased significantly in MA and DE compared with those in PE. Within the group, all three treatments showed some effect on THI, VAS loudness scores and VAS uncomfortable scores after treatment except DE in THI. 
There was no statistically significant difference between systemic manual acupuncture, periauricular electroacupuncture and distal electroacupuncture in tinnitus. However, all three treatments had some effect on tinnitus within the group before and after treatment. Systemic manual acupuncture and distal electroacupuncture have some effect on VAS uncomfortable . KCT0001991 by CRIS (Clinical Research Information Service), 2016-8-1, retrospectively registered.
Targeting multiple heterogeneous hardware platforms with OpenCL
NASA Astrophysics Data System (ADS)
Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.
2014-06-01
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. 
The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
Chandramohan, S M; Gajbhiye, Raj Narenda; Agwarwal, Anil; Creedon, Erin; Schwiers, Michael L; Waggoner, Jason R; Tatla, Daljit
2013-08-01
Although stapling is an alternative to hand-suturing in gastrointestinal surgery, recent trials specifically designed to evaluate differences between the two in surgery time, anastomosis time, and return to bowel activity are lacking. This trial compared the outcomes of the two in subjects undergoing open gastrointestinal surgery. Adult subjects undergoing emergency or elective surgery requiring a single gastric, small, or large bowel anastomosis were enrolled into this open-label, prospective, randomized, interventional, parallel, multicenter, controlled trial. Randomization was assigned in a 1:1 ratio between the hand-sutured group (n = 138) and the stapled group (n = 142). Anastomosis time, surgery time, and time to bowel activity were collected and compared as primary endpoints. A total of 280 subjects were enrolled from April 2009 to September 2010. Only the time of anastomosis was significantly different between the two arms: 17.6 ± 1.90 min (stapled) and 20.6 ± 1.90 min (hand-sutured). This difference was deemed not clinically or economically meaningful. Safety outcomes and other secondary endpoints were similar between the two arms. Mechanical stapling is faster than hand-suturing for the construction of gastrointestinal anastomoses. Apart from this, stapling and hand-suturing are similar with respect to the outcomes measured in this trial.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
2014-07-22
The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
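The idea of separating what a kernel indexes from how the data is laid out in memory can be shown with a toy example. Kokkos itself is a C++ library and this Python sketch does not use or reproduce its API; the class `View2D` and the functions `layout_right`, `layout_left` and `fill` are hypothetical names chosen for illustration. The same kernel code runs unchanged while the mapping from (i, j) to a flat offset is swapped.

```python
def layout_right(i, j, rows, cols):
    """Row-major mapping: consecutive j are adjacent in memory."""
    return i * cols + j


def layout_left(i, j, rows, cols):
    """Column-major mapping: consecutive i are adjacent in memory."""
    return i + j * rows


class View2D:
    """Toy 2-D array whose memory layout is a pluggable function."""

    def __init__(self, rows, cols, layout):
        self.rows = rows
        self.cols = cols
        self.layout = layout
        self.data = [0.0] * (rows * cols)

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self.layout(i, j, self.rows, self.cols)]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[self.layout(i, j, self.rows, self.cols)] = value


def fill(view):
    """A 'kernel' written once against logical indices only."""
    for i in range(view.rows):
        for j in range(view.cols):
            view[i, j] = 10 * i + j
```

Calling `fill` on a `View2D(2, 3, layout_right)` and on a `View2D(2, 3, layout_left)` produces the same logical contents with different flat storage orders, so the layout can be chosen per device without touching the kernel.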
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.
Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro
2016-01-01
Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
Utilizing GPUs to Accelerate Turbomachinery CFD Codes
NASA Technical Reports Server (NTRS)
MacCalla, Weylin; Kulkarni, Sameer
2016-01-01
GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C^1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of R^n, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
Ranganath, Lakshminarayan R; Milan, Anna M; Hughes, Andrew T; Dutton, John J; Fitzgerald, Richard; Briggs, Michael C; Bygott, Helen; Psarelli, Eftychia E; Cox, Trevor F; Gallagher, James A; Jarvis, Jonathan C; van Kan, Christa; Hall, Anthony K; Laan, Dinny; Olsson, Birgitta; Szamosi, Johan; Rudebeck, Mattias; Kullenberg, Torbjörn; Cronlund, Arvid; Svensson, Lennart; Junestrand, Carin; Ayoob, Hana; Timmis, Oliver G; Sireau, Nicolas; Le Quan Sang, Kim-Hanh; Genovese, Federica; Braconi, Daniela; Santucci, Annalisa; Nemethova, Martina; Zatkova, Andrea; McCaffrey, Judith; Christensen, Peter; Ross, Gordon; Imrich, Richard; Rovensky, Jozef
2016-02-01
Alkaptonuria (AKU) is a serious genetic disease characterised by premature spondyloarthropathy. Homogentisate-lowering therapy is being investigated for AKU. Nitisinone decreases homogentisic acid (HGA) in AKU but the dose-response relationship has not been previously studied. Suitability Of Nitisinone In Alkaptonuria 1 (SONIA 1) was an international, multicentre, randomised, open-label, no-treatment controlled, parallel-group, dose-response study. The primary objective was to investigate the effect of different doses of nitisinone once daily on 24-h urinary HGA excretion (u-HGA24) in patients with AKU after 4 weeks of treatment. Forty patients were randomised into five groups of eight patients each, with groups receiving no treatment or 1 mg, 2 mg, 4 mg and 8 mg of nitisinone. A clear dose-response relationship was observed between nitisinone and the urinary excretion of HGA. At 4 weeks, the adjusted geometric mean u-HGA24 was 31.53 mmol, 3.26 mmol, 1.44 mmol, 0.57 mmol and 0.15 mmol for the no treatment or 1 mg, 2 mg, 4 mg and 8 mg doses, respectively. For the most efficacious dose, 8 mg daily, this corresponds to a mean reduction of u-HGA24 of 98.8% compared with baseline. An increase in tyrosine levels was seen at all doses but the dose-response relationship was less clear than the effect on HGA. Despite tyrosinaemia, there were no safety concerns and no serious adverse events were reported over the 4 weeks of nitisinone therapy. In this study in patients with AKU, nitisinone therapy decreased urinary HGA excretion to low levels in a dose-dependent manner and was well tolerated within the studied dose range. EudraCT number: 2012-005340-24. Registered at ClinicalTrials.gov: NCT01828463. Published by the BMJ Publishing Group Limited.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.
Allinea Parallel Profiling and Debugging Tools on the Peregrine System
client for your platform. (Mac/Windows/Linux) Configuration to connect to Peregrine: Open the Allinea view it # directly through x11 forwarding just type 'map', # it will open a GUI. $ map # to profile an enable x-forwarding when connecting to # Peregrine. $ map # This will open the GUI Debugging using
NASA Technical Reports Server (NTRS)
Nagano, S.
1979-01-01
Base driver with common-load-current feedback protects paralleled inverter systems from open or short circuits. Circuit eliminates total system oscillation that can occur in conventional inverters because of open circuit in primary transformer winding. Common feedback signal produced by functioning modules forces operating frequency of failed module to coincide with clock drive so module resumes normal operating frequency in spite of open circuit.
Papakostas, George I; Fava, Maurizio; Baer, Lee; Swee, Michaela B; Jaeger, Adrienne; Bobo, William V; Shelton, Richard C
2015-12-01
The authors sought to test the efficacy of adjunctive ziprasidone in adults with nonpsychotic unipolar major depression experiencing persistent symptoms after 8 weeks of open-label treatment with escitalopram. This was an 8-week, randomized, double-blind, parallel-group, placebo-controlled trial conducted at three academic medical centers. Participants were 139 outpatients with persistent symptoms of major depression after an 8-week open-label trial of escitalopram (phase 1), randomly assigned in a 1:1 ratio to receive adjunctive ziprasidone (escitalopram plus ziprasidone, N=71) or adjunctive placebo (escitalopram plus placebo, N=68), with 8 weekly follow-up assessments. The primary outcome measure was clinical response, defined as a reduction of at least 50% in score on the 17-item Hamilton Depression Rating Scale (HAM-D). The Hamilton Anxiety Rating scale (HAM-A) and Visual Analog Scale for Pain were defined a priori as key secondary outcome measures. Rates of clinical response (35.2% compared with 20.5%) and mean improvement in HAM-D total scores (-6.4 [SD=6.4] compared with -3.3 [SD=6.2]) were significantly greater for the escitalopram plus ziprasidone group. Several secondary measures of antidepressant efficacy also favored adjunctive ziprasidone. The escitalopram plus ziprasidone group also showed significantly greater improvement on HAM-A score but not on Visual Analog Scale for Pain score. Ten (14%) patients in the escitalopram plus ziprasidone group discontinued treatment because of intolerance, compared with none in the escitalopram plus placebo group. Ziprasidone as an adjunct to escitalopram demonstrated antidepressant efficacy in adult patients with major depressive disorder experiencing persistent symptoms after 8 weeks of open-label treatment with escitalopram.
Begovac, Branka; Begovac, Ivan
2012-09-01
This article presents, in the form of a clinical illustration, a therapeutic group of bereaved mothers with special reference to their dreams about their deceased children. The article presents descriptions of the emotions of these mothers and countertransference feelings, a topic that, to our knowledge, has not been frequently studied. The group was small, analytically oriented, slow-open, comprised of women bereaved by the death of a child, and conducted by a female therapist. Over more than three years, the group included 20 members in total. This article describes a number of dreams recorded during a period when the group included seven members. Dreams helped the group members access their emotional pain, helplessness, yearning for a relationship with the deceased, guilt, and feelings of survival guilt. The transference-countertransference relationships were characterized by holding. Countertransference feelings of helplessness predominated. The therapist and the group as a whole contained various emotions, allowing the group members to return to the normal mourning processes from the parallel encouragement of group development and interpersonal relationships.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chrisochoides, N.; Sukup, F.
In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputers using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the "best" sequential implementation of the BW algorithm.
Open-Source Development of the Petascale Reactive Flow and Transport Code PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Andre, B.; Bisht, G.; Johnson, T.; Karra, S.; Lichtner, P. C.; Mills, R. T.
2013-12-01
Open-source software development has become increasingly popular in recent years. Open-source encourages collaborative and transparent software development and promotes unlimited free redistribution of source code to the public. Open-source development is good for science as it reveals implementation details that are critical to scientific reproducibility, but generally excluded from journal publications. In addition, research funds that would have been spent on licensing fees can be redirected to code development that benefits more scientists. In 2006, the developers of PFLOTRAN open-sourced their code under the U.S. Department of Energy SciDAC-II program. Since that time, the code has gained popularity among code developers and users from around the world seeking to employ PFLOTRAN to simulate thermal, hydraulic, mechanical and biogeochemical processes in the Earth's surface/subsurface environment. PFLOTRAN is a massively-parallel subsurface reactive multiphase flow and transport simulator designed from the ground up to run efficiently on computing platforms ranging from the laptop to leadership-class supercomputers, all from a single code base. The code employs domain decomposition for parallelism and is founded upon the well-established and open-source parallel PETSc and HDF5 frameworks. PFLOTRAN leverages modern Fortran (i.e. Fortran 2003-2008) in its extensible object-oriented design. The use of this progressive, yet domain-friendly programming language has greatly facilitated collaboration in the code's software development. Over the past year, PFLOTRAN's top-level data structures were refactored as Fortran classes (i.e. extendible derived types) to improve the flexibility of the code, ease the addition of new process models, and enable coupling to external simulators. 
For instance, PFLOTRAN has been coupled to the parallel electrical resistivity tomography code E4D to enable hydrogeophysical inversion while the same code base can be used as a third-party library to provide hydrologic flow, energy transport, and biogeochemical capability to the community land model, CLM, part of the open-source community earth system model (CESM) for climate. In this presentation, the advantages and disadvantages of open source software development in support of geoscience research at government laboratories, universities, and the private sector are discussed. Since the code is open-source (i.e. it's transparent and readily available to competitors), the PFLOTRAN team's development strategy within a competitive research environment is presented. Finally, the developers discuss their approach to object-oriented programming and the leveraging of modern Fortran in support of collaborative geoscience research as the Fortran standard evolves among compiler vendors.
Emission of sound from turbulence convected by a parallel flow in the presence of solid boundaries
NASA Technical Reports Server (NTRS)
Goldstein, M. E.; Rosenbaum, B. M.
1973-01-01
A theoretical description is given of the sound emitted from an arbitrary point in a parallel or nearly parallel turbulent shear flow confined to a region near solid boundaries. The analysis begins with Lighthill's formulation of aerodynamic noise and assumes that the turbulence is axisymmetric. Specific results are obtained for the sound emitted from an arbitrary point in a turbulent flow within a semi-infinite, open-ended duct.
Zhou, Lili; Clifford Chao, K S; Chang, Jenghwa
2012-11-01
Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications but their quality and computation speed are not optimal for real-time comparison with radiographs acquired with x-ray sources of different energies. In this paper, the authors performed polyenergetic forward projections using open computing language (OpenCL) in a parallel computing ecosystem consisting of CPU and general purpose graphics processing unit (GPGPU) for fast and realistic image formation. The proposed polyenergetic forward projection uses a lookup table containing the NIST published mass attenuation coefficients (μ/ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of the sites of interest are first segmented into different tissue types based on the CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum described as weight w(n) for energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for E(n) and the x-ray fluence is the weighted sum of the exponential of line integral for all energy bins with added Poisson noise. To validate this method, a digital head and neck phantom constructed from the CT scan of a Rando head phantom was segmented into three (air, gray/white matter, and bone) regions for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using the task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU (NVIDIA Tesla C2050) using OpenCL only. The authors explored the task overlapping strategy and the sequential method for generating the first and subsequent DRRs. 
A dispatcher was designed to drive the high-degree parallelism of the task overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL/GPGPU-based implementation with the CPU-based implementation. The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom size of 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on GPGPU for one energy bin in comparison to 8.83 s on CPU. The total computation time for generating one polyenergetic projection image of 512 × 512 was 0.3 s (141 s for CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL/GPGPU-based implementations was on the order of 10^-6, so the images were virtually indistinguishable. The task overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and the subsequent digitally reconstructed radiographs, respectively. The authors have successfully built digital phantoms using anatomic CT images and NIST μ/ρ tables for simulating realistic polyenergetic projection images and optimized the processing speed with parallel computing using a GPGPU/OpenCL-based implementation. The computation (0.3 s per projection image) was fast enough for real-time IGRT (image-guided radiotherapy) applications.
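The fluence model in this abstract (a weighted sum over energy bins of the exponential of the negative line integral) can be sketched for a single ray as follows. This is a minimal illustration and not the authors' OpenCL code: the function and argument names are hypothetical, the coefficients are assumed to be linear attenuation coefficients (μ/ρ multiplied by tissue density), the per-tissue path lengths are assumed to come from a ray tracer such as the Siddon method, and the Poisson noise step is left out.

```python
import math


def polyenergetic_transmission(weights, mu_by_bin, path_lengths):
    """Transmitted fraction of the incident fluence along one ray.

    weights[n]      -- spectrum weight w(n) of energy bin E(n)
    mu_by_bin[n][k] -- assumed linear attenuation coefficient of tissue k at E(n), in 1/cm
    path_lengths[k] -- length of the ray inside tissue k, in cm
    All names are illustrative; Poisson noise is deliberately not modelled.
    """
    if len(weights) != len(mu_by_bin):
        raise ValueError("one row of coefficients is required per energy bin")
    total_weight = sum(weights)
    fluence = 0.0
    for weight, mus in zip(weights, mu_by_bin):
        # Line integral of attenuation along the ray for this energy bin.
        line_integral = sum(mu * length for mu, length in zip(mus, path_lengths))
        fluence += (weight / total_weight) * math.exp(-line_integral)
    return fluence
```

Each ray and each energy bin is independent of the others, which is what makes this computation a natural fit for the data-parallel GPGPU scheduling described in the abstract.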
Methods of parallel computation applied on granular simulations
NASA Astrophysics Data System (ADS)
Martins, Gustavo H. B.; Atman, Allbens P. F.
2017-06-01
Every year, parallel computing becomes cheaper and more accessible. As a consequence, its applications have spread across all research areas. Granular materials are a promising area for parallel computing. To support this statement, we study the impact of parallel computing on simulations of the BNE (Brazil Nut Effect). This effect is the remarkable rise of an intruder confined in a granular medium when it is vertically shaken against gravity. By means of DEM (Discrete Element Method) simulations, we study the code performance, testing different methods to improve clock time. A comparison between serial and parallel algorithms using OpenMP® is also shown. The best improvement was obtained by optimizing the function that finds contacts using Verlet's cells.
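A cell-based contact search of the kind credited with the best improvement can be sketched as follows. This is an assumption about what "Verlet's cells" refers to, not the authors' code: the sketch bins equal-radius discs in 2D into square cells one diameter wide, so that only the same and neighbouring cells need to be checked. The function name and the equal-radius restriction are hypothetical simplifications.

```python
from collections import defaultdict
from itertools import product


def find_contacts(positions, radius):
    """Return sorted index pairs (i, j), i < j, of equal-radius discs that touch or overlap."""
    cutoff = 2.0 * radius  # centres closer than one diameter are in contact
    grid = defaultdict(list)
    for index, (x, y) in enumerate(positions):
        grid[(int(x // cutoff), int(y // cutoff))].append(index)

    contacts = set()
    for (cx, cy), members in grid.items():
        # Only the 3 x 3 block of cells around (cx, cy) can hold partners within the cutoff.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        xi, yi = positions[i]
                        xj, yj = positions[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 <= cutoff ** 2:
                            contacts.add((i, j))
    return sorted(contacts)
```

Compared with testing every pair, the work per particle depends only on the local packing density, and the loop over cells is the kind of loop that can be divided among OpenMP threads in a compiled implementation.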
Parallel and Serial Grouping of Image Elements in Visual Perception
ERIC Educational Resources Information Center
Houtkamp, Roos; Roelfsema, Pieter R.
2010-01-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some…
JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX
NASA Astrophysics Data System (ADS)
Omidian, Hossein; Lemieux, Guy G. F.
2018-04-01
Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient; it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size and uses kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g. from handheld to datacenter, that use minimal resources and power to reach the performance target.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation
NASA Astrophysics Data System (ADS)
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via a hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical VLASOV-POISSON solver for cold systems such as dark matter (CDM) dynamics.
Improvisation and Meditation in the Academy: Parallel Ordeals, Insights, and Openings
ERIC Educational Resources Information Center
Sarath, Edward
2015-01-01
This article examines parallel challenges and avenues for progress I have observed in my efforts to introduce improvisation in classical music studies, and meditation in music and overall academic settings. Though both processes were once central in their respective knowledge traditions--improvisation in earlier eras of European classical music,…
NASA Astrophysics Data System (ADS)
Hou, Zhenlong; Huang, Danian
2017-09-01
In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth weighting matrix, and other methods. To address the problems posed by the large data volumes in exploration, we present a parallel algorithm, together with a performance analysis, that combines Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) for Graphics Processing Unit (GPU) acceleration. Tests on a synthetic model and on real data from Vinton Dome yield improved results, showing that the improved inversion algorithm is effective and feasible. The parallel algorithm we designed performs better than other CUDA-based ones, with a maximum speedup of more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are used to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is shown to be able to process larger-scale data, and the new analysis method is practical.
Charpentier, Guillaume; Benhamou, Pierre-Yves; Dardari, Dured; Clergeot, Annie; Franc, Sylvia; Schaepelynck-Belicar, Pauline; Catargi, Bogdan; Melki, Vincent; Chaillous, Lucy; Farret, Anne; Bosson, Jean-Luc; Penfornis, Alfred
2011-03-01
To demonstrate that Diabeo software enabling individualized insulin dose adjustments combined with telemedicine support significantly improves HbA(1c) in poorly controlled type 1 diabetic patients. In a six-month open-label parallel-group, multicenter study, adult patients (n = 180) with type 1 diabetes (>1 year), on a basal-bolus insulin regimen (>6 months), with HbA(1c) ≥ 8%, were randomized to usual quarterly follow-up (G1), home use of a smartphone recommending insulin doses with quarterly visits (G2), or use of the smartphone with short teleconsultations every 2 weeks but no visit until point end (G3). Six-month mean HbA(1c) in G3 (8.41 ± 1.04%) was lower than in G1 (9.10 ± 1.16%; P = 0.0019). G2 displayed intermediate results (8.63 ± 1.07%). The Diabeo system gave a 0.91% (0.60; 1.21) improvement in HbA(1c) over controls and a 0.67% (0.35; 0.99) reduction when used without teleconsultation. There was no difference in the frequency of hypoglycemic episodes or in medical time spent for hospital or telephone consultations. However, patients in G1 and G2 spent nearly 5 h more than G3 patients attending hospital visits. The Diabeo system gives a substantial improvement to metabolic control in chronic, poorly controlled type 1 diabetic patients without requiring more medical time and at a lower overall cost for the patient than usual care.
Yotebieng, Marcel; Behets, Frieda; Kawende, Bienvenu; Ravelomanana, Noro Lantoniaina Rosa; Tabala, Martine; Okitolonda, Emile W
2017-04-26
Despite the rapid adoption of the World Health Organization's 2013 guidelines, children continue to be infected with HIV perinatally because of sub-optimal adherence to the continuum of HIV care in maternal and child health (MCH) clinics. To achieve the UNAIDS goal of eliminating mother-to-child HIV transmission, multiple, adaptive interventions need to be implemented to improve adherence to the HIV continuum. The aim of this open label, parallel, group randomized trial is to evaluate the effectiveness of Continuous Quality Improvement (CQI) interventions implemented at facility and health district levels to improve retention in care and virological suppression through 24 months postpartum among pregnant and breastfeeding women receiving ART in MCH clinics in Kinshasa, Democratic Republic of Congo. Prior to randomization, the current monitoring and evaluation system will be strengthened to enable collection of high quality individual patient-level data necessary for timely indicators production and program outcomes monitoring to inform CQI interventions. Following randomization, in health districts randomized to CQI, quality improvement (QI) teams will be established at the district level and at MCH clinics level. For 18 months, QI teams will be brought together quarterly to identify key bottlenecks in the care delivery system using data from the monitoring system, develop an action plan to address those bottlenecks, and implement the action plan at the level of their district or clinics. If proven to be effective, CQI as designed here, could be scaled up rapidly in resource-scarce settings to accelerate progress towards the goal of an AIDS free generation. The protocol was retrospectively registered on February 7, 2017. ClinicalTrials.gov Identifier: NCT03048669 .
Thunström, Erik; Manhem, Karin; Rosengren, Annika; Peker, Yüksel
2016-02-01
Obstructive sleep apnea (OSA) is common in people with hypertension, particularly resistant hypertension. Treatment with an antihypertensive agent alone is often insufficient to control hypertension in patients with OSA. To determine whether continuous positive airway pressure (CPAP) added to treatment with an antihypertensive agent has an impact on blood pressure (BP) levels. During the initial 6-week, two-center, open, prospective, case-control, parallel-design study (2:1; OSA/no-OSA), all patients began treatment with an angiotensin II receptor antagonist, losartan, 50 mg daily. In the second 6-week, sex-stratified, open, randomized, parallel-design study of the OSA group, all subjects continued to receive losartan and were randomly assigned to either nightly CPAP as add-on therapy or no CPAP. Twenty-four-hour BP monitoring included assessment every 15 minutes during daytime hours and every 20 minutes during the night. Ninety-one patients with untreated hypertension underwent a home sleep study (55 were found to have OSA; 36 were not). Losartan significantly reduced systolic, diastolic, and mean arterial BP in both groups (without OSA: 12.6, 7.2, and 9.0 mm Hg; with OSA: 9.8, 5.7, and 6.1 mm Hg). Add-on CPAP treatment had no significant changes in 24-hour BP values but did reduce nighttime systolic BP by 4.7 mm Hg. All 24-hour BP values were reduced significantly in the 13 patients with OSA who used CPAP at least 4 hours per night. Losartan reduced BP in OSA, but the reductions were less than in no-OSA. Add-on CPAP therapy resulted in no significant changes in 24-hour BP measures except in patients using CPAP efficiently. Clinical trial registered with www.clinicaltrials.gov (NCT00701428).
Rossignol, Patrick; Dorval, Marc; Fay, Renaud; Ros, Joan Fort; Loughraieb, Nathalie; Moureau, Frédérique; Laville, Maurice
2013-06-01
Anticoagulation for chronic dialysis patients with contraindications to heparin administration is challenging. Current guidelines state that in patients with increased bleeding risks, strategies that can induce systemic anticoagulation should be avoided. Heparin-free dialysis using intermittent saline flushes is widely adopted as the method of choice for patients at risk of bleeding, although on-line blood predilution may also be used. A new dialyzer, Evodial (Gambro, Lund, Sweden), is grafted with unfractionated heparin during the manufacturing process and may allow safe and efficient heparin-free hemodialysis sessions. In the present trial, Evodial was compared to standard care with either saline flushes or blood predilution. The HepZero study is the first international (seven countries), multicenter (10 centers), randomized, controlled, open-label, non-inferiority (and if applicable subsequently, superiority) trial with two parallel groups, comprising 252 end-stage renal disease patients treated by maintenance hemodialysis for at least 3 months and requiring heparin-free dialysis treatments. Patients will be treated during a maximum of three heparin-free dialysis treatments with either saline flushes or blood predilution (control group), or Evodial. The first heparin-free dialysis treatment will be considered successful when there is: no complete occlusion of air traps or dialyzer rendering dialysis impossible; no additional saline flushes to prevent clotting; no change of dialyzer or blood lines because of clotting; and no premature termination (early rinse-back) because of clotting. The primary objectives of the study are to determine the effectiveness of the Evodial dialyzer, compared with standard care in terms of successful treatments during the first heparin-free dialysis. If the non-inferiority of Evodial is demonstrated then the superiority of Evodial over standard care will be tested. 
The HepZero study results may have major clinical implications for patient care. ClinicalTrials.gov NCT01318486.
Whole body vibration for older persons: an open randomized, multicentre, parallel, clinical trial
2011-01-01
Background: Institutionalized older persons have a poor functional capacity. Including physical exercise in their routine activities decreases their frailty and improves their quality of life. Whole-body vibration (WBV) training is a type of exercise that seems beneficial in frail older persons to improve their functional mobility, but the evidence is inconclusive. This trial will compare the results of exercise with WBV and exercise without WBV in improving body balance, muscle performance and fall prevention in institutionalized older persons. Methods/Design: An open, multicentre and parallel randomized clinical trial with blinded assessment. 160 nursing home residents aged over 65 years and of both sexes will be identified to participate in the study. Participants will be centrally randomised and allocated to interventions (vibration or exercise group) by telephone. The vibration group will perform static/dynamic exercises (balance and resistance training) on a vibratory platform (Frequency: 30-35 Hz; Amplitude: 2-4 mm) over a six-week training period (3 sessions/week). The exercise group will perform the same exercise protocol but without the vibration stimulus. The primary outcome measure is the static/dynamic body balance. Secondary outcomes are muscle strength and the number of new falls. Follow-up measurements will be collected at 6 weeks and at 6 months after randomization. Efficacy will be analysed on an intention-to-treat (ITT) basis and 'per protocol'. The effects of the intervention will be evaluated using the "t" test, Mann-Whitney test, or Chi-square test, depending on the type of outcome. The final analysis will be performed 6 weeks and 6 months after randomization. Discussion: This study will help to clarify whether WBV training improves body balance, gait mobility and muscle strength in frail older persons living in nursing homes. As far as we know, this will be the first study to evaluate the efficacy of WBV for the prevention of falls. 
Trial Registration ClinicalTrials.gov: NCT01375790 PMID:22192313
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(N_u log N_mode), where N_u is the total number of unknowns and N_mode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
NASA Astrophysics Data System (ADS)
Lermusiaux, Laurent; Bidault, Sebastien
2016-03-01
The nanometer-scale sensitivity of plasmon coupling allows the translation of minute morphological changes in nanostructures into macroscopic optical signals. In particular, single nanostructure scattering spectroscopy provides a direct estimation of interparticle distances in gold nanoparticle (AuNP) dimers linked by a short DNA double-strand [M. P. Busson et al, Nano Lett. 11, 5060 (2011)]. We demonstrate here that this spectroscopic information can be inferred from simple widefield measurements on a calibrated color camera [L. Lermusiaux et al, ACS Nano 9, 978 (2015)]. This allows us to analyze the influence of electrostatic and steric interparticle interactions on the morphology of DNA-templated AuNP groupings. Furthermore, polarization-resolved measurements on a color CCD provide a parallel imaging of AuNP dimer orientations. We apply this spectroscopic characterization to identify dimers featuring two different conformations of the same DNA template. In practice, the biomolecular scaffold contains a hairpin-loop that opens after hybridization to a specific DNA sequence and increases the interparticle distance [L. Lermusiaux et al, ACS Nano 6, 10992 (2012)]. These results open exciting perspectives for the parallel sensing of single specific DNA strands using plasmon rulers. We discuss the limits of this approach in terms of the physicochemical stability and reactivity of these nanostructures and demonstrate the importance of engineering the AuNP surface chemistry, in particular using amphiphilic ligands [L. Lermusiaux and S. Bidault, Small (2015), in press].
Effects of Wii balance board exercises on balance after posterior cruciate ligament reconstruction.
Puh, Urška; Majcen, Nia; Hlebš, Sonja; Rugelj, Darja
2014-05-01
To establish the effects of training on the Wii balance board (WBB) on balance after posterior cruciate ligament (PCL) reconstruction. The included patient had injured her posterior cruciate ligament 22 months prior to the study. Training on the WBB was performed for 4 weeks, 6 times per week, 30-45 min per day. Center of pressure (CoP) sway during parallel and one-leg stance, and body weight distribution in parallel stance were measured. Additionally, measurements of joint range of motion and limb circumferences were taken before and after training. After training, the body weight was almost equally distributed on both legs. The decrease in CoP sway was most significant for one-leg stance with each leg on a compliant surface with eyes open and closed. The knee joint range of motion increased and limb circumferences decreased. According to the results of this single case report, we might recommend the use of WBB for balance training after PCL reconstruction. Case series with no comparison group, Level IV.
SWMM5 Application Programming Interface and PySWMM: A Python Interfacing Wrapper
In support of the OpenWaterAnalytics open source initiative, the PySWMM project encompasses the development of a Python interfacing wrapper to SWMM5 with parallel ongoing development of the USEPA Stormwater Management Model (SWMM5) application programming interface (API). ...
Datacube Services in Action, Using Open Source and Open Standards
NASA Astrophysics Data System (ADS)
Baumann, P.; Misev, D.
2016-12-01
Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On the server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source on www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on the server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use on databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; in both OGC and INSPIRE, rasdaman is the WCS Core Reference Implementation. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.
QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials.
Giannozzi, Paolo; Baroni, Stefano; Bonini, Nicola; Calandra, Matteo; Car, Roberto; Cavazzoni, Carlo; Ceresoli, Davide; Chiarotti, Guido L; Cococcioni, Matteo; Dabo, Ismaila; Dal Corso, Andrea; de Gironcoli, Stefano; Fabris, Stefano; Fratesi, Guido; Gebauer, Ralph; Gerstmann, Uwe; Gougoussis, Christos; Kokalj, Anton; Lazzeri, Michele; Martin-Samos, Layla; Marzari, Nicola; Mauri, Francesco; Mazzarello, Riccardo; Paolini, Stefano; Pasquarello, Alfredo; Paulatto, Lorenzo; Sbraccia, Carlo; Scandolo, Sandro; Sclauzero, Gabriele; Seitsonen, Ari P; Smogunov, Alexander; Umari, Paolo; Wentzcovitch, Renata M
2009-09-30
QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). The acronym ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly-restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and interoperable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
NASA Astrophysics Data System (ADS)
Sandalski, Stou
Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named
OpenMP performance for benchmark 2D shallow water equations using LBM
NASA Astrophysics Data System (ADS)
Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry
2018-03-01
Shallow water equations, commonly referred to as Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the Lattice Boltzmann method (LBM), SIMPLE-like methods, the Finite Difference Method, Godunov-type methods, and the Finite Volume Method. In this paper, the shallow water equations are approximated using LBM, known as LABSWE, and the simulation is evaluated as a parallel program using OpenMP. To compare the performance of the parallel algorithm with 2 and 4 threads, ten different grid sizes Lx × Ly are examined. The results show that, using OpenMP, the computational time for solving LABSWE can be decreased. For instance, for a grid size of 1000 × 500, the speedup with 2 and 4 threads is reported as 93.54 s and 333.243 s, respectively.
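The abstract above quotes its speedup figures in seconds. For comparison, the conventional dimensionless definitions of parallel speedup and efficiency can be sketched as follows. This is a minimal illustration only: the function names and the timings in the example are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(t_serial, t_parallel) / threads


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds (NOT measurements from the paper).
    timings = {1: 400.0, 2: 210.0, 4: 120.0}
    t1 = timings[1]
    for threads in sorted(timings):
        t = timings[threads]
        print(threads, round(speedup(t1, t), 2), round(efficiency(t1, t, threads), 2))
```

Under these definitions a speedup is a ratio, so a value close to the thread count indicates near-ideal scaling.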
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
A method for data handling numerical results in parallel OpenFOAM simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anton, Alin; Muntean, Sebastian
Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.
Leveraging human oversight and intervention in large-scale parallel processing of open-source data
NASA Astrophysics Data System (ADS)
Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.
2015-05-01
The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
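The two requirements named in the abstract above (re-run everything that depends on a corrected result, but nothing else) can be illustrated with a small dependency-graph sketch. This is not the authors' system: the graph representation, the function name `steps_to_reprocess`, and the example pipeline are assumptions made purely for illustration, and the corrected step itself is assumed not to need re-running because a human has fixed its output.

```python
from collections import deque


def steps_to_reprocess(graph, corrected):
    """Return the steps downstream of `corrected`, in an order safe to re-run.

    `graph` maps each step name to the list of steps that consume its output.
    """
    # 1. Collect every step reachable from the corrected one.
    affected = set()
    queue = deque([corrected])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # 2. Topologically sort the affected steps (Kahn's algorithm), counting
    #    only dependencies between affected steps.
    indegree = {n: 0 for n in affected}
    for n in affected:
        for child in graph.get(n, ()):
            if child in affected:
                indegree[child] += 1
    ready = deque(sorted(n for n in affected if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(graph.get(node, ())):
            if child in affected:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order


# Hypothetical pipeline: names are illustrative only.
pipeline = {
    "ingest": ["tokenize"],
    "tokenize": ["tag", "index"],
    "tag": ["report"],
    "index": ["report"],
    "report": [],
}

if __name__ == "__main__":
    print(steps_to_reprocess(pipeline, "tokenize"))
```

Steps that do not depend on the corrected result, such as `ingest` in this example, are left untouched, which is what keeps the reprocessing cost bounded.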
Mulenga, Veronica; Musiime, Victor; Kekitiinwa, Adeodata; Cook, Adrian D; Abongomera, George; Kenny, Julia; Chabala, Chisala; Mirembe, Grace; Asiimwe, Alice; Owen-Powell, Ellen; Burger, David; McIlleron, Helen; Klein, Nigel; Chintu, Chifumbe; Thomason, Margaret J; Kityo, Cissy; Walker, A Sarah; Gibb, Diana M
2016-02-01
WHO 2013 guidelines recommend universal treatment for HIV-infected children younger than 5 years. No paediatric trials have compared nucleoside reverse-transcriptase inhibitors (NRTIs) in first-line antiretroviral therapy (ART) in Africa, where most HIV-infected children live. We aimed to compare stavudine, zidovudine, or abacavir as dual or triple fixed-dose-combination paediatric tablets with lamivudine and nevirapine or efavirenz. In this open-label, parallel-group, randomised trial (CHAPAS-3), we enrolled children from one centre in Zambia and three in Uganda who were previously untreated (ART naive) or on stavudine for more than 2 years with viral load less than 50 copies per mL (ART experienced). Computer-generated randomisation tables were incorporated securely within the database. The primary endpoint was grade 2-4 clinical or grade 3/4 laboratory adverse events. Analysis was intention to treat. This trial is registered with the ISRCTN Registry number, 69078957. Between Nov 8, 2010, and Dec 28, 2011, 480 children were randomised: 156 to stavudine, 159 to zidovudine, and 165 to abacavir. After two were excluded due to randomisation error, 156 children were analysed in the stavudine group, 158 in the zidovudine group, and 164 in the abacavir group, and followed for median 2·3 years (5% lost to follow-up). 365 (76%) were ART naive (median age 2·6 years vs 6·2 years in ART experienced). 917 grade 2-4 clinical or grade 3/4 laboratory adverse events (835 clinical [634 grade 2]; 40 laboratory) occurred in 104 (67%) children on stavudine, 103 (65%) on zidovudine, and 105 (64%), on abacavir (p=0·63; zidovudine vs stavudine: hazard ratio [HR] 0·99 [95% CI 0·75-1·29]; abacavir vs stavudine: HR 0·88 [0·67-1·15]). At 48 weeks, 98 (85%), 81 (80%) and 95 (81%) ART-naive children in the stavudine, zidovudine, and abacavir groups, respectively, had viral load less than 400 copies per mL (p=0·58); most ART-experienced children maintained suppression (p=1·00). 
All NRTIs had low toxicity and good clinical, immunological, and virological responses. Clinical and subclinical lipodystrophy was not noted in those younger than 5 years and anaemia was no more frequent with zidovudine than with the other drugs. Absence of hypersensitivity reactions, superior resistance profile and once-daily dosing favour abacavir for African children, supporting WHO 2013 guidelines. European Developing Countries Clinical Trials Partnership. Copyright © 2016 Walker et al. Open Access article distributed under the terms of CC BY. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. 
Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
Placebo Effects and the Common Cold: A Randomized Controlled Trial
Barrett, Bruce; Brown, Roger; Rakel, Dave; Rabago, David; Marchand, Lucille; Scheder, Jo; Mundt, Marlon; Thomas, Gay; Barlow, Shari
2011-01-01
PURPOSE We wanted to determine whether the severity and duration of illness caused by the common cold are influenced by randomized assignment to open-label pills, compared with conventional double-blind allocation to active and placebo pills, compared with no pills at all. METHODS We undertook a randomized controlled trial among a population with new-onset common cold. Study participants were allocated to 4 parallel groups: (1) those receiving no pills, (2) those blinded to placebo, (3) those blinded to echinacea, and (4) those given open-label echinacea. Primary outcomes were illness duration and area-under-the-curve global severity. Secondary outcomes included neutrophil count and interleukin 8 levels from nasal wash at intake and 2 days later. RESULTS Of 719 randomized study participants, 2 were lost and 4 exited early. Participants were 64% female, 88% white, and aged 12 to 80 years. Mean illness duration for each group was 7.03 days for those in the no-pill group, 6.87 days for those blinded to placebo, 6.34 days for those blinded to echinacea, and 6.76 days for those in the open-label echinacea group. Mean global severity scores for the 4 groups were no pills, 286; blinded to placebo, 264; blinded to echinacea, 236; and open-label echinacea, 258. Between-group differences were not statistically significant. Comparing the no-pill with blinded to placebo groups, differences (95% confidence interval [CI]) were −0.16 days (95% CI, −0.90 to 0.58 days) for illness duration and −22 severity points (95% CI, −70 to 26 points) for global severity. Comparing the group blinded to echinacea with the open-label echinacea group, differences were 0.42 days (95% CI, −0.28 to 1.12 days) and 22 severity points (95% CI, −19 to 63 points). 
Median change in interleukin 8 concentration and neutrophil cell count, respectively by group, were 30 pg/mL and 1 cell for the no-pill group, 39 pg/mL and 1 cell for the group blinded to placebo, 58 pg/mL and 2 cells for the group blinded to echinacea, and 70 pg/mL and 1 cell for the group with open-label echinacea, also not statistically significant. Among the 120 participants who at intake rated echinacea’s effectiveness as greater than 50 on a 100-point scale for which 100 is extremely effective, illness duration was 2.58 days shorter (95% CI, −4.47 to −0.68 days) in those blinded to placebo rather than no pill, and mean global severity score was 26% lower but not significantly different (−97.0, 95% CI, −249.8 to 55.8 points). In this subgroup, neither duration nor severity differed significantly between the group blinded to echinacea and the open-label echinacea group. CONCLUSIONS Participants randomized to the no-pill group tended to have longer and more severe illnesses than those who received pills. For the subgroup who believed in echinacea and received pills, illnesses were substantively shorter and less severe, regardless of whether the pills contained echinacea. These findings support the general idea that beliefs and feelings about treatments may be important and perhaps should be taken into consideration when making medical decisions. PMID:21747102
Lodén, Marie; Wirén, Karin; Smerud, Knut; Meland, Nils; Hønnås, Helge; Mørk, Gro; Lützow-Holm, Claus; Funk, Jörgen; Meding, Birgitta
2010-11-01
Hand eczema influences the quality of life. Management strategies include the use of moisturizers. In the present study the time to relapse of eczema during treatment with a barrier-strengthening moisturizer (5% urea) was compared with no treatment (no medical or non-medicated preparations) in 53 randomized patients with successfully treated hand eczema. The median time to relapse was 20 days in the moisturizer group compared with 2 days in the no treatment group (p = 0.04). Eczema relapsed in 90% of the patients within 26 weeks. No difference in severity was noted between the groups at relapse. Dermatology Life Quality Index (DLQI) increased significantly in both groups; from 4.7 to 7.1 in the moisturizer group and from 4.1 to 7.8 in the no treatment group (p < 0.01) at the time of relapse. Hence, the application of moisturizers seems to prolong the disease-free interval in patients with controlled hand eczema. Whether the data is applicable to moisturizers without barrier-strengthening properties remains to be elucidated.
Honey, James G.
1990-01-01
Two new species of washakiin omomyids occur in deposits of early Bridgerian age. Shoshonius bowni, sp. nov., from the Aycross Formation, Absaroka Range, Wyoming, differs from S. cooperi in having enlarged conules on the upper molars and a second metaconule, features convergent with Washakius insignis. Washakius izetti, sp. nov., from the Green River Formation, Piceance Creek Basin, Colorado, is the most primitive known species of Washakius, showing incipient development of features present in the later W. insignis and W. woodringi. Washakius, cf. W. izetti occurs in the early Bridgerian of the Huerfano Basin. W. izetti is closely related to Utahia kayi, a washakiin possibly related to Stockia. Hemiacodon, sometimes included in the Washakiini, is probably more closely related to the Omomyini. Stockia is distinct from Omomys and is questionably included in the Washakiini, of which Loveina is the stem taxon. More advanced washakiins form two groups between which there was significant parallel evolution in dental morphology. One group includes Washakius, Dyseolemur, Utahia, and possibly Stockia, and is characterized by development of an open talonid notch before the consistent appearance of metastylids. The other group consists of Shoshonius, where the establishment of metastylids preceded the full opening of the
"They who dream by day": parallels between Openness to Experience and dreaming.
DeYoung, Colin G; Grazioplene, Rachael G
2013-12-01
Individuals high in the personality trait Openness to Experience appear to engage spontaneously (during wake) in processes of elaborative encoding similar to those Llewellyn identifies in both dreaming and the ancient art of memory (AAOM). Links between Openness and dreaming support the hypothesis that dreaming is part of a larger process of cognitive exploration that facilitates adaptation to new experiences.
NASA Astrophysics Data System (ADS)
Dong, Dai; Li, Xiaoning
2015-03-01
A high-pressure solenoid valve with high flow rate and high speed is a key component in an underwater driving system. However, the traditional single-spool pilot-operated valve cannot meet the demands of both high flow rate and high speed simultaneously. A new structure for a high-pressure solenoid valve is needed to meet the demand of the underwater driving system. A novel parallel-spool pilot-operated high-pressure solenoid valve is proposed to overcome the drawback of the current single-spool design. Mathematical models of the opening process and flow rate of the valve are established. The opening response time of the valve is subdivided into 4 parts to analyze the properties of the opening response. Corresponding formulas to solve the 4 parts of the response time are derived. Key factors that influence the opening response time are analyzed. According to the mathematical model of the valve, a simulation of the opening process is carried out in MATLAB. Parameters are chosen based on theoretical analysis to design the test prototype of the new type of valve. The opening response time of the designed valve is tested by verifying the response of the current in the coil and the displacement of the main valve spool. The experimental results are in agreement with the simulated results; therefore, the validity of the theoretical analysis is verified. The experimental opening response time of the valve is 48.3 ms at a working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm² and the largest air flow rate is 2320 L/s. According to the result of the load driving test, the valve can meet the demands of the driving system. The proposed valve with parallel spools provides a new method for the design of a high-pressure valve with fast response and large flow rate.
Amstutz, Alain; Nsakala, Bienvenu Lengo; Vanobberghen, Fiona; Muhairwe, Josephine; Glass, Tracy Renée; Achieng, Beatrice; Sepeka, Mamorena; Tlali, Katleho; Sao, Lebohang; Thin, Kyaw; Klimkait, Thomas; Battegay, Manuel; Labhardt, Niklaus Daniel
2018-02-12
The World Health Organization (WHO) recommends viral load (VL) measurement as the preferred monitoring strategy for HIV-infected individuals on antiretroviral therapy (ART) in resource-limited settings. The new WHO guidelines 2016 continue to define virologic failure as two consecutive VL ≥1000 copies/mL (at least 3 months apart) despite good adherence, triggering switch to second-line therapy. However, the threshold of 1000 copies/mL for defining virologic failure is based on low-quality evidence. Observational studies have shown that individuals with low-level viremia (measurable but below 1000 copies/mL) are at increased risk for accumulation of resistance mutations and subsequent virologic failure. The SESOTHO trial assesses a lower threshold for switch to second-line ART in patients with sustained unsuppressed VL. In this multicenter, parallel-group, open-label, randomized controlled trial conducted in Lesotho, patients on first-line ART with two consecutive unsuppressed VL measurements ≥100 copies/mL, where the second VL is between 100 and 999 copies/mL, will either be switched to second-line ART immediately (intervention group) or not be switched (standard of care, according to WHO guidelines). The primary endpoint is viral resuppression (VL < 50 copies/mL) 9 months after randomization. We will enrol 80 patients, giving us 90% power to detect a difference of 35% in viral resuppression between the groups (assuming two-sided 5% alpha error). For our primary analysis, we will use a modified intention-to-treat set, with those lost to care, death, or crossed over considered failure to resuppress, and using logistic regression models adjusted for the prespecified stratification variables. The SESOTHO trial challenges the current WHO guidelines, assessing an alternative, lower VL threshold for patients with unsuppressed VL on first-line ART. This trial will provide data to inform future WHO guidelines on VL thresholds to recommend switch to second-line ART. 
ClinicalTrials.gov (NCT03088241), registered May 05, 2017.
Exploiting Symmetry on Parallel Architectures.
NASA Astrophysics Data System (ADS)
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
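The idea of multiplying by a group-equivariant matrix through a Fourier transform can be illustrated in its simplest setting, the cyclic group. The sketch below is not taken from the thesis (which treats dihedral and symmetric groups); it assumes a circulant matrix M[i][j] = c[(i - j) mod n], which is equivariant under cyclic shifts, and shows that the matrix-vector product equals a pointwise product in the Fourier domain. The function names are illustrative, and the transform used here is a naive O(n²) DFT for clarity; substituting an FFT would give the usual O(n log n) cost.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform over the cyclic group Z_n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def circulant_matvec_direct(c, v):
    """Multiply by the circulant matrix M[i][j] = c[(i - j) % n] directly."""
    n = len(c)
    return [sum(c[(i - j) % n] * v[j] for j in range(n)) for i in range(n)]


def circulant_matvec_fourier(c, v):
    """Same product via the convolution theorem: transform, multiply, invert."""
    c_hat, v_hat = dft(c), dft(v)
    product = [a * b for a, b in zip(c_hat, v_hat)]
    # Inputs are real, so the imaginary parts are rounding noise only.
    return [y.real for y in dft(product, inverse=True)]


if __name__ == "__main__":
    c = [1.0, 2.0, 0.0, -1.0]
    v = [3.0, 0.5, -2.0, 4.0]
    print(circulant_matvec_direct(c, v))
    print(circulant_matvec_fourier(c, v))
```

The two routines agree up to floating-point rounding; the Fourier route replaces a dense matrix product with independent per-frequency multiplications, which is the property that makes it attractive on parallel hardware.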
Parallel and serial grouping of image elements in visual perception.
Houtkamp, Roos; Roelfsema, Pieter R
2010-12-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some situations, but we demonstrate that there are also situations where Gestalt grouping becomes serial. We observe substantial time delays when image elements have to be grouped indirectly through a chain of local groupings. We call this chaining process incremental grouping and demonstrate that it can occur for only a single object at a time. We suggest that incremental grouping requires the gradual spread of object-based attention so that eventually all the object's parts become grouped explicitly by an attentional labeling process. Our findings inspire a new incremental grouping theory that relates the parallel, local grouping process to feedforward processing and the serial, incremental grouping process to recurrent processing in the visual cortex.
APPARATUS FOR PRODUCING IONS OF VAPORIZABLE MATERIALS
Starr, C.
1957-11-19
This patent relates to electronic discharge devices used as ion sources, and in particular describes an ion source for application in a calutron. The source utilizes two cathodes disposed at opposite ends of a longitudinal opening in an arc block fed with vaporized material. A magnetic field is provided parallel to the length of the arc block opening. The electrons from the cathodes are directed through slits in collimating electrodes into the arc block parallel to the magnetic field and cause an arc discharge to occur between the cathodes, as the arc block and collimating electrodes are at a positive potential with respect to the cathode. The ions are withdrawn by suitable electrodes disposed opposite the arc block opening. When such an ion source is used in a calutron, an arc discharge of increased length may be utilized, thereby increasing the efficiency and economy of operation.
De Ridder, D J M K; Everaert, K; Fernández, L García; Valero, J V Forner; Durán, A Borau; Abrisqueta, M L Jauregui; Ventura, M G; Sotillo, A Rodriguez
2005-12-01
To compare the performance of SpeediCath hydrophilic-coated catheters versus uncoated polyvinyl chloride (PVC) catheters in traumatic spinal cord injured patients presenting with functional neurogenic bladder-sphincter disorders. A 1-year, prospective, open, parallel, comparative, randomised, multicentre study included 123 male patients, ≥16 years of age and injured within the last 6 months. Primary endpoints were occurrence of symptomatic urinary tract infection (UTI) and hematuria. Secondary endpoints were development of urethral strictures and convenience of use. The main hypothesis was that coated catheters cause fewer complications in terms of symptomatic UTIs and hematuria. 57 out of 123 patients completed the 12-month study. Fewer patients using the SpeediCath hydrophilic-coated catheter (64%) experienced 1 or more UTIs compared to the uncoated PVC catheter group (82%) (p = 0.02). Thus, twice as many patients in the SpeediCath group were free of UTI. There was no significant difference in the number of patients experiencing bleeding episodes (38/55 SpeediCath; 32/59 PVC) and no overall difference in the occurrence of hematuria, leukocyturia and bacteriuria. The results indicate that there is a beneficial effect regarding UTI when using hydrophilic-coated catheters.
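The "twice as many patients ... free of UTI" statement follows from the two reported percentages by simple complement arithmetic, which can be checked as follows. Only the 64% and 82% figures come from the abstract; the helper name is illustrative.

```python
def share_free_of_event(percent_with_event):
    """Percentage of patients who did NOT experience the event."""
    return 100.0 - percent_with_event


# Reported shares of patients with one or more UTIs.
coated_free = share_free_of_event(64.0)    # SpeediCath group
uncoated_free = share_free_of_event(82.0)  # uncoated PVC group
ratio = coated_free / uncoated_free

if __name__ == "__main__":
    print(coated_free, uncoated_free, ratio)
```

The ratio compares shares of each group, not absolute patient counts, so it does not by itself say how many individuals were UTI-free in either arm.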
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in the ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
Di Tommaso, Paolo; Orobitg, Miquel; Guirado, Fernando; Cores, Fernado; Espinosa, Toni; Notredame, Cedric
2010-08-01
We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html
de Vreede, Gert-Jan; Briggs, Robert O; Reiter-Palmon, Roni
2010-04-01
The aim of this study was to compare the results of two different modes of using multiple groups (instead of one large group) to identify problems and develop solutions. Many of the complex problems facing organizations today require the use of very large groups or collaborations of groups from multiple organizations. There are many logistical problems associated with the use of such large groups, including the ability to bring everyone together at the same time and location. A field study involved two different organizations and compared productivity and satisfaction of group. The approaches included (a) multiple small groups, each completing the entire process from start to end and combining the results at the end (parallel mode); and (b) multiple subgroups, each building on the work provided by previous subgroups (serial mode). Groups using the serial mode produced more elaborations compared with parallel groups, whereas parallel groups produced more unique ideas compared with serial groups. No significant differences were found related to satisfaction with process and outcomes between the two modes. Preferred mode depends on the type of task facing the group. Parallel groups are more suited for tasks for which a variety of new ideas are needed, whereas serial groups are best suited when elaboration and in-depth thinking on the solution are required. Results of this research can guide the development of facilitated sessions of large groups or "teams of teams."
Use Computer-Aided Tools to Parallelize Large CFD Applications
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Yan, J.
2000-01-01
Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and the increasing complexity of real applications in recent years, the problem becomes even more severe. Today, scalability and high performance mostly involve handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization.
CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each comprises from 50K to 100K lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to the sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved an 18-fold speedup on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level.
Future work will emphasize the development of a methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
Cost and effectiveness of lung lobectomy by video-assisted thoracic surgery for lung cancer
Mafé, Juan J.; Planelles, Beatriz; Asensio, Santos; Cerezal, Jorge; Inda, María-del-Mar; Lacueva, Javier; Esteban, Maria-Dolores; Hernández, Luis; Martín, Concepción; Baschwitz, Benno
2017-01-01
Background Video-assisted thoracic surgery (VATS) emerged as a minimally invasive surgery for diseases in the field of thoracic surgery. We herein reviewed our experience on thoracoscopic lobectomy for early lung cancer and evaluated Health System use. Methods A cost-effectiveness study was performed comparing VATS vs. open thoracic surgery (OPEN) for lung cancer patients. Demographic data, tumor localization, dynamic pulmonary function tests [forced vital capacity (FVC), forced expiratory volume in one second (FEV1), diffusion capacity (DLCO) and maximal oxygen uptake (VO2max)], surgical approach, postoperative details, and complications were recorded and analyzed. Results One hundred seventeen patients underwent lung resection by VATS (n=42, 36%; age: 63±9 years old, 57% males) or OPEN (n=75, 64%; age: 61±11 years old, 73% males). Pulmonary function tests decreased just after surgery with a parallel increasing tendency during first 12 months. VATS group tended to recover FEV1 and FVC quicker with significantly less clinical and post-surgical complications (31% vs. 53%, P=0.015). Costs including surgery and associated hospital stay, complications and costs in the 12 months after surgery were significantly lower for VATS (P<0.05). Conclusions The VATS approach surgery allowed earlier recovery at a lower cost than OPEN with a better cost-effectiveness profile. PMID:28932560
Parallel, Distributed Scripting with Python
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, P J
2002-05-24
Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
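The dictionary check described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the unspecified password encryption, a local thread pool stands in for the distributed workers, and the names `sha256_hex`, `search_chunk` and `check_password` are hypothetical rather than taken from the report.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(word):
    # Assumption: SHA-256 stands in for whatever scheme encrypted the password.
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def search_chunk(chunk, target_hash):
    # Each worker scans only its own slice of the dictionary.
    for word in chunk:
        if sha256_hex(word) == target_hash:
            return word
    return None


def check_password(words, target_hash, workers=4):
    # Distribute the information: deal the words out round-robin.
    chunks = [words[i::workers] for i in range(workers)]
    # Co-ordinate the work: run the chunks concurrently, return the first hit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hit in pool.map(search_chunk, chunks, [target_hash] * workers):
            if hit is not None:
                return hit
    return None
```

The chunks share no state, so the same split would carry over to processes or to separate nodes, with the word list and the target hash as the only data that has to be shipped to each worker.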
Maeda-Yamamoto, Mari; Ema, Kaori; Monobe, Manami; Shibuichi, Ikuo; Shinoda, Yuki; Yamamoto, Tomohiro; Fujisawa, Takao
2009-09-01
We previously reported that 'benifuuki' green tea containing O-methylated catechin significantly relieved the symptoms of perennial or seasonal rhinitis compared with a placebo green tea that did not contain O-methylated catechin in randomized double-blind clinical trials. In this study we assessed the effects of 'benifuuki' green tea on clinical symptoms of seasonal allergic rhinitis. An open-label, single-dose, randomized, parallel-group study was performed on 38 subjects with Japanese cedar pollinosis. The subjects were randomly assigned to long-term (December 27, 2006-April 8, 2007, 1.5 months before pollen exposure) or short-term (February 15, 2007: after cedar pollen dispersal--April 8, 2007) drinking of a 'benifuuki' tea drink containing 34 mg O-methylated catechin per day. Each subject recorded their daily symptom scores in a diary. The primary efficacy variable was the mean weekly nasal symptom medication score during the study period. The nasal symptom medication score in the long-term intake group was significantly lower than that of the short-term intake group at the peak of pollen dispersal. The symptom scores for throat pain, nose-blowing, tears, and hindrance to activities of daily living were significantly better in the long-term group than the short-term group. In particular, the differences in the symptom scores for throat pain and nose-blowing between the 2 groups were marked. We conclude that drinking 'benifuuki' tea for 1.5 months prior to the cedar pollen season is effective in reducing symptom scores for Japanese cedar pollinosis.
Parallel transformation of K-SVD solar image denoising algorithm
NASA Astrophysics Data System (ADS)
Liang, Youwen; Tian, Yu; Li, Mei
2017-02-01
Images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is proposed to transform the serial algorithm into a parallel version, using a data-parallel model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to multi-core platforms.
Event parallelism: Distributed memory parallel computing for high energy physics experiments
NASA Astrophysics Data System (ADS)
Nash, Thomas
1989-12-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms
Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein
2017-01-01
Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to recent research, graphics processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of achieving real-time image processing. Edge detection is an early stage in most image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for the Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth-first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831
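As a point of reference for the filters named in this abstract, the Sobel operator can be sketched in a few lines. This is a plain-Python illustration, not the authors' CUDA code; the function name `sobel_magnitude` and the list-of-rows image layout are assumptions made for the example.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

Each output pixel reads only its own 3x3 input neighbourhood and writes one value, so the two outer loops carry no dependencies between iterations. That independence is what makes filters of this kind natural candidates for a one-thread-per-pixel GPU mapping.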
NASA Astrophysics Data System (ADS)
Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.
2014-12-01
The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Flat connections in open string mirror symmetry
NASA Astrophysics Data System (ADS)
Alim, Murad; Hecht, Michael; Jockers, Hans; Mayr, Peter; Mertens, Adrian; Soroush, Masoud
2012-06-01
We study a flat connection defined on the open-closed deformation space of open string mirror symmetry for type II compactifications on Calabi-Yau threefolds with D-branes. We use flatness and integrability conditions to define distinguished flat coordinates and the superpotential function at an arbitrary point in the open-closed deformation space. Integrability conditions are given for concrete deformation spaces with several closed and open string deformations. We study explicit examples for expansions around different limit points, including orbifold Gromov-Witten invariants, and brane configurations with several brane moduli. In particular, the latter case covers stacks of parallel branes with non-Abelian symmetry.
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is $O(\sqrt{N})$ whereas traditional fission bank algorithms are $O(N)$ at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency.
The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
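The redistribution idea behind a nearest-neighbor fission bank can be sketched as follows. This is a simplified illustration, not OpenMC's implementation: it assumes fission sites are ordered globally by rank, that each rank should end up owning an equal contiguous range of that ordering, and it leaves out the sampling step entirely. The function name `redistribution_plan` is hypothetical.

```python
def redistribution_plan(counts):
    """Plan transfers that even out per-rank fission site counts.

    counts[i] is the number of sites currently held by rank i. Sites are
    assumed to be numbered globally in rank order; rank i should end up
    owning global indices [bounds[i], bounds[i + 1]).
    Returns a list of (source_rank, destination_rank, number_of_sites).
    """
    p = len(counts)
    total = sum(counts)
    bounds = [(i * total) // p for i in range(p + 1)]
    sends = []
    start = 0
    for src, n in enumerate(counts):
        end = start + n  # rank src currently holds indices [start, end)
        for dst in range(p):
            lo = max(start, bounds[dst])
            hi = min(end, bounds[dst + 1])
            if hi > lo and dst != src:
                sends.append((src, dst, hi - lo))
        start = end
    return sends
```

With counts of 12, 8, 9 and 11 sites on four ranks, the plan moves two sites from rank 0 to rank 1 and one site from rank 3 to rank 2. When the per-rank counts deviate only slightly from the mean, as in this case, every transfer stays between adjacent ranks and involves few sites, which is the intuition behind a communication cost that grows more slowly than the bank size.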
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
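The two steps named in this abstract, establishing a Hamiltonian path and forwarding the root's message along it, can be sketched for a single plane of a mesh. This is an illustration under assumptions, not the patented method: the plane is taken to be a rows x cols grid, the path is a simple row-by-row "snake", and a root in the middle of the path is assumed to forward in both directions. The names `snake_path` and `broadcast_along_path` are hypothetical.

```python
def snake_path(rows, cols):
    """Hamiltonian path over a rows x cols mesh: left-to-right on even rows,
    right-to-left on odd rows, so consecutive nodes are always mesh neighbours."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def broadcast_along_path(path, root, message):
    """Simulate point-to-point forwarding of the root's message along the path.

    Assumption: the root sends toward both ends of the path, and every other
    node passes on what it received from its predecessor.
    """
    i = path.index(root)
    received = {root: message}
    for direction in (path[i + 1:], reversed(path[:i])):
        previous = root
        for node in direction:
            received[node] = received[previous]  # one neighbour-to-neighbour hop
            previous = node
    return received
```

Every hop in this sketch is between adjacent nodes of the mesh, which is the reason such a path suits a network optimized for point to point communication.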
1980-10-31
and is initiated at the periphery of the device at opening in the Si3N4 layer. Rate measurements of this process made on the GKOUSS imager using...dimensions, single-mode operation can be obtained. There is a stripe opening in the oxide film running parallel to the etched rib, which can be...seen in cross section in Fig. I-1(a). This stripe opening is the nucleation region for the epitaxial growth. Other oxide-confined waveguide
Ueda, Tamenobu; Kai, Hisashi; Imaizumi, Tsutomu
2012-07-01
The treatment of morning hypertension has not been established. We compared the efficacy and safety of a losartan/hydrochlorothiazide (HCTZ) combination and high-dose losartan in patients with morning hypertension. A prospective, randomized, open-labeled, parallel-group, multicenter trial enrolled 216 treated outpatients with morning hypertension evaluated by home blood pressure (BP) self-measurement. Patients were randomly assigned to receive a combination therapy of 50 mg losartan and 12.5 mg HCTZ (n=109) or a high-dose therapy with 100 mg losartan (n=107), each of which were administered once every morning. Primary efficacy end points were morning systolic BP (SBP) level and target BP achievement rate after 3 months of treatment. At baseline, BP levels were similar between the two therapy groups. Morning SBP was reduced from 150.3±10.1 to 131.5±11.5 mm Hg by combination therapy (P<0.001) and from 151.0±9.3 to 142.5±13.6 mm Hg by high-dose therapy (P<0.001). The morning SBP reduction was greater in the combination therapy group than in the high-dose therapy group (P<0.001). Combination therapy decreased evening SBP from 141.6±13.3 to 125.3±13.1 mm Hg (P<0.001), and high-dose therapy decreased evening SBP from 138.9±9.9 to 131.4±13.2 mm Hg (P<0.01). Although both therapies improved target BP achievement rates in the morning and evening (P<0.001 for both), combination therapy increased the achievement rates more than high-dose therapy (P<0.001 and P<0.05, respectively). In clinic measurements, combination therapy was superior to high-dose therapy in reducing SBP and improving the achievement rate (P<0.001 and P<0.01, respectively). Combination therapy decreased urine albumin excretion (P<0.05) whereas high-dose therapy reduced serum uric acid. Both therapies indicated strong adherence and few adverse effects (P<0.001). 
In conclusion, losartan/HCTZ combination therapy was more effective for controlling morning hypertension and reducing urine albumin than high-dose losartan.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Kircik, Leon H
2009-07-01
This 12-week, single-center, investigator-blinded, randomized, parallel-design study compared the safety and efficacy of tretinoin microsphere gel 0.04% delivered by pump (TMG PUMP) with those of tazarotene cream 0.05% (TAZ) in mild-to-moderate facial acne vulgaris. Efficacy measurements included investigator global assessment (IGA), lesion counts, and subject self-assessment of acne signs and symptoms. Efficacy was generally comparable between treatment groups, although TMG PUMP provided more rapid results in several parameters. IGA showed a more rapid mean change from baseline at week 4 in the TMG PUMP group (-0.18 versus -0.05 in the TAZ subjects). TMG PUMP yielded more rapid improvement in papules. At week 4, the mean percentage change from baseline in open comedones was statistically significant at -64% in the TMG PUMP group (P=0.0039, within group) versus -19% in the TAZ group (not statistically significant within the group; P=0.1875). Skin dryness, peeling and pruritus were significantly less in the TMG PUMP group as early as week 4. Adverse events related to study treatment were rare in both groups and all resolved upon discontinuation of study medication.
Zhang, S-X; Huang, F; Gates, M; Shen, X; Holmberg, E G
2016-11-01
This is a randomized controlled prospective trial with two parallel groups. The objective of this study was to determine whether early application of tail nerve electrical stimulation (TANES)-induced walking training can improve locomotor function. This study was conducted in SCS Research Center in Colorado, USA. A contusion injury to spinal cord T10 was produced using the New York University impactor device with a 25-mm height setting in female, adult Long-Evans rats. Injured rats were randomly divided into two groups (n=12 per group). One group was subjected to TANES-induced walking training 2 weeks post injury, and the other group, as control, received no TANES-induced walking training. Restorations of behavior and conduction were assessed using the Basso, Beattie and Bresnahan open-field rating scale, horizontal ladder rung walking test and electrophysiological test (Hoffmann reflex). Early application of TANES-induced walking training significantly improved the recovery of locomotor function and benefited the restoration of Hoffmann reflex. TANES-induced walking training is a useful method to promote locomotor recovery in rats with spinal cord injury.
Single-dose pharmacokinetics of repaglinide in subjects with chronic liver disease.
Hatorp, V; Walther, K H; Christensen, M S; Haug-Pihale, G
2000-02-01
Repaglinide is a novel insulin secretagogue developed in response to the need for a fast-acting, oral prandial glucose regulator for the treatment of type 2 (non-insulin-dependent) diabetes mellitus. Repaglinide is metabolized mainly in the liver; its pharmacokinetics may therefore be altered by hepatic dysfunction. This open, parallel-group study compared the pharmacokinetics and tolerability of a single 4 mg dose of repaglinide in healthy subjects (n = 12) and patients with chronic liver disease (CLD) (n = 12). Values for AUC and Cmax were significantly higher in CLD patients compared with healthy subjects, and the MRT was prolonged in CLD patients. Values for tmax did not differ between the groups, but t1/2 was significantly prolonged in CLD patients compared with previously determined values in healthy subjects. AUC was inversely correlated with caffeine clearance in CLD patients but not in healthy subjects. Blood glucose profiles were similar in both groups. Adverse events (principally hypoglycemia) were similar in the two groups; none was serious. Repaglinide clearance is significantly reduced in patients with hepatic impairment; the agent should be used with caution in this group.
Cream, Angela; O'Brian, Sue; Jones, Mark; Block, Susan; Harrison, Elisabeth; Lincoln, Michelle; Hewat, Sally; Packman, Ann; Menzies, Ross; Onslow, Mark
2010-08-01
In this study, the authors investigated the efficacy of video self-modeling (VSM) following speech restructuring treatment to improve the maintenance of treatment effects. The design was an open-plan, parallel-group, randomized controlled trial. Participants were 89 adults and adolescents who undertook intensive speech restructuring treatment. Post treatment, participants were randomly assigned to 2 trial arms: standard maintenance and standard maintenance plus VSM. Participants in the latter arm viewed stutter-free videos of themselves each day for 1 month. The addition of VSM did not improve speech outcomes, as measured by percent syllables stuttered, at either 1 or 6 months postrandomization. However, at the latter assessment, self-rating of worst stuttering severity by the VSM group was 10% better than that of the control group, and satisfaction with speech fluency was 20% better. Quality of life was also better for the VSM group, which was mildly to moderately impaired compared with moderate impairment in the control group. VSM intervention after treatment was associated with improvements in self-reported outcomes. The clinical implications of this finding are discussed.
NASA Astrophysics Data System (ADS)
Colojoara, Carmen; Gabay, Shimon; van der Meulen, Freerk W.; van Gemert, Martin J. C.; Miron, Mariana I.; Mavrantoni, Androniki
1997-12-01
Dentin hypersensitivity is considered to be a consequence of the presence of open dentin tubules on the exposed dentin surface. Various methods and materials used in the treatment of this disease are directed at achieving tubule occlusion. The purpose of this study was to evaluate, by scanning electron microscopy and clinical methods, the sealing effects of the CO2 laser on dentin tubules of human teeth without any damage to the surrounding tissues. Samples of freshly extracted noncarious 3rd molars were used. The teeth were randomly divided into two groups, A and B. The samples of group A were exposed to the laser beam in the cervical area, directed parallel to their dentin tubules. The teeth of group B were sectioned through a hypothetical carious lesion and lased perpendicularly or obliquely to the dentin tubules. The CO2 laser, at 10.6 micrometers wavelength, was operated only in pulse mode and provided 6.25 - 350 mJ in a burst of 25 pulses, each of 250 microseconds duration, with a 2 ms time interval between successive pulses (repetition rate up to 500 mH). Melting of the dentin surface and partial closure of exposed dentin tubules were found for all specimens at 6.25 to 31.25 mJ energy. Our results indicated that, using the CO2 laser with the beam oriented parallel to the dentin tubules, dentin sensitivity can be reduced without any damage to pulp vitality.
NASA Astrophysics Data System (ADS)
Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard
2017-03-01
Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite differences and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code portability on different architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and Intel Xeon PHI. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitude over a single-threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
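The finite-difference time stepping mentioned in the abstract above can be illustrated with a much simpler case. The following is a minimal sketch, not SeisCL code: it assumes a 1D acoustic wavefield with constant velocity and fixed (zero) end points, whereas SeisCL solves 2D and 3D viscoelastic problems. The function name step_wave_1d and the grid size are hypothetical.

```python
def step_wave_1d(u_prev, u_curr, courant):
    """Advance a 1D wavefield by one time step (second-order stencil).

    courant = c * dt / dx. The explicit scheme is stable only for
    courant <= 1. The two end points are held at zero.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant number must be in (0, 1] for stability")
    n = len(u_curr)
    c2 = courant * courant
    u_next = [0.0] * n
    for i in range(1, n - 1):
        # Discrete second spatial derivative (times dx^2).
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


# Unit spike in the middle of an 11-point grid. Passing the same field as
# both previous and current step is a crude "initially at rest" start.
u0 = [0.0] * 11
u0[5] = 1.0
u1 = step_wave_1d(u0, u0, courant=1.0)
print(u1)
```

Each grid point is updated independently of the others within a time step, which is the property that makes this kind of stencil a natural fit for data-parallel devices.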
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
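The speedup factors quoted in the abstract above can be put next to the core count it states (four dual-core processors, i.e. eight cores). The sketch below computes speedup per core; it assumes, which the abstract does not confirm, that all eight cores were used for every measurement. The function name parallel_efficiency is hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup divided by core count; 1.0 would be ideal linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


CORES = 4 * 2  # four dual-core processors, as stated in the abstract

# Speedup factors as reported in the abstract.
reported = {
    "filtration (C++/OpenMP vs serial MATLAB)": 46.66,
    "3D Fourier reconstruction (parallel MATLAB)": 4.57,
    "oximetry computation (parallel MATLAB)": 4.25,
}

efficiencies = {name: parallel_efficiency(s, CORES) for name, s in reported.items()}
for name, eff in efficiencies.items():
    print(f"{name}: speedup per core = {eff:.2f}")
```

A value above 1 for the filtration task does not indicate superlinear parallel scaling: that baseline is serial MATLAB code while the parallel version is compiled C++, so the factor combines the language change with the parallelization.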
NASA Astrophysics Data System (ADS)
Colas, Laurent; Lu, Ling-Feng; Křivská, Alena; Jacquot, Jonathan; Hillairet, Julien; Helou, Walid; Goniche, Marc; Heuraux, Stéphane; Faudot, Eric
2017-02-01
We investigate theoretically how sheath radio-frequency (RF) oscillations relate to the spatial structure of the near RF parallel electric field E∥ emitted by ion cyclotron (IC) wave launchers. We use a simple model of slow wave (SW) evanescence coupled with direct current (DC) plasma biasing via sheath boundary conditions in a 3D parallelepiped filled with homogeneous cold magnetized plasma. Within a ‘wide-sheath’ asymptotic regime, valid for large-amplitude near RF fields, the RF part of this simple RF + DC model becomes linear: the sheath oscillating voltage V_RF at open field line boundaries can be re-expressed as a linear combination of individual contributions by every emitting point in the input field map. SW evanescence makes individual contributions all the larger as the wave emission point is located closer to the sheath walls. The decay of |V_RF| with the emission point/sheath poloidal distance involves the transverse SW evanescence length and the radial protrusion depth of lateral boundaries. The decay of |V_RF| with the emitter/sheath parallel distance is quantified as a function of the parallel SW evanescence length and the parallel connection length of open magnetic field lines. For realistic geometries and target SOL plasmas, poloidal decay occurs over a few centimeters. Typical parallel decay lengths for |V_RF| are found to be smaller than IC antenna parallel extension. Oscillating sheath voltages at IC antenna side limiters are therefore mainly sensitive to E∥ emission by active or passive conducting elements near these limiters, as suggested by recent experimental observations. Parallel proximity effects could also explain why sheath oscillations persist with antisymmetric strap toroidal phasing, despite the parallel antisymmetry of the radiated field map. They could finally justify current attempts at reducing the RF fields induced near antenna boxes to attenuate sheath oscillations in their vicinity.
The specificity of learned parallelism in dual-memory retrieval.
Strobach, Tilo; Schubert, Torsten; Pashler, Harold; Rickard, Timothy
2014-05-01
Retrieval of two responses from one visually presented cue occurs sequentially at the outset of dual-retrieval practice. Exclusively for subjects who adopt a mode of grouping (i.e., synchronizing) their response execution, however, reaction times after dual-retrieval practice indicate a shift to learned retrieval parallelism (e.g., Nino & Rickard, in Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 373-388, 2003). In the present study, we investigated how this learned parallelism is achieved and why it appears to occur only for subjects who group their responses. Two main accounts were considered: a task-level versus a cue-level account. The task-level account assumes that learned retrieval parallelism occurs at the level of the task as a whole and is not limited to practiced cues. Grouping response execution may thus promote a general shift to parallel retrieval following practice. The cue-level account states that learned retrieval parallelism is specific to practiced cues. This type of parallelism may result from cue-specific response chunking that occurs uniquely as a consequence of grouped response execution. The results of two experiments favored the second account and were best interpreted in terms of a structural bottleneck model.
Lee, Mi Young; Choi, Dong Seop; Lee, Moon Kyu; Lee, Hyoung Woo; Park, Tae Sun; Kim, Doo Man; Chung, Choon Hee; Kim, Duk Kyu; Kim, In Joo; Jang, Hak Chul; Park, Yong Soo; Kwon, Hyuk Sang; Lee, Seung Hun; Shin, Hee Kang
2014-01-01
We studied the efficacy and safety of acarbose in comparison with voglibose in type 2 diabetes patients whose blood glucose levels were inadequately controlled with basal insulin alone or in combination with metformin (or a sulfonylurea). This study was a 24-week prospective, open-label, randomized, active-controlled multi-center study. Participants were randomized to receive either acarbose (n=59, 300 mg/day) or voglibose (n=62, 0.9 mg/day). The mean HbA1c at week 24 was significantly decreased approximately 0.7% from baseline in both acarbose (from 8.43% ± 0.71% to 7.71% ± 0.93%) and voglibose groups (from 8.38% ± 0.73% to 7.68% ± 0.94%). The mean fasting plasma glucose level and self-monitoring of blood glucose data from 1 hr before and after each meal were significantly decreased at week 24 in comparison to baseline in both groups. The levels 1 hr after dinner at week 24 were significantly decreased in the acarbose group (from 233.54 ± 69.38 to 176.80 ± 46.63 mg/dL) compared with the voglibose group (from 224.18 ± 70.07 to 193.01 ± 55.39 mg/dL). In conclusion, both acarbose and voglibose are efficacious and safe in patients with type 2 diabetes who are inadequately controlled with basal insulin. (ClinicalTrials.gov number, NCT00970528).
Lee, Mi Young; Lee, Moon Kyu; Lee, Hyoung Woo; Park, Tae Sun; Kim, Doo Man; Chung, Choon Hee; Kim, Duk Kyu; Kim, In Joo; Jang, Hak Chul; Park, Yong Soo; Kwon, Hyuk Sang; Lee, Seung Hun; Shin, Hee Kang
2014-01-01
We studied the efficacy and safety of acarbose in comparison with voglibose in type 2 diabetes patients whose blood glucose levels were inadequately controlled with basal insulin alone or in combination with metformin (or a sulfonylurea). This study was a 24-week prospective, open-label, randomized, active-controlled multi-center study. Participants were randomized to receive either acarbose (n=59, 300 mg/day) or voglibose (n=62, 0.9 mg/day). The mean HbA1c at week 24 was significantly decreased approximately 0.7% from baseline in both acarbose (from 8.43% ± 0.71% to 7.71% ± 0.93%) and voglibose groups (from 8.38% ± 0.73% to 7.68% ± 0.94%). The mean fasting plasma glucose level and self-monitoring of blood glucose data from 1 hr before and after each meal were significantly decreased at week 24 in comparison to baseline in both groups. The levels 1 hr after dinner at week 24 were significantly decreased in the acarbose group (from 233.54 ± 69.38 to 176.80 ± 46.63 mg/dL) compared with the voglibose group (from 224.18 ± 70.07 to 193.01 ± 55.39 mg/dL). In conclusion, both acarbose and voglibose are efficacious and safe in patients with type 2 diabetes who are inadequately controlled with basal insulin. (ClinicalTrials.gov number, NCT00970528) PMID:24431911
Bellamy, Rob; Chilvers, Jason; Vaughan, Naomi E.
2014-01-01
Appraisals of deliberate, large-scale interventions in the earth’s climate system, known collectively as ‘geoengineering’, have largely taken the form of narrowly framed and exclusive expert analyses that prematurely ‘close down’ upon particular proposals. Here, we present the findings from the first ‘upstream’ appraisal of geoengineering to deliberately ‘open up’ to a broader diversity of framings, knowledges and future pathways. We report on the citizen strand of an innovative analytic–deliberative participatory appraisal process called Deliberative Mapping. A select but diverse group of sociodemographically representative citizens from Norfolk (United Kingdom) were engaged in a deliberative multi-criteria appraisal of geoengineering proposals relative to other options for tackling climate change, in parallel to symmetrical appraisals by diverse experts and stakeholders. Despite seeking to map divergent perspectives, a remarkably consistent view of option performance emerged across both the citizens’ and the specialists’ deliberations, where geoengineering proposals were outperformed by mitigation alternatives. PMID:25224904
Bellamy, Rob; Chilvers, Jason; Vaughan, Naomi E
2016-04-01
Appraisals of deliberate, large-scale interventions in the earth's climate system, known collectively as 'geoengineering', have largely taken the form of narrowly framed and exclusive expert analyses that prematurely 'close down' upon particular proposals. Here, we present the findings from the first 'upstream' appraisal of geoengineering to deliberately 'open up' to a broader diversity of framings, knowledges and future pathways. We report on the citizen strand of an innovative analytic-deliberative participatory appraisal process called Deliberative Mapping. A select but diverse group of sociodemographically representative citizens from Norfolk (United Kingdom) were engaged in a deliberative multi-criteria appraisal of geoengineering proposals relative to other options for tackling climate change, in parallel to symmetrical appraisals by diverse experts and stakeholders. Despite seeking to map divergent perspectives, a remarkably consistent view of option performance emerged across both the citizens' and the specialists' deliberations, where geoengineering proposals were outperformed by mitigation alternatives. © The Author(s) 2014.
International developments in openEHR archetypes and templates.
Leslie, Heather
Electronic Health Records (EHRs) are a complex knowledge domain. The ability to design EHRs to cope with the changing nature of health knowledge, and to be shareable, has been elusive. A recent pilot study tested the applicability of the CEN 13606 as an electronic health record standard. Using openEHR archetypes and tools, 650 clinical content specifications (archetypes) were created (e.g. for blood pressure) and re-used across all clinical specialties and contexts. Groups of archetypes were aggregated in templates to support clinical information gathering or viewing (e.g. 80 separate archetypes make up the routine antenatal visit record). Over 60 templates were created for use in the emergency department, antenatal care and delivery of an infant, and paediatric hearing loss assessment. The primary goal is to define a logical clinical record architecture for the NHS but potentially, with archetypes as the keystone, shareable EHRs will also be attainable. Archetype and template development work is ongoing, with associated evaluation occurring in parallel.
A feasibility study on porting the community land model onto accelerators using OpenACC
Wang, Dali; Wu, Wei; Winkler, Frank; ...
2014-01-01
As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC appears to be a very promising technology; therefore, we have conducted a feasibility analysis on porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, then applied this kernel in the actual CLM dataflow procedure and investigated the strategy of data parallelization and the benefit of data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of the OpenMP implementation using a single OpenMP thread, but 2.8 times slower than that of the OpenMP implementation using 16 threads. On multiple nodes, the MPI_OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of the “deep copy” capability and “routine” feature of the OpenACC standards. In conclusion, we believe that our experience with the environmental model CLM can be beneficial to many other scientific research programs that are interested in porting their large-scale scientific codes onto high-end computers with hybrid computing architectures using OpenACC.
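The two ratios reported in the abstract above (2.3 and 2.8) together imply how well the OpenMP version scaled from 1 to 16 threads. The sketch below works this out, assuming that a ratio of k means a run time that is shorter or longer by the factor k; the function name implied_openmp_scaling is hypothetical and the result is a derived estimate, not a figure from the study.

```python
def implied_openmp_scaling(acc_vs_omp1: float, omp16_vs_acc: float, threads: int = 16):
    """Derive the OpenMP multi-thread speedup from the two reported ratios.

    acc_vs_omp1:  how many times faster OpenACC is than 1-thread OpenMP.
    omp16_vs_acc: how many times faster 16-thread OpenMP is than OpenACC.
    Returns (speedup over 1 thread, speedup per thread).
    """
    t_omp1 = 1.0                     # normalize the 1-thread OpenMP time
    t_acc = t_omp1 / acc_vs_omp1     # OpenACC run time on one GPU
    t_omp16 = t_acc / omp16_vs_acc   # OpenMP run time on 16 threads
    speedup = t_omp1 / t_omp16
    return speedup, speedup / threads


speedup, efficiency = implied_openmp_scaling(2.3, 2.8)
print(f"implied 16-thread OpenMP speedup: {speedup:.2f}")
print(f"implied speedup per thread: {efficiency:.2f}")
```

Under that reading, the 16-thread run is about 6.4 times faster than the single-thread run, or roughly 0.4 of ideal linear scaling.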
The Particle Accelerator Simulation Code PyORBIT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M
2015-01-01
The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two-level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower-level code implemented in the C++ language. The parallel capabilities are based on MPI communications. PyORBIT is an open source code accessible to the public through the Google Open Source Projects Hosting service.
PRESTO-Tango as an open-source resource for interrogation of the druggable human GPCRome.
Kroeze, Wesley K; Sassano, Maria F; Huang, Xi-Ping; Lansu, Katherine; McCorvy, John D; Giguère, Patrick M; Sciaky, Noah; Roth, Bryan L
2015-05-01
G protein-coupled receptors (GPCRs) are essential mediators of cellular signaling and are important targets of drug action. Of the approximately 350 nonolfactory human GPCRs, more than 100 are still considered to be 'orphans' because their endogenous ligands remain unknown. Here, we describe a unique open-source resource that allows interrogation of the druggable human GPCRome via a G protein-independent β-arrestin-recruitment assay. We validate this unique platform at more than 120 nonorphan human GPCR targets, demonstrate its utility for discovering new ligands for orphan human GPCRs and describe a method (parallel receptorome expression and screening via transcriptional output, with transcriptional activation following arrestin translocation (PRESTO-Tango)) for the simultaneous and parallel interrogation of the entire human nonolfactory GPCRome.
Thunström, Erik; Manhem, Karin; Yucel-Lindberg, Tülay; Rosengren, Annika; Lindberg, Caroline; Peker, Yüksel
2016-11-01
Blood pressure reduction in response to antihypertensive agents is less for patients with obstructive sleep apnea (OSA). Increased sympathetic and inflammatory activity, as well as alterations in the renin-angiotensin-aldosterone system, may play a role in this context. To address the cardiovascular mechanisms involved in response to an angiotensin II receptor antagonist, losartan, and continuous positive airway pressure (CPAP) as add-on treatment for hypertension and OSA. Newly diagnosed hypertensive patients with or without OSA (allocated in a 2:1 ratio for OSA vs. no OSA) were treated with losartan 50 mg daily during a 6-week two-center, open-label, prospective, case-control, parallel-design study. In the second 6-week, sex-stratified, open-label, randomized, parallel-design study, all subjects with OSA continued to receive losartan and were randomly assigned to either CPAP as add-on therapy or to no CPAP (1:1 ratio for CPAP vs. no CPAP). Study subjects without OSA were followed in parallel while they continued to take losartan. Blood samples were collected at baseline, after 6 weeks, and after 12 weeks for analysis of renin, aldosterone, noradrenaline, adrenaline, and inflammatory markers. Fifty-four patients with OSA and 35 without OSA were included in the first 6-week study. Losartan significantly increased renin levels and reduced aldosterone levels in the group without OSA. There was no significant decrease in aldosterone levels among patients with OSA. Add-on CPAP treatment tended to lower aldosterone levels, but reductions were more pronounced in measures of sympathetic activity. No significant changes in inflammatory markers were observed following treatment with losartan and CPAP. Hypertensive patients with OSA responded to losartan treatment with smaller reductions in aldosterone compared with hypertensive patients without OSA. 
Sympathetic system activity seemed to respond primarily to add-on CPAP treatment in patients with newly discovered hypertension and OSA. Clinical trial registered with www.clinicaltrials.gov (NCT00701428).
Learning and Best Practices for Learning in Open-Source Software Communities
ERIC Educational Resources Information Center
Singh, Vandana; Holt, Lila
2013-01-01
This research is about participants who use open-source software (OSS) discussion forums for learning. Learning in online communities of education as well as non-education-related online communities has been studied under the lens of social learning theory and situated learning for a long time. In this research, we draw parallels among these two…
Kario, Kazuomi; Hoshide, Satoshi
2014-06-01
The ACS1 (Azilsartan Circadian and Sleep Pressure - the first study) is a multicenter, randomized, open-label, two parallel-group study carried out to investigate the efficacy of an 8-week oral treatment with azilsartan 20 mg in comparison with amlodipine 5 mg. The patients with stage I or II primary hypertension will be randomly assigned to either an azilsartan group (n=350) or an amlodipine group (n=350). The primary endpoint is a change in nocturnal systolic blood pressure (BP) as measured by ambulatory BP monitoring at the end of follow-up relative to the baseline level during the run-in period. In addition, we will carry out the same analysis after dividing four different nocturnal BP dipping statuses (extreme-dippers, dippers, nondipper, and risers). The findings of this study will help in establishing an appropriate antihypertensive treatment for hypertensive patients with a disrupted circadian BP rhythm.
Hoshide, Satoshi
2014-01-01
Objective: The ACS1 (Azilsartan Circadian and Sleep Pressure – the first study) is a multicenter, randomized, open-label, two parallel-group study carried out to investigate the efficacy of an 8-week oral treatment with azilsartan 20 mg in comparison with amlodipine 5 mg. Materials and methods: The patients with stage I or II primary hypertension will be randomly assigned to either an azilsartan group (n=350) or an amlodipine group (n=350). The primary endpoint is a change in nocturnal systolic blood pressure (BP) as measured by ambulatory BP monitoring at the end of follow-up relative to the baseline level during the run-in period. In addition, we will carry out the same analysis after dividing four different nocturnal BP dipping statuses (extreme-dippers, dippers, nondipper, and risers). Conclusion: The findings of this study will help in establishing an appropriate antihypertensive treatment for hypertensive patients with a disrupted circadian BP rhythm. PMID:24637789
Novel Door-opening Method for Six-legged Robots Based on Only Force Sensing
NASA Astrophysics Data System (ADS)
Chen, Zhi-Jun; Gao, Feng; Pan, Yang
2017-09-01
Current door-opening methods are mainly developed on tracked, wheeled and biped robots by applying multi-DOF manipulators and vision systems. However, door-opening methods for six-legged robots are seldom studied, especially using 0-DOF tools to operate and only force sensing to detect. A novel door-opening method for six-legged robots is developed and implemented to the six-parallel-legged robot. The kinematic model of the six-parallel-legged robot is established and the model of measuring the positional relationship between the robot and the door is proposed. The measurement model is completely based on only force sensing. The real-time trajectory planning method and the control strategy are designed. The trajectory planning method allows the maximum angle between the sagittal axis of the robot body and the normal line of the door plane to be 45º. A 0-DOF tool mounted to the robot body is applied to operate. By integrating with the body, the tool has 6 DOFs and enough workspace to operate. The loose grasp achieved by the tool helps release the inner force in the tool. Experiments are carried out to validate the method. The results show that the method is effective and robust in opening doors wider than 1 m. This paper proposes a novel door-opening method for six-legged robots, which notably uses a 0-DOF tool and only force sensing to detect and open the door.
PUP: An Architecture to Exploit Parallel Unification in Prolog
1988-03-01
environment stacking model similar to the Warren Abstract Machine [23] since it has been shown to be superior to other known models (see [21]). The storage...execute in groups of independent operations. Unifications belonging to different groups may not overlap. Also unification operations belonging to the...since all parallel operations on the unification units must complete before any of the units can start executing the next group of parallel
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
Performance of OVERFLOW-D Applications based on Hybrid and MPI Paradigms on IBM Power4 System
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biegel, Bryan (Technical Monitor)
2002-01-01
This report briefly discusses our preliminary performance experiments with parallel versions of OVERFLOW-D applications. These applications are based on MPI and hybrid paradigms on the IBM Power4 system here at the NAS Division. This work is part of an effort to determine the suitability of the system and its parallel libraries (MPI/OpenMP) for specific scientific computing objectives.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zamora, Richard; Voter, Arthur; Uberuaga, Bla
2017-10-23
The SpecTAD software represents a refactoring of the Temperature Accelerated Dynamics (TAD2) code authored by Arthur F. Voter and Blas P. Uberuaga (LA-CC-02-05). SpecTAD extends the capabilities of TAD2, by providing algorithms for both temporal and spatial parallelism. The novel algorithms for temporal parallelism include both speculation and replication based techniques. SpecTAD also offers the optional capability to dynamically link to the open-source LAMMPS package.
2012-10-01
using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS
Base drive for paralleled inverter systems
NASA Technical Reports Server (NTRS)
Nagano, S. (Inventor)
1980-01-01
In a paralleled inverter system, a positive feedback current derived from the total current from all of the modules of the inverter system is applied to the base drive of each of the power transistors of all modules, thereby to provide all modules protection against open or short circuit faults occurring in any of the modules, and force equal current sharing among the modules during turn on of the power transistors.
Accelerating a three-dimensional eco-hydrological cellular automaton on GPGPU with OpenCL
NASA Astrophysics Data System (ADS)
Senatore, Alfonso; D'Ambrosio, Donato; De Rango, Alessio; Rongo, Rocco; Spataro, William; Straface, Salvatore; Mendicino, Giuseppe
2016-10-01
This work presents an effective implementation of a numerical model for complete eco-hydrological Cellular Automata modeling on Graphical Processing Units (GPU) with OpenCL (Open Computing Language) for heterogeneous computation (i.e., on CPUs and/or GPUs). Different types of parallel implementations were carried out (e.g., use of fast local memory, loop unrolling, etc.), showing increasing performance improvements in terms of speedup, and also adopting some original optimization strategies. Moreover, numerical analysis of the results (i.e., comparison of CPU and GPU outcomes in terms of rounding errors) has proven to be satisfactory. Experiments were carried out on a workstation with two CPUs (Intel Xeon E5440 at 2.83GHz), one GPU AMD R9 280X and one GPU nVIDIA Tesla K20c. Results have been extremely positive, but further testing should be performed to assess the functionality of the adopted strategies on other complete models and their ability to fruitfully exploit parallel systems resources.
Stanislawski, Larry V.; Survila, Kornelijus; Wendel, Jeffrey; Liu, Yan; Buttenfield, Barbara P.
2018-01-01
This paper describes a workflow for automating the extraction of elevation-derived stream lines using open source tools with parallel computing support and testing the effectiveness of procedures in various terrain conditions within the conterminous United States. Drainage networks are extracted from the US Geological Survey 1/3 arc-second 3D Elevation Program elevation data having a nominal cell size of 10 m. This research demonstrates the utility of open source tools with parallel computing support for extracting connected drainage network patterns and handling depressions in 30 subbasins distributed across humid, dry, and transitional climate regions and in terrain conditions exhibiting a range of slopes. Special attention is given to low-slope terrain, where network connectivity is preserved by generating synthetic stream channels through lake and waterbody polygons. Conflation analysis compares the extracted streams with a 1:24,000-scale National Hydrography Dataset flowline network and shows that similarities are greatest for second- and higher-order tributaries.
Performance evaluation of OpenFOAM on many-core architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz
In this article, the application of the Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms when solving real engineering problems in fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speedups of various implementations of linear solvers using GPU and MIC accelerators are presented. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
Spacer grid assembly and locking mechanism
Snyder, Jr., Harold J.; Veca, Anthony R.; Donck, Harry A.
1982-01-01
A spacer grid assembly is disclosed for retaining a plurality of fuel rods in substantially parallel spaced relation, the spacer grids being formed with rhombic openings defining contact means for engaging from one to four fuel rods arranged in each opening, the spacer grids being of symmetric configuration with their rhombic openings being asymmetrically offset to permit inversion and relative rotation of the similar spacer grids for improved support of the fuel rods. An improved locking mechanism includes tie bars having chordal surfaces to facilitate their installation in slotted circular openings of the spacer grids, the tie rods being rotatable into locking engagement with the slotted openings.
Trainer, Peter J; Newell-Price, John; Ayuk, John; Aylwin, Simon; Rees, D Aled; Drake, Wm; Chanson, Philippe; Brue, Thierry; Webb, Susan M; Montañana, Carmen Fajardo; Aller, Javier; McCormack, Ann I; Torpy, David J; Tachas, George; Atley, Lynne; Ryder, David; Bidlingmaier, Martin
2018-05-22
ATL1103 is a second-generation antisense oligomer targeting the human GH receptor. This phase 2 randomised, open-label, parallel-group study assessed the potential of ATL1103 as a treatment for acromegaly. 26 patients with active acromegaly (IGF-I >130% upper limit of normal) were randomised to subcutaneous ATL1103 200 mg either once- or twice-weekly for 13 weeks, and monitored for a further 8-week washout period. The primary efficacy measures were change in IGF-I at week 14, compared to baseline and between cohorts. For secondary endpoints (IGFBP3, ALS, GH, GHBP), comparison was between baseline and week 14. Safety was assessed by reported adverse events. Baseline median IGF-I was 447 and 649 ng/mL in the once- and twice-weekly groups, respectively. Compared to baseline, at week 14 twice-weekly ATL1103 resulted in a median fall in IGF-I of 27.8% (p=0.0002). Between-cohort comparison at week 14 demonstrated the median fall in IGF-I to be 25.8% (p=0.0012) greater with twice-weekly dosing. In the twice-weekly cohort, IGF-I was still declining at week 14, and at week 21 remained lower than at baseline by a median of 18.7% (p=0.0005). Compared to baseline, by week 14 IGFBP3 and ALS had declined by a median of 8.9% (p=0.027) and 16.7% (p=0.017) with twice-weekly ATL1103; GH had increased by a median of 46% at week 14 (p=0.001). IGFBP3, ALS and GH did not change with weekly ATL1103. GHBP fell by a median of 23.6% and 48.8% in the once- and twice-weekly cohorts (p=0.027 and p=0.005), respectively. ATL1103 was well tolerated, although 84.6% of patients experienced mild to moderate injection-site reactions (ISR). This study provides proof-of-concept that ATL1103 is able to significantly lower IGF-I in patients with acromegaly.
Haziza, Christelle; de La Bourdonnaye, Guillaume; Skiada, Dimitra; Ancerewicz, Jacek; Baker, Gizelle; Picavet, Patrick; Lüdicke, Frank
2016-11-30
The Tobacco Heating System (THS) 2.2, a candidate Modified Risk Tobacco Product (MRTP), is designed to heat tobacco without burning it. Tobacco is heated in order to reduce the formation of harmful and potentially harmful constituents (HPHC), and reduce the consequent exposure, compared with combustible cigarettes (CC). In this 5-day exposure, controlled, parallel-group, open-label clinical study, 160 smoking, healthy subjects were randomized to three groups and asked to: (1) switch from CCs to THS 2.2 (THS group; 80 participants); (2) continue to use their own non-menthol CC brand (CC group; 41 participants); or (3) to refrain from smoking (smoking abstinence (SA) group; 39 participants). Biomarkers of exposure, except those associated with nicotine exposure, were significantly reduced in the THS group compared with the CC group, and approached the levels observed in the SA group. Increased product consumption and total puff volume were reported in the THS group. However, exposure to nicotine was similar to CC at the end of the confinement period. Reduction in urge-to-smoke was comparable between the THS and CC groups and THS 2.2 product was well tolerated. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Xia, Yidong; Lou, Jialin; Luo, Hong; ...
2015-02-09
Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires the minimum code intrusion and algorithm alteration for upgrading a legacy solver with the GPU computing capability at very little extra effort in programming, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention because of the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementation of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.
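The face coloring idea mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the coloring scheme, so a simple greedy coloring is assumed here, and the names `color_faces` and `group_by_color` are hypothetical. Faces that share a cell receive different colors, so all faces of one color can update their neighboring cells concurrently without two threads writing to the same cell.

```python
def color_faces(faces):
    """Greedy coloring (an assumed scheme): faces sharing a cell get different colors.

    faces: list of (left_cell, right_cell) tuples; right_cell is None
    for a boundary face. Returns one integer color per face.
    """
    used_by_cell = {}  # cell id -> colors already assigned to faces of that cell
    colors = []
    for left, right in faces:
        cells = [c for c in (left, right) if c is not None]
        taken = set()
        for c in cells:
            taken |= used_by_cell.get(c, set())
        color = 0
        while color in taken:  # smallest color not used by a neighboring face
            color += 1
        colors.append(color)
        for c in cells:
            used_by_cell.setdefault(c, set()).add(color)
    return colors


def group_by_color(colors):
    """Collect face indices per color; each group is free of write conflicts."""
    groups = {}
    for face, color in enumerate(colors):
        groups.setdefault(color, []).append(face)
    return groups


# Toy 1D chain of four cells: boundary face, three interior faces, boundary face.
faces = [(0, None), (0, 1), (1, 2), (2, 3), (3, None)]
colors = color_faces(faces)
groups = group_by_color(colors)
```

In a solver, the face-integral loop would then run once per color group, with the faces inside a group processed in parallel.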
Concurrent computation of attribute filters on shared memory parallel machines.
Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold
2008-10-01
Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Massively parallel sparse matrix function calculations with NTPoly
NASA Astrophysics Data System (ADS)
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
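As a rough illustration of a polynomial-expansion matrix function on sparse data, the sketch below approximates exp(A) with a truncated Taylor series, dropping small entries after each multiplication so the intermediate matrices stay sparse. It does not use or reproduce NTPoly's interface: the dictionary storage format, the function names, and the choice of the Taylor series as the polynomial are all assumptions made for this example.

```python
def sp_matmul(a, b, threshold=0.0):
    """Multiply sparse matrices stored as {(row, col): value}; drop small entries."""
    b_rows = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    out = {}
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, ()):
            out[(i, j)] = out.get((i, j), 0.0) + av * bv
    return {ij: v for ij, v in out.items() if abs(v) > threshold}


def sp_axpy(alpha, x, y):
    """Return alpha * x + y for sparse matrices."""
    out = dict(y)
    for ij, v in x.items():
        out[ij] = out.get(ij, 0.0) + alpha * v
    return out


def sp_identity(n):
    return {(i, i): 1.0 for i in range(n)}


def sp_exp_taylor(a, n, degree=20, threshold=1e-12):
    """Approximate exp(A) by the truncated Taylor polynomial sum_k A^k / k!."""
    result = sp_identity(n)
    term = sp_identity(n)  # holds A^k / k! for the current k
    for k in range(1, degree + 1):
        term = sp_matmul(term, a, threshold)
        term = {ij: v / k for ij, v in term.items()}
        result = sp_axpy(1.0, term, result)
    return result


# Small symmetric example matrix (values chosen arbitrarily).
a = {(0, 0): 0.5, (1, 1): -0.25, (0, 1): 0.1, (1, 0): 0.1}
exp_a = sp_exp_taylor(a, n=2)
```

The thresholding step is what keeps the cost proportional to the number of retained entries; without it, repeated products of sparse matrices generally fill in.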
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
Kusawake, Tomohiro; Kowalski, Donna; Takada, Akitsugu; Kato, Kota; Katashima, Masataka; Keirns, James J; Lewand, Michaelene; Lasseter, Kenneth C; Marbury, Thomas C; Preston, Richard A
2017-12-01
Amenamevir (ASP2151) is a nonnucleoside human herpesvirus helicase-primase inhibitor that was approved in Japan for the treatment of herpes zoster (shingles) in 2017. This article reports the results of two clinical trials that investigated the effects of renal and hepatic impairment on the pharmacokinetics of amenamevir. These studies were phase 1, open-label, single-dose (oral 400 mg), parallel-group studies evaluating the pharmacokinetics, safety, and tolerability of amenamevir in healthy participants and participants with moderate hepatic impairment and mild, moderate, and severe renal impairment. In the hepatic impairment study, the pharmacokinetic profile of amenamevir in participants with moderate hepatic impairment was generally similar to that of participants with normal hepatic function. In the renal impairment study, the area under the amenamevir concentration versus time curve from the time of dosing up to the time of the last sample with extrapolation to infinity of the terminal phase was increased by 78.1% in participants with severe renal impairment. There was a positive relationship between creatinine clearance and oral and renal clearance for amenamevir in the renal impairment study. In both studies, amenamevir was safe and well tolerated. The findings of the hepatic impairment study indicate that no dosing adjustment is required in patients with moderate hepatic impairment. In the renal impairment study, systemic amenamevir exposure was increased by renal impairment. However, it is unlikely that renal impairment will have a significant effect on the safety of amenamevir given that in previous pharmacokinetic and safety studies in healthy individuals amenamevir was safe and well tolerated after a single dose (5-2400 mg, fasted condition) and repeated doses for 7 days (300 or 600 mg, fed condition), and the amount of amenamevir exposure in the renal impairment study was covered by those studies. 
These findings suggest that amenamevir does not require dosage reduction in accordance with the creatinine clearance. FUNDING: Astellas Pharma.
Bernardo-Escudero, Roberto; Alonso-Campero, Rosalba; Francisco-Doce, María Teresa de Jesús; Cortés-Fuentes, Myriam; Villa-Vargas, Miriam; Angeles-Uribe, Juan
2012-12-01
The study aimed to assess the pharmacokinetics of a new, modified-release metoclopramide tablet, and compare it to an immediate-release tablet. A single and multiple-dose, randomized, open-label, parallel, pharmacokinetic study was conducted. Investigational products were administered to 26 healthy Hispanic Mexican male volunteers for two consecutive days: either one 30 mg modified-release tablet every 24 h, or one 10 mg immediate-release tablet every 8 h. Blood samples were collected after the first and last doses of metoclopramide. Plasma metoclopramide concentrations were determined by high-performance liquid chromatography. Safety and tolerability were assessed through vital signs measurements, clinical evaluations, and spontaneous reports from study subjects. All 26 subjects were included in the analyses [mean (SD) age: 27 (8) years, range 18-50; BMI: 23.65 (2.22) kg/m², range 18.01-27.47)]. Peak plasmatic concentrations were not statistically different with both formulations, but occurred significantly later (p < 0.05) with the modified-release form [tmax: 3.15 (1.28) vs. 0.85 (0.32) h and tmax-ss: 2.92 (1.19) vs. 1.04 (0.43) h]. There was no difference noted in the average plasma concentrations [Cavgτ: 23.90 (7.90) vs. 20.64 (7.43) ng/mL after the first dose; and Cavg-ss: 31.14 (9.64) vs. 35.59 (12.29) ng/mL after the last dose, (p > 0.05)]. One adverse event was reported in the test group (diarrhea), and one in the reference group (headache). This study suggests that the 30 mg modified-release metoclopramide tablets show features compatible with slow-release formulations when compared to immediate-release tablets, and is suitable for once-a-day administration.
Yassin, Hany Mahmoud; Abdel Moneim, Ahmed Tohamy; Mostafa Bayoumy, Ahmed Sherin; Bayoumy, Hasan Metwally; Taher, Sameh Galal
2017-01-01
The use of succinylcholine for rapid sequence induction in patients with open globe injuries may be detrimental to the eye. The aim of this study is to determine if premedication with magnesium sulfate (MgSO4) could attenuate the increase in intraocular pressure (IOP) associated with succinylcholine injection and intubation. The study was conducted in the operating theaters of a tertiary care university hospital between December 2014 and July 2015. This was a prospective, randomized, parallel three-arm, double-blind, placebo-controlled clinical trial. One hundred and thirteen patients of ASA physical status Classes I and II underwent elective cataract surgery under general anesthesia. These patients were allocated into three groups: Group C (control group) received 100 ml normal saline, Group M1 received 30 mg/kg MgSO4 in 100 ml normal saline, and Group M2 received 50 mg/kg MgSO4 in 100 ml normal saline. IOP, mean arterial pressure (MAP), and heart rate (HR) were recorded at 5 time points related to study drug administration. In addition, any adverse effects related to MgSO4 were recorded. Intragroup and between-group differences were examined by analysis of variance. We noticed a significant decrease in IOP in the M1 (n = 38) and M2 (n = 37) groups as compared with the C group (n = 38) after study drug infusion and 2 and 5 min after intubation, P < 0.001, while the difference between the M1 and M2 groups was insignificant, P = 0.296 and P = 0.647, respectively. There was a significant decrease in MAP and HR in the M1 and M2 groups as compared with the C group 2 and 5 min after intubation, P = 0.01, while the difference between the M1 and M2 groups was insignificant, P = 1. MgSO4 at 30 mg/kg as well as 50 mg/kg effectively prevented the rise in IOP, MAP, and HR associated with rapid sequence induction by succinylcholine and endotracheal intubation.
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
CCMC Modeling of Magnetic Reconnection in Electron Diffusion Region Events
NASA Astrophysics Data System (ADS)
Marshall, A.; Reiff, P. H.; Daou, A.; Webster, J.; Sazykin, S. Y.; Kuznetsova, M.; Grocer, A.; Rastaetter, L.; Welling, D. T.; DeZeeuw, D.; Russell, C. T.
2017-12-01
We use the unprecedented spatial and temporal cadence of the Magnetospheric Multiscale Mission to study four electron diffusion events, and infer important physical properties of their respective magnetic reconnection processes. We couple these observations with numerical simulations using tools such as SWMF with RCM, and RECON-X, from the Community Coordinated Modeling Center, to provide, for the first time, a coherent temporal description of the magnetic reconnection process through tracing the coupling of IMF and closed Earth magnetic field lines, leading to the corresponding polar cap open field lines. We note that the reconnection geometry is far from slab-like: the IMF field lines drape over the magnetopause, leading to a stretching of the field lines. The stretched field lines become parallel to, and merge with, the dayside separator. Surprisingly, the inner closed field lines also distort to become parallel to the separator. This parallel geometry allows a very sharp boundary between open and closed field lines. In three of the events, the MMS location was near the predicted separator location; in the fourth it was near the outflow region.
Podoleanu, Adrian Gh; Bradu, Adrian
2013-08-12
Conventional spectral domain interferometry (SDI) methods suffer from the need of data linearization. When applied to optical coherence tomography (OCT), conventional SDI methods are limited in their 3D capability, as they cannot deliver direct en-face cuts. Here we introduce a novel SDI method, which eliminates these disadvantages. We denote this method as Master-Slave Interferometry (MSI), because a signal is acquired by a slave interferometer for an optical path difference (OPD) value determined by a master interferometer. The MSI method radically changes the main building block of an SDI sensor and of a spectral domain OCT set-up. The serially provided signal in conventional technology is replaced by multiple signals, a signal for each OPD point in the object investigated. This opens novel avenues in parallel sensing and in parallelization of signal processing in 3D-OCT, with applications in high-resolution medical imaging and microscopy investigation of biosamples. Eliminating the need of linearization leads to lower cost OCT systems and opens potential avenues in increasing the speed of production of en-face OCT images in comparison with conventional SDI.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that have been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require many more arithmetic operations and introduce more data dependencies. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Tanaka, Kenichi; Nakayama, Masaaki; Kanno, Makoto; Kimura, Hiroshi; Watanabe, Kimio; Tani, Yoshihiro; Hayashi, Yoshimitsu; Asahi, Koichi; Terawaki, Hiroyuki; Watanabe, Tsuyoshi
2015-12-01
Hyperuricemia is associated with the onset of chronic kidney disease (CKD) and renal disease progression. Febuxostat, a novel, non-purine, selective xanthine oxidase inhibitor, has been reported to have a stronger effect on hyperuricemia than conventional therapy with allopurinol. However, few data are available regarding the clinical effect of febuxostat in patients with CKD. A prospective, randomized, open-label, parallel-group trial was conducted in hyperuricemic patients with stage 3 CKD. Patients were randomly assigned to treatment with febuxostat (n = 21) or to continue conventional therapy (n = 19). Treatment was continued for 12 weeks. The efficacy of febuxostat was determined by monitoring serum uric acid (UA) levels, blood pressures, renal function, and urinary protein levels. In addition, urinary liver-type fatty acid-binding protein (L-FABP), urinary albumin, urinary beta 2 microglobulin (β2MG), and serum high sensitivity C-reactive protein were measured before and 12 weeks after febuxostat was added to the treatment. Febuxostat resulted in a significantly greater reduction in serum UA (-2.2 mg/dL) than conventional therapy (-0.3 mg/dL, P < 0.001). Serum creatinine and estimated glomerular filtration rate changed little during the study period in each group. However, treatment with febuxostat for 12 weeks reduced the urinary levels of L-FABP, albumin, and β2MG, whereas the levels of these markers did not change in the control group. Febuxostat reduced serum UA levels more effectively than conventional therapy and might have a renoprotective effect in hyperuricemic patients with CKD. Further studies should clarify whether febuxostat prevents the progression of renal disease and improves the prognosis of CKD.
Veleba, Jiri; Matoulek, Martin; Hill, Martin; Pelikanova, Terezie; Kahleova, Hana
2016-10-26
It has been shown that it is possible to modify macronutrient oxidation, physical fitness and resting energy expenditure (REE) by changes in diet composition. Furthermore, mitochondrial oxidation can be significantly increased by a diet with a low glycemic index. The purpose of our trial was to compare the effects of a vegetarian (V) and conventional diet (C) with the same caloric restriction (-500 kcal/day) on physical fitness and REE after 12 weeks of diet plus aerobic exercise in 74 patients with type 2 diabetes (T2D). An open, parallel, randomized study design was used. All meals were provided for the whole study duration. An individualized exercise program was prescribed to the participants and was conducted under supervision. Physical fitness was measured by spiroergometry, and indirect calorimetry was performed at the start and after 12 weeks. Repeated-measures ANOVA (analysis of variance) models with between-subject (group) and within-subject (time) factors and interactions were used for evaluation of the relationships between continuous variables and factors. Maximal oxygen consumption (VO2max) increased by 12% in the vegetarian group (V) (F = 13.1, p < 0.001, partial η² = 0.171), whereas no significant change was observed in C (F = 0.7, p = 0.667; group × time F = 9.3, p = 0.004, partial η² = 0.209). Maximal performance (Watt max) increased by 21% in V (F = 8.3, p < 0.001, partial η² = 0.192), whereas it did not change in C (F = 1.0, p = 0.334; group × time F = 4.2, p = 0.048, partial η² = 0.116). Our results indicate that V leads more effectively to improvement in physical fitness than C after an aerobic exercise program.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer
This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is becoming increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper surveys the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
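The abstract above does not describe its load-balancing strategies in detail. As one generic illustration of device-level balancing, the Python sketch below splits a photon budget across devices in proportion to a measured throughput. The function name `partition_photons` and the use of integer throughput figures (for example, photons per millisecond from a short pilot run) are assumptions for this example, not the platform's API.

```python
def partition_photons(total_photons, throughputs):
    """Split a photon budget across devices in proportion to their throughput.

    throughputs: non-negative integers, one per device (assumed to come from
    a short pilot run). Returns the number of photons assigned to each device.
    """
    total_rate = sum(throughputs)
    if total_rate <= 0:
        raise ValueError("at least one device must have positive throughput")
    # Integer floor of each proportional share; the sum never exceeds the budget.
    shares = [total_photons * rate // total_rate for rate in throughputs]
    leftover = total_photons - sum(shares)
    # Hand the few remaining photons to the fastest devices, one each.
    fastest_first = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


# Three hypothetical devices, e.g. one CPU and two GPUs of different speed.
shares = partition_photons(1_000_000, [10, 30, 60])
```

A static split like this assumes throughput stays constant during the run; a dynamic scheme would instead hand out smaller batches on demand.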
Kasichayanula, Sreeneeranj; Liu, Xiaoni; Zhang, Weijiang; Pfister, Marc; LaCreta, Frank P; Boulton, David W
2011-11-01
Dapagliflozin, a selective inhibitor of renal sodium glucose co-transporter 2, is under development for the treatment of type 2 diabetes mellitus. Dapagliflozin elimination is primarily via glucuronidation to an inactive metabolite, dapagliflozin 3-O-glucuronide. Pharmacokinetic studies are recommended in subjects with impaired hepatic function if hepatic metabolism accounts for a substantial portion of the absorbed drug. The purpose of our study was to compare the pharmacokinetics of dapagliflozin in patients with mild, moderate, or severe hepatic impairment (HI) with healthy subjects. This was an open-label, parallel-group study in male or female patients with mild, moderate, or severe HI (6 per group according to Child-Pugh classification) and in 6 healthy control subjects. The control subjects were matched to the combined HI group for age (±10 years), weight (±20%), sex, and smoking status, with no deviations from normal in medical history, physical examination, ECG, or laboratory determinations. All participants received a single 10-mg oral dose of dapagliflozin, and the pharmacokinetics of dapagliflozin and dapagliflozin 3-O-glucuronide were characterized. Dapagliflozin tolerability was also assessed throughout the study. Demographic characteristics and baseline physical measurements (weight, height, and body mass index) were similar among the 18 patients in the HI groups (58-126 kg; 151.2-190.0 cm, and 31.5-37.7 kg/m(2), respectively) and the healthy subject group (65.0-102.6 kg; 166.0-184.0 cm, and 23.3-34.3 kg/m(2), respectively). In those with mild, moderate, or severe HI, dapagliflozin mean C(max) values were 12% lower and 12% and 40% higher than healthy subjects, respectively. Mean dapagliflozin AUC(0-∞) values were 3%, 36%, and 67% higher compared with healthy subjects, respectively. 
Dapagliflozin 3-O-glucuronide mean C(max) values were 4% and 58% higher and 14% lower in those with mild, moderate, or severe HI compared with healthy subjects, respectively, and mean dapagliflozin 3-O-glucuronide AUC(0-∞) values were 6%, 100%, and 30% higher compared with healthy subjects, respectively. These values were highly dependent on the calculated creatinine clearance of each group. All adverse events were mild or moderate, with no imbalance in frequency between groups. Compared with healthy subjects, systemic exposure to dapagliflozin in subjects with HI was correlated with the degree of HI. Single 10-mg doses of dapagliflozin were generally well tolerated by participants in this study. Due to the higher dapagliflozin exposures in patients with severe HI, the benefit:risk ratio should be individually assessed because the long-term safety profile and efficacy of dapagliflozin have not been specifically studied in this population. Copyright © 2011 Elsevier HS Journals, Inc. All rights reserved.
Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision
ERIC Educational Resources Information Center
Hinkle, Michelle Gimenez
2008-01-01
This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.
Pandey, Suresh K; Werner, Liliana; Wilson, M Edward; Izak, Andrea M; Apple, David J
2004-10-01
To compare the amount of capsulorhexis ovaling and capsular bag stretch produced by various intraocular lenses (IOLs) implanted in pediatric human eyes obtained post-mortem. David J. Apple, MD Laboratories for Ophthalmic Devices Research, John A. Moran Eye Center, Salt Lake City, Utah, USA. In this nonrandomized comparative study, 16 pediatric human eyes obtained postmortem were divided into 2 groups: Eight eyes were obtained from children younger than 2 years (Group A), and 8 eyes were obtained from children older than 2 years (Group B). All eyes were prepared according to the Miyake-Apple posterior video technique. Six types of rigid and foldable posterior chamber IOLs manufactured from poly(methyl methacrylate) (single-piece), silicone (plate and loop haptics), and hydrophobic acrylic (single-piece and 3-piece AcrySof, Alcon Laboratories) biomaterials were implanted. The capsulorhexis opening and capsular bag diameters were measured before IOL implantation and after in-the-bag IOL fixation with the haptics (or the main axis) at the 3 to 9 o'clock meridian. The percentage of ovaling of the capsulorhexis opening was calculated by noting the difference in the opening's horizontal diameter before and after IOL implantation. The percentage of capsular bag stretch was also calculated by noting the difference in the horizontal capsular bag diameter before and after IOL implantation. All IOLs produced ovaling of the capsulorhexis opening and stretching of the capsular bag parallel to the IOL haptics. There were significant differences in capsulorhexis ovaling and capsular bag stretch (P<.001, analysis of variance) between the 6 IOL types in each group of eyes. The postimplantation difference was significant only between the single-piece hydrophobic acrylic IOL (AcrySof) and the other IOLs. 
The single-piece hydrophobic acrylic IOL was associated with significantly less capsulorhexis ovaling and capsular bag stretch in both groups (mean 12.06% +/- 0.59% [SD] and 7.6% +/- 1.47%, respectively). Modern rigid and foldable IOLs designed for the adult population implanted in the capsular bag of infants and children produced variable degrees of capsulorhexis ovaling and capsular bag stretch. The Miyake-Apple posterior video technique confirmed the well-maintained configuration of the capsular bag (with minimal ovaling) after implantation of a single-piece hydrophobic acrylic IOL because of its flexible haptic design.
Determination of Algorithm Parallelism in NP Complete Problems for Distributed Architectures
1990-03-05
12 structure STACK declare OpenStack (S-.NODE **TopPtr) -+TopPtrI FlushStack(S.-NODE **TopPtr) -*TopPtr PushOnStack(S-.NODE **TopPtr, ITEM *NewltemPtr...OfCoveringSets, CoveringSets, L, Best CoverTime, Vertex, Set3end SCND ADT B.26 structure STACKI declare OpenStack (S-NODE **TopPtr) -+TopPtr FlushStack(S
Solid state pulsed power generator
Tao, Fengfeng; Saddoughi, Seyed Gholamali; Herbon, John Thomas
2014-02-11
A power generator includes one or more full bridge inverter modules coupled to a semiconductor opening switch (SOS) through an inductive resonant branch. Each module includes a plurality of switches that are switched in a fashion causing the one or more full bridge inverter modules to drive the semiconductor opening switch SOS through the resonant circuit to generate pulses to a load connected in parallel with the SOS.
Izgu, Nur; Ozdemir, Leyla; Bugdayci Basal, Fatma
2017-12-02
Patients receiving oxaliplatin may experience peripheral neuropathic pain and fatigue. Aromatherapy massage, a nonpharmacological method, may help to control these symptoms. The aim of this open-label, parallel-group, quasi-randomized controlled pilot study was to investigate the effect of aromatherapy massage on chemotherapy-induced peripheral neuropathic pain and fatigue in patients receiving oxaliplatin. Stratified randomization was used to allocate 46 patients to 2 groups: intervention (n = 22) and control (n = 24). Between week 1 and week 6, participants in the intervention group (IG) received aromatherapy massage 3 times a week. There was no intervention in weeks 7 and 8. The control group (CG) received routine care. Neuropathic pain was identified using the Douleur Neuropathique 4 Questions; severity of painful paresthesia was assessed with the numerical rating scale; fatigue severity was identified with the Piper Fatigue Scale. At week 6, the rate of neuropathic pain was significantly lower in the IG, when compared with the CG. The severity of painful paresthesia based on numerical rating scale in the IG was significantly lower than that in the CG at weeks 2, 4, and 6. At week 8, fatigue severity in the IG was significantly lower when compared with CG (P < .05). Aromatherapy massage may be useful in the management of chemotherapy-induced peripheral neuropathic pain and fatigue. This pilot study suggests that aromatherapy massage may be useful to relieve neuropathic pain and fatigue. However, there is a need for further clinical trials to validate the results of this study.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.
2011-07-01
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. 
In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. 
In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.
2016-12-01
The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets. Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon.
The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
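The detection rule described in this abstract (an anomaly is a daily value minus a chosen climatology, and a day is flagged when the regional area average exceeds a threshold) can be sketched serially as follows. This is a minimal single-process illustration, not the OceanXtremes or NEXUS code: the function names, the list-of-lists data layout, and the use of a plain mean for the climatology are assumptions, and no Spark, Cassandra, or Solr calls are shown.

```python
from statistics import mean


def climatology(series_by_year):
    """Per-day mean across years (the 'simple pixel average' variant).

    series_by_year: one list of daily values per year, all the same length.
    """
    return [mean(day_values) for day_values in zip(*series_by_year)]


def detect_anomalous_days(daily_pixels, clim, threshold):
    """Return indices of days whose regional mean anomaly exceeds threshold.

    daily_pixels: for each day, the list of pixel values inside the region.
    clim:         climatological value for each day (same length).
    """
    flagged = []
    for day, (pixels, baseline) in enumerate(zip(daily_pixels, clim)):
        # Anomaly field for the day, then its area average over the region.
        area_average = mean(p - baseline for p in pixels)
        if area_average > threshold:
            flagged.append(day)
    return flagged


if __name__ == "__main__":
    clim = climatology([[20.0, 21.0], [22.0, 23.0]])
    print(detect_anomalous_days([[21.0, 21.5], [24.0, 25.0]], clim, 1.0))
```

In a parallel setting each tile or day could be processed independently, because the per-day computation above shares no state with other days.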
DNA Assembly with De Bruijn Graphs Using an FPGA Platform.
Poirier, Carl; Gosselin, Benoit; Fortier, Paul
2018-01-01
This paper presents an FPGA implementation of a DNA assembly algorithm, called Ray, initially developed to run on parallel CPUs. The OpenCL language is used and the focus is placed on modifying and optimizing the original algorithm to better suit the new parallelization tool and the radically different hardware architecture. The results show that the execution time is roughly one fourth that of the CPU and factoring energy consumption yields a tenfold savings.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
NASA Astrophysics Data System (ADS)
Endo, M.; Hori, T.; Koyama, K.; Yamaguchi, I.; Arai, K.; Kaiho, K.; Yanabu, S.
2008-02-01
Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL) that has a vacuum interrupter with an electromagnetic repulsion mechanism. We set out to construct a high-voltage-class SFCL and produced an electromagnetic repulsion switch equipped with a 24 kV vacuum interrupter (VI). One problem is that the opening speed becomes slower, because the larger the vacuum interrupter, the heavier its contact. For this reason, the current flowing in the superconductor may not be interrupted within a half cycle of the current. To solve this problem, it is necessary to change the design of the coil connected in parallel and to strengthen the electromagnetic repulsion force at the time of opening the vacuum interrupter. The coil design was therefore changed, and a current-limiting test was conducted to examine whether the problem could be solved. The test used YBCO thin films connected 4 in series and 2 in parallel. Each YBCO thin film was 12 cm long and had a parallel resistance (0.1 Ω) connected to it. As a result, we succeeded in interrupting the superconductor current within a half cycle. Furthermore, the series- and parallel-connected YBCO thin films limited the current without failure.
Efficient Scalable Median Filtering Using Histogram-Based Operations.
Green, Oded
2018-05-01
Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted value. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, and , comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
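The contrast this abstract draws between sorting each window and maintaining a histogram can be illustrated with a one-dimensional sketch. It shows the classic sliding-histogram idea only and is not the algorithm from the paper or the OpenCV implementation; the function name, the 8-bit value range, and the edge handling by replicating border values are assumptions.

```python
def histogram_median_filter(row, radius):
    """1-D median filter for 8-bit integers using a sliding histogram.

    Instead of sorting each window, a 256-bin histogram is updated by
    removing the value that leaves the window and adding the one that
    enters. Borders are handled by replicating the edge value.
    """
    n = len(row)
    if n == 0:
        return []

    def at(i):
        # Clamp the index so positions outside the row reuse the edge value.
        return row[min(max(i, 0), n - 1)]

    hist = [0] * 256
    for k in range(-radius, radius + 1):
        hist[at(k)] += 1

    out = []
    for i in range(n):
        # The median is the smallest value whose cumulative count
        # exceeds half of the window (window size is 2 * radius + 1).
        count = 0
        for value in range(256):
            count += hist[value]
            if count > radius:
                out.append(value)
                break
        # Slide the window one step to the right.
        hist[at(i - radius)] -= 1
        hist[at(i + radius + 1)] += 1
    return out


if __name__ == "__main__":
    print(histogram_median_filter([10, 200, 12, 11, 13], 1))
```

The cost of each step here depends on the number of histogram bins rather than on the window size, which is why this family of methods becomes attractive as windows grow.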
Xu, Songao; Yu, Huijie; Sun, Hui; Zhu, Xiangyun; Xu, Xiaoqin; Xu, Jun; Cao, Weizhong
2017-01-01
To investigate the efficiency of closed tracheal suction system (CTSS) using novel splash-proof ventilator circuit component on ventilator-associated pneumonia (VAP) and the colonization of multiple-drug resistant bacteria (MDR) in patients undergoing mechanical ventilation (MV) prevention. A prospective single-blinded randomized parallel controlled intervention study was conducted. 330 severe patients admitted to the intensive care unit (ICU) of the First Hospital of Jiaxing from January 2014 to May 2016 were enrolled, and they were divided into open tracheal suction group, closed tracheal suction group, and splash-proof suction group on average by random number table. The patients in the three groups used conventional ventilator circuit component, conventional CTSS, and CTSS with a novel splash-proof ventilator circuit component for MV and sputum suction, respectively. The incidence of VAP, airway bacterial colonization rate, MDR and fungi colonization rate, duration of MV, length of ICU and hospitalization stay, and financial expenditure during hospitalization, as well as the in-hospital prognosis were recorded. After excluding patients who did not meet the inclusion criteria, incomplete data, backed out and so on, 318 patients were enrolled in the analysis finally. Compared with the open tracheal suction group, the total incidence of VAP was decreased in the closed tracheal suction group and splash-proof suction group [20.95% (22/105), 21.90% (23/105) vs. 29.63% (32/108)], but no statistical difference was found (both P > 0.05), and the incidence of VAP infections/1 000 MV days showed the same change tendency (cases: 14.56, 17.35 vs. 23.07). The rate of airway bacterial colonization and the rate of MDR colonization in the open tracheal suction group and splash-proof suction group were remarkably lower than those of closed tracheal suction group [32.41% (35/108), 28.57% (30/105) vs. 46.67% (49/105), 20.37% (22/108), 15.24% (16/105) vs. 
39.05% (41/105)] with statistically significant differences (all P < 0.05). Besides, no statistically significant difference was found in the fungi colonization rate among open tracheal group, closed tracheal group, and splash-proof suction group (4.63%, 3.81% and 6.67%, respectively, P > 0.05). Compared with the closed tracheal suction group, the duration of MV, the length of ICU and hospitalization stay were shortened in the open tracheal suction group and splash-proof suction group [duration of MV (days): 8.00 (4.00, 13.75), 8.00 (5.00, 13.00) vs. 9.00 (5.00, 16.00); the length of ICU stay (days): 10.00 (6.00, 16.00), 11.00 (7.00, 19.00) vs. 13.00 (7.50, 22.00); the length of hospitalization stay (days): 16.50 (9.25, 32.00), 19.00 (10.50, 32.50) vs. 21.00 (10.00, 36.00)], and financial expenditure during hospitalization was lowered [10 thousand Yuan: 4.95 (3.13, 8.62), 5.47 (3.84, 9.41) vs. 6.52 (3.99, 11.02)] without statistically significant differences (all P > 0.05). Moreover, no statistically significant difference was found in the in-hospital prognosis among the three groups. CTSS performed using the novel splash-proof ventilator circuit component shared similar advantages in preventing VAP with the conventional CTSS. Meanwhile, it is superior because it avoided the MDR colonization and high cost associated with the conventional CTSS. Clinical Trial Registration: Chinese Clinical Trial Registry, ChiCTR-IOR-16009694.
Darensbourg, Donald J; Mackiewicz, Ryan M; Rodgers, Jody L; Fang, Cindy C; Billodeaux, Damon R; Reibenspies, Joseph H
2004-09-20
A detailed mechanistic study into the copolymerization of CO2 and cyclohexene oxide utilizing CrIII(salen)X complexes and N-methylimidazole, where H2salen = N,N'-bis(3,5-di-tert-butylsalicylidene)-1,2-ethylenediimine and other salen derivatives and X = Cl or N3, has been conducted. By studying salen ligands with various groups on the diimine backbone, we have observed that bulky groups oriented perpendicular to the salen plane reduce the activity of the catalyst significantly, while such groups oriented parallel to the salen plane do not retard copolymer formation. This is not surprising in that the mechanism for asymmetric ring opening of epoxides was found to occur in a bimetallic fashion, whereas these perpendicularly oriented groups along with the tert-butyl groups on the phenolate rings produce considerable steric requirements for the two metal centers to communicate and thus initiate the copolymerization process. It was also observed that altering the substituents on the phenolate rings of the salen ligand had a 2-fold effect, controlling both catalyst solubility as well as electron density around the metal center, producing significant effects on the rate of copolymer formation. This and other data discussed herein have led us to propose a more detailed mechanistic delineation, wherein the rate of copolymerization is dictated by two separate equilibria. The first equilibrium involves the initial second-order epoxide ring opening and is inhibited by excess amounts of cocatalyst. The second equilibrium involves the propagation step and is enhanced by excess cocatalyst. This gives the [cocatalyst] both a positive and negative effect on the overall rate of copolymerization. Copyright 2004 American Chemical Society
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panther, Jennifer L.; Brown, Richard S.; Gaulke, Greggory L.
2010-05-11
In this study, conducted by Pacific Northwest National Laboratory for the U.S. Army Corps of Engineers, Portland District, we measured differences in survival and growth, incision openness, transmitter loss, wound healing, and erythema among abdominal incisions on the linea alba, lateral and parallel to the linea alba (muscle-cutting), and following the underlying muscle fibers (muscle-sparing). A total of 936 juvenile Chinook salmon were implanted with both Juvenile Salmon Acoustic Tracking System transmitters (0.43 g dry) and passive integrated transponder tags. Fish were held at 12°C (n = 468) or 20°C (n = 468) and examined once weekly over 98 days. We found survival and growth did not differ among incision groups or between temperature treatment groups. Incisions on the linea alba had less openness than muscle-cutting and muscle-sparing incisions during the first 14 days when fish were held at 12°C or 20°C. Transmitter loss was not different among incision locations by day 28 when fish were held at 12°C or 20°C. However, incisions on the linea alba had greater transmitter loss than muscle-cutting and muscle-sparing incisions by day 98 at 12°C. Results for wound closure and erythema differed among temperature groups. Results from our study will be used to improve fish-tagging procedures for future studies using acoustic or radio transmitters.
Systems and methods for photovoltaic string protection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krein, Philip T.; Kim, Katherine A.; Pilawa-Podgurski, Robert C. N.
A system and method includes a circuit for protecting a photovoltaic string. A bypass switch connects in parallel to the photovoltaic string and a hot spot protection switch connects in series with the photovoltaic string. A first control signal controls opening and closing of the bypass switch and a second control signal controls opening and closing of the hot spot protection switch. Upon detection of a hot spot condition the first control signal closes the bypass switch and after the bypass switch is closed the second control signal opens the hot spot protection switch.
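The switching order described in this abstract (close the bypass switch first, and only then open the hot spot protection switch) can be written down as a small state model. This is an illustrative sketch only, not the patented circuit or its control logic; the class name, attribute names, and the event log are hypothetical.

```python
class StringProtection:
    """Hypothetical state model of the two switches for one PV string."""

    def __init__(self):
        self.bypass_closed = False   # bypass switch, in parallel with the string
        self.series_closed = True    # hot spot protection switch, in series
        self.log = []

    def on_hot_spot_detected(self):
        """Apply the two control signals in the order the abstract gives."""
        # First control signal: close the bypass switch.
        self.bypass_closed = True
        self.log.append("bypass closed")
        # Second control signal: open the series switch, but only once
        # the bypass path is confirmed closed.
        if self.bypass_closed:
            self.series_closed = False
            self.log.append("hot spot switch opened")
        return self.log


if __name__ == "__main__":
    protection = StringProtection()
    print(protection.on_hot_spot_detected())
```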
Brainhack: a collaborative workshop for the open neuroscience community.
Cameron Craddock, R; S Margulies, Daniel; Bellec, Pierre; Nolan Nichols, B; Alcauter, Sarael; A Barrios, Fernando; Burnod, Yves; J Cannistraci, Christopher; Cohen-Adad, Julien; De Leener, Benjamin; Dery, Sebastien; Downar, Jonathan; Dunlop, Katharine; R Franco, Alexandre; Seligman Froehlich, Caroline; J Gerber, Andrew; S Ghosh, Satrajit; J Grabowski, Thomas; Hill, Sean; Sólon Heinsfeld, Anibal; Matthew Hutchison, R; Kundu, Prantik; R Laird, Angela; Liew, Sook-Lei; J Lurie, Daniel; G McLaren, Donald; Meneguzzi, Felipe; Mennes, Maarten; Mesmoudi, Salma; O'Connor, David; H Pasaye, Erick; Peltier, Scott; Poline, Jean-Baptiste; Prasad, Gautam; Fraga Pereira, Ramon; Quirion, Pierre-Olivier; Rokem, Ariel; S Saad, Ziad; Shi, Yonggang; C Strother, Stephen; Toro, Roberto; Q Uddin, Lucina; D Van Horn, John; W Van Meter, John; C Welsh, Robert; Xu, Ting
2016-01-01
Brainhack events offer a novel workshop format with participant-generated content that caters to the rapidly growing open neuroscience community. Including components from hackathons and unconferences, as well as parallel educational sessions, Brainhack fosters novel collaborations around the interests of its attendees. Here we provide an overview of its structure, past events, and example projects. Additionally, we outline current innovations such as regional events and post-conference publications. Through introducing Brainhack to the wider neuroscience community, we hope to provide a unique conference format that promotes the features of collaborative, open science.
Ali, Mohammed K; Amin, Maggy E; Amin, Ahmed F; Abd El Aal, Diaa Eldeen M
2017-03-01
To test the effect of aspirin and omega 3 on fetal weight as well as feto-maternal blood flow in asymmetrical intrauterine growth restriction (IUGR). This study is a clinically registered (NCT02696577), open, parallel, randomized controlled trial, conducted at Assiut Woman's Health Hospital, Egypt, including 80 pregnant women (28-30 weeks) with IUGR. They were randomized either to group I: aspirin or group II: aspirin plus omega 3. The primary outcome was the fetal weight after 6 weeks of treatment. Secondary outcomes included Doppler blood flow changes in both uterine and umbilical arteries, birth weight, time and method of delivery and admission to NICU. The outcome variables were analyzed using paired and unpaired t-tests. The estimated fetal weight increased significantly more in group II than in group I (p=0.00). Uterine and umbilical artery blood flow increased significantly in group II (p<0.05). The birth weight in group II was higher than that observed in group I (p<0.05). The use of aspirin with omega 3 is more effective than aspirin alone in increasing fetal weight and improving utero-placental blood flow in IUGR. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
de Luis, D A; Izaola, O; de la Fuente, B; Terroba, M C; Cuellar, L; Cabezas, G
2013-06-01
The aim of our study was to investigate whether two different daily doses of a high monounsaturated fatty acid (MUFA) specific diabetes enteral formula could improve nutritional variables as well as metabolic parameters. We conducted a randomized, open-label, multicenter, parallel group study. 27 patients with diabetes mellitus type 2 with recent weight loss were randomized to one of two study groups: group 1 (two cans per day) and group 2 (three cans per day) for a ten week period. A significant decrease of HbA1c was detected in both groups. The decrease was larger in group 2 (0.98%; 95% confidence interval 0.19-1.88) than in group 1 (0.60%; 95% confidence interval 0.14-1.04). A significant increase of weight, body mass index, fat mass, albumin, prealbumin and transferrin was observed in both groups without statistical differences in this improvement between both groups. The increase of weight was larger in group 2 (4.59kg; 95% confidence interval 1.71-9.49) than in group 1 (1.46%; 95% confidence interval 0.39-2.54). Gastrointestinal tolerance (diarrhea episodes) with both formulas was good, without statistical differences (7.60% vs 7.14%: ns). A high monounsaturated fatty acid diabetes-specific supplement improved HbA1c and nutritional status. These improvements were higher with three supplements than with two per day.
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng
Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time consuming with a single processor. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.
Klein, Max; Sharma, Rati; Bohrer, Chris H; Avelis, Cameron M; Roberts, Elijah
2017-01-15
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. Source code is licensed under the Apache 2.0 open source license and is available at the project website: https://www.assembla.com/spaces/roberts-lab-public/wiki/Biospark. Contact: eroberts@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Structure and function of SemiSWEET and SWEET sugar transporters.
Feng, Liang; Frommer, Wolf B
2015-08-01
SemiSWEETs and SWEETs have emerged as unique sugar transporters. First discovered in plants with the help of fluorescent biosensors, homologs exist in all kingdoms of life. Bacterial and plant homologs transport hexoses and sucrose, whereas animal SWEETs transport glucose. Prokaryotic SemiSWEETs are small and comprise a parallel homodimer of an approximately 100 amino acid-long triple helix bundle (THB). Duplicated THBs are fused to create eukaryotic SWEETs in a parallel orientation via an inversion linker helix, producing a similar configuration to that of SemiSWEET dimers. Structures of four SemiSWEETs have been resolved in three states: open outside, occluded, and open inside, indicating alternating access. As we discuss here, these atomic structures provide a basis for exploring the evolution of structure-function relations in this new class of transporters. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hybrid Optimization Parallel Search PACKage
DOE Office of Scientific and Technical Information (OSTI.GOV)
2009-11-10
HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
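The idea of a cache of saved evaluations combined with a pending cache can be illustrated with a minimal Python sketch. This is not HOPSPACK's API: the class `EvaluationCache`, its methods, and the objective `sphere` are hypothetical names chosen for illustration, and a thread pool stands in for the multithreaded evaluator. A point that is already finished or still in flight is never submitted twice.

```python
from concurrent.futures import ThreadPoolExecutor


class EvaluationCache:
    """Hypothetical sketch: memoize finished and in-flight evaluations."""

    def __init__(self, func, max_workers=4):
        self._func = func
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # Maps a point to its Future. A Future that is still running plays
        # the role of a "pending" entry; a finished one is a saved result.
        self._futures = {}
        self.calls = 0  # number of evaluations actually started

    def submit(self, point):
        key = tuple(point)
        if key not in self._futures:
            self.calls += 1
            self._futures[key] = self._pool.submit(self._func, key)
        return self._futures[key]

    def evaluate_all(self, points):
        # Submit everything first so distinct points run concurrently,
        # then collect the results in the order requested.
        futures = [self.submit(p) for p in points]
        return [f.result() for f in futures]

    def close(self):
        self._pool.shutdown()


def sphere(x):
    """Illustrative objective: sum of squares."""
    return sum(v * v for v in x)


cache = EvaluationCache(sphere)
values = cache.evaluate_all([(1, 2), (3, 4), (1, 2)])
print(values, "evaluations started:", cache.calls)
cache.close()
```

Keying the dictionary on the exact tuple is the simplest choice; a real derivative-free solver would more likely compare points within a tolerance.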
NASA Astrophysics Data System (ADS)
Marzeion, B.; Maussion, F.
2017-12-01
Mountain glaciers are one of the few remaining sub-systems of the global climate system for which no globally applicable, open source, community-driven model exists. Notable examples from the ice sheet community include the Parallel Ice Sheet Model or Elmer/Ice. While the atmospheric modeling community has a long tradition of sharing models (e.g. the Weather Research and Forecasting model) or comparing them (e.g. the Coupled Model Intercomparison Project or CMIP), recent initiatives originating from the glaciological community show a new willingness to better coordinate global research efforts following the CMIP example (e.g. the Glacier Model Intercomparison Project or the Glacier Ice Thickness Estimation Working Group). In the recent past, great advances have been made in the global availability of data and methods relevant for glacier modeling, spanning glacier outlines, automatized glacier centerline identification, bed rock inversion methods, and global topographic data sets. Taken together, these advances now allow the ice dynamics of glaciers to be modeled on a global scale, provided that adequate modeling platforms are available. Here, we present the Open Global Glacier Model (OGGM), developed to provide a global scale, modular, and open source numerical model framework for consistently simulating past and future global scale glacier change. Global not only in the sense of leading to meaningful results for all glaciers combined, but also for any small ensemble of glaciers, e.g. at the headwater catchment scale. Modular to allow combinations of different approaches to the representation of ice flow and surface mass balance, enabling a new kind of model intercomparison. Open source so that the code can be read and used by anyone and so that new modules can be added and discussed by the community, following the principles of open governance. Consistent in order to provide uncertainty measures at all realizable scales.
Full-f version of GENE for turbulence in open-field-line systems
NASA Astrophysics Data System (ADS)
Pan, Q.; Told, D.; Shi, E. L.; Hammett, G. W.; Jenko, F.
2018-06-01
Unique properties of plasmas in the tokamak edge, such as large amplitude fluctuations and plasma-wall interactions in the open-field-line regions, require major modifications of existing gyrokinetic codes originally designed for simulating core turbulence. To this end, the global version of the 3D2V gyrokinetic code GENE, so far employing a δf-splitting technique, is extended to simulate electrostatic turbulence in straight open-field-line systems. The major extensions are the inclusion of the velocity-space nonlinearity, the development of a conducting-sheath boundary, and the implementation of the Lenard-Bernstein collision operator. With these developments, the code can be run as a full-f code and can handle particle loss to and reflection from the wall. The extended code is applied to modeling turbulence in the Large Plasma Device (LAPD), with a reduced mass ratio and a much lower collisionality. Similar to turbulence in a tokamak scrape-off layer, LAPD turbulence involves collisions, parallel streaming, cross-field turbulent transport with steep profiles, and particle loss at the parallel boundary.
Stem thrust prediction model for W-K-M double wedge parallel expanding gate valves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eldiwany, B.; Alvarez, P.D.; Wolfe, K.
1996-12-01
An analytical model for determining the required valve stem thrust during opening and closing strokes of W-K-M parallel expanding gate valves was developed as part of the EPRI Motor-Operated Valve Performance Prediction Methodology (EPRI MOV PPM) Program. The model was validated against measured stem thrust data obtained from in-situ testing of three W-K-M valves. Model predictions show favorable, bounding agreement with the measured data for valves with Stellite 6 hardfacing on the disks and seat rings for water flow in the preferred flow direction (gate downstream). The maximum required thrust to open and to close the valve (excluding wedging and unwedging forces) occurs at a slightly open position and not at the fully closed position. In the nonpreferred flow direction, the model shows that premature wedging can occur during ΔP closure strokes even when the coefficients of friction at different sliding surfaces are within the typical range. This paper summarizes the model description and comparison against test data.
Digital vs. conventional full-arch implant impressions: a comparative study.
Amin, Sarah; Weber, Hans Peter; Finkelman, Matthew; El Rafie, Khaled; Kudara, Yukio; Papaspyridakos, Panos
2017-11-01
To test whether or not digital full-arch implant impressions with two different intra-oral scanners (CEREC Omnicam and True Definition) have the same accuracy as conventional ones. The hypothesis was that the splinted open-tray impressions would be more accurate than digital full-arch impressions. A stone master cast representing an edentulous mandible using five internal connection implant analogs (Straumann Bone Level RC, Basel, Switzerland) was fabricated. The three median implants were parallel to each other, the far left implant had 10°, and the far right had 15° distal angulation. A splinted open-tray technique was used for the conventional polyether impressions (n = 10) for Group 1. Digital impressions (n = 10) were taken with two intra-oral optical scanners (CEREC Omnicam and 3M True Definition) after connecting polymer scan bodies to the master cast for groups 2 and 3. Master cast and conventional impression test casts were digitized with a high-resolution reference scanner (Activity 880 scanner; Smart Optics, Bochum, Germany) to obtain digital files. Standard tessellation language (STL) datasets from the three test groups of digital and conventional impressions were superimposed with the STL dataset from the master cast to assess the 3D deviations. Deviations were recorded as root-mean-square error. To compare the master cast with conventional and digital impressions at the implant level, Welch's F-test was used together with Games-Howell post hoc test. Group I had a mean value of 167.93 μm (SD 50.37); Group II (Omnicam) had a mean value of 46.41 μm (SD 7.34); Group III (True Definition) had a mean value of 19.32 μm (SD 2.77). Welch's F-test showed a significant difference between the groups (P < 0.001). The Games-Howell test showed statistically significant 3D deviations for all three groups (P < 0.001).
Full-arch digital implant impressions using True Definition scanner and Omnicam were significantly more accurate than the conventional impressions with the splinted open-tray technique. Additionally, the digital impressions with the True Definition scanner had significantly less 3D deviations when compared with the Omnicam. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Parallelization of elliptic solver for solving 1D Boussinesq model
NASA Astrophysics Data System (ADS)
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two test cases, i.e. propagation of a solitary wave and of a standing wave, are proposed to evaluate the parallel program. The numerical results are verified against the analytical solutions of the solitary and standing waves. The best speedup for the solitary and standing wave test cases is about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
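For readers unfamiliar with cyclic reduction, the following is a minimal sequential sketch in Python, not the authors' OpenMP code. It assumes a tridiagonal system written as a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = c[n-1] = 0 and no pivoting (safe for diagonally dominant matrices); the function name and test matrix are illustrative. The comments mark the two loops whose iterations are mutually independent, which is what makes the method attractive for shared-memory threading.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    odd = range(1, n, 2)
    ra, rb, rc, rd = [], [], [], []
    # Forward reduction: eliminate the even-indexed unknowns from each odd
    # row. Every odd row is processed independently of the others, so this
    # loop is a natural candidate for a parallel-for.
    for i in odd:
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        a_nxt, c_nxt, d_nxt = (a[i + 1], c[i + 1], d[i + 1]) if has_next else (0.0, 0.0, 0.0)
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - gamma * a_nxt)
        rc.append(-gamma * c_nxt)
        rd.append(d[i] - alpha * d[i - 1] - gamma * d_nxt)

    # The odd unknowns now satisfy a tridiagonal system of about half the size.
    x_odd = cyclic_reduction(ra, rb, rc, rd)

    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    # Back substitution: each even unknown depends only on its already
    # known odd neighbours, so these iterations are independent as well.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Illustrative check on a diagonally dominant system with a known solution.
n = 7
a = [0.0] + [1.0] * (n - 1)
b = [4.0] * n
c = [1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
d = [
    (a[i] * x_true[i - 1] if i > 0 else 0.0)
    + b[i] * x_true[i]
    + (c[i] * x_true[i + 1] if i + 1 < n else 0.0)
    for i in range(n)
]
x = cyclic_reduction(a, b, c, d)
print("max error:", max(abs(u - v) for u, v in zip(x, x_true)))
```

Each level halves the number of unknowns, so the available parallelism shrinks as the reduction proceeds; that may be one reason speedups for this algorithm stay well below the thread count.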
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).
Parent-Child Parallel-Group Intervention for Childhood Aggression in Hong Kong
ERIC Educational Resources Information Center
Fung, Annis L. C.; Tsang, Sandra H. K. M.
2006-01-01
This article reports the original evidence-based outcome study on parent-child parallel group-designed Anger Coping Training (ACT) program for children aged 8-10 with reactive aggression and their parents in Hong Kong. This research program involved experimental and control groups with pre- and post-comparison. Quantitative data collection…
NASA Astrophysics Data System (ADS)
Grzeszczuk, A.; Kowalski, S.
2015-04-01
Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to speed up graphics by performing calculations in parallel. The success of this solution has opened General-Purpose Graphics Processing Unit (GPGPU) technology to applications unrelated to graphics. A GPGPU system can be applied as an effective tool for reducing the huge volume of data in pulse shape analysis measurements, either by on-line recalculation or by a very fast compression scheme. The simplified structure of the CUDA system and its programming model, using an Nvidia GeForce GTX580 card as an example, are presented in our poster contribution, both in a stand-alone version and as a ROOT application.
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; ...
2015-07-15
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
2007-05-27
5:00PM - Opening Reception 6:30PM WEDNESDAY, MAY 23, 2007 (General Session) TUESDAY, MAY 22, 2007 (Early Registration) 11:50 AM - LUNCHEON 1:00 PM... RECEPTION 7:00 PM WEDNESDAY, MAY 23, 2007 (Parallel Sessions) Session IVA - OPEN SESSION Chair: Mr. Telly Manolatos, Electronics Development Corporation...company’s logo in the agenda handouts and proceedings o Signage outside the particular event sponsored o Sponsor ribbon on badges • Reception - $8,000
Black lesbian gender and sexual culture: celebration and resistance.
Wilson, Bianca D M
2009-04-01
Lesbian gender expression is a persistent theme in research and writing about lesbian culture. Yet little empirical research has examined the ways lesbian gender functions within the sexual culture of lesbian communities, particularly among lesbians of colour. This study was aimed at documenting and assessing the functions of lesbian gender among African American lesbians. Particular attention was paid to identifying core characteristics of sexual discourses, such as evidence of dominant and resistant sexual scripts and contradictions between messages about sex. This study took the form of a rapid ethnography of an African American lesbian community in the USA using focus groups, individual community leader interviews and participant observations at a weekly open mic event. Findings document how lesbian gender roles translated into distinct sexual roles and expectations that appear to both parallel and radically reject heterosexual norms for sex. The deep roots of the social pressure to date within these roles were also evident within observations at the open microphone events. While data highlighted the central role that lesbian gender roles play in this community, analyses also revealed a strong resistance to the dominance of this sexual cultural system.
Clayson, Peter E; Miller, Gregory A
2017-01-01
Generalizability theory (G theory) provides a flexible, multifaceted approach to estimating score reliability. G theory's approach to estimating score reliability has important advantages over classical test theory that are relevant for research using event-related brain potentials (ERPs). For example, G theory does not require parallel forms (i.e., equal means, variances, and covariances), can handle unbalanced designs, and provides a single reliability estimate for designs with multiple sources of error. This monograph provides a detailed description of the conceptual framework of G theory using examples relevant to ERP researchers, presents the algorithms needed to estimate ERP score reliability, and provides a detailed walkthrough of newly-developed software, the ERP Reliability Analysis (ERA) Toolbox, that calculates score reliability using G theory. The ERA Toolbox is open-source, Matlab software that uses G theory to estimate the contribution of the number of trials retained for averaging, group, and/or event types on ERP score reliability. The toolbox facilitates the rigorous evaluation of psychometric properties of ERP scores recommended elsewhere in this special issue. Copyright © 2016 Elsevier B.V. All rights reserved.
The generalized accessibility and spectral gap of lower hybrid waves in tokamaks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Takahashi, Hironori
1994-03-01
The generalized accessibility of lower hybrid waves, primarily in the current drive regime of tokamak plasmas, which may include shifting, either upward or downward, of the parallel refractive index (n_∥), is investigated, based upon a cold plasma dispersion relation and various geometrical constraint (G.C.) relations imposed on the behavior of n_∥. It is shown that n_∥ upshifting can be bounded and insufficient to bridge a large spectral gap to cause wave damping, depending upon whether the G.C. relation allows the oblique resonance to occur. The traditional n_∥ upshifting mechanism caused by the pitch angle of magnetic field lines is shown to lead to contradictions with experimental observations. An upshifting mechanism brought about by the density gradient along field lines is proposed, which is not inconsistent with experimental observations, and provides plausible explanations to some unresolved issues of lower hybrid wave theory, including generation of 'seed electrons.'
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
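As a point of reference, k-means++ seeding can be sketched sequentially in a few lines of Python. This is an illustrative sketch of the published seeding rule (each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far), not the C++ code described in the record; the names `kmeanspp_seeds` and `sq_dist` and the sample data are assumptions. Comments mark the steps that map onto a data-parallel update and a reduction.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial seeds from points using the k-means++ rule."""
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed.
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        # Summing the weights is a reduction over all points.
        if sum(d2) == 0.0:
            break  # every point coincides with a seed; nothing left to pick
        # Draw one index with probability proportional to d2[i].
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        seeds.append(points[idx])
        # Updating each distance is independent per point (data-parallel).
        d2 = [min(w, sq_dist(p, points[idx])) for w, p in zip(d2, points)]
    return seeds


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.8, 0.3)]
print(kmeanspp_seeds(points, 3))
```

The weighted draw is the inherently sequential step; the distance update and the weight sum dominate the cost for large inputs and are the parts that GPU and multicore versions typically target.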
2014-05-01
fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic...purpose computing CPU for algorithms where processing of large blocks of data is done in parallel. The reason for that is the GPU’s highly effective...parallel structure. Most of the image and video processing computations involve heavy matrix and vector operations over large amounts of data and
Microfabricated linear Paul-Straubel ion trap
Mangan, Michael A [Albuquerque, NM; Blain, Matthew G [Albuquerque, NM; Tigges, Chris P [Albuquerque, NM; Linker, Kevin L [Albuquerque, NM
2011-04-19
An array of microfabricated linear Paul-Straubel ion traps can be used for mass spectrometric applications. Each ion trap comprises two parallel inner RF electrodes and two parallel outer DC control electrodes symmetric about a central trap axis and suspended over an opening in a substrate. Neighboring ion traps in the array can share a common outer DC control electrode. The ions are confined transversely by an RF quadrupole electric field potential well on the ion trap axis. The array can trap a wide array of ions.
Performance comparison analysis library communication cluster system using merge sort
NASA Astrophysics Data System (ADS)
Wulandari, D. A. R.; Ramadhan, M. E.
2018-04-01
Computing began with a single processor; multi-processor systems were introduced to reduce computing time. This second paradigm is known as parallel computing, of which a cluster is one example. A cluster needs a communication protocol for processing, one of which is the Message Passing Interface (MPI). MPI has many implementations, two of which are OpenMPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library match the characteristics of the problem, so this study aims to compare the performance of these libraries in handling parallel computing processes. The case studies in this research are MPICH2 and OpenMPI. A sorting problem, using the merge sort method, is executed to assess the performance of the cluster system. The research method is to implement OpenMPI and MPICH2 on a Linux-based cluster of five virtual computers and then analyze the performance of the system under different test scenarios using three parameters: execution time, speedup, and efficiency. The results show that, as the data size grows, the average speedup and efficiency of OpenMPI and MPICH2 tend to increase, but they decrease at large data sizes. Increasing the data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100000. OpenMPI is reported to have a greater execution time than MPICH2; for example, at a data size of 1000 the average execution time is given as 0.009721 with MPICH2 and 0.003895 with OpenMPI. OpenMPI can customize communication needs.
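The three parameters named in this record are related by the standard definitions: speedup is the serial execution time divided by the parallel execution time, and efficiency is the speedup divided by the number of workers. A minimal Python sketch follows; the timings used in the example are hypothetical and are not taken from the study.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings in seconds for one data size on a 5-node cluster.
t1, tp, p = 8.0, 2.5, 5
print(f"speedup = {speedup(t1, tp):.2f}, efficiency = {efficiency(t1, tp, p):.2%}")
```

An efficiency well below 1 indicates that communication and idle time, rather than computation, take up a growing share of the run, which is consistent with the observation that small data sizes benefit little from added nodes.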
A Framework for an Open Source Geospatial Certification Model
NASA Astrophysics Data System (ADS)
Khan, T. U. R.; Davis, P.; Behr, F.-J.
2016-06-01
The geospatial industry is forecast to grow enormously in the forthcoming years, with an extended need for a well-educated workforce. Hence ongoing education and training play an important role in professional life. In parallel, in the geospatial and IT arena as well as in political discussion and legislation, Open Source solutions, open data proliferation, and the use of open standards have an increasing significance. Based on the Memorandum of Understanding between the International Cartographic Association, the OSGeo Foundation, and ISPRS, this development led to the implementation of the ICA-OSGeo-Lab initiative with its mission "Making geospatial education and opportunities accessible to all". Discussions in this initiative and the growth and maturity of geospatial Open Source software initiated the idea to develop a framework for a worldwide applicable Open Source certification approach. Generic and geospatial certification approaches are already offered by numerous organisations, i.e., GIS Certification Institute, GeoAcademy, ASPRS, and software vendors, i.e., Esri, Oracle, and RedHat. They focus on different fields of expertise and have different levels and ways of examination which are offered for a wide range of fees. The development of the certification framework presented here is based on the analysis of diverse bodies of knowledge concepts, i.e., NCGIA Core Curriculum, URISA Body Of Knowledge, USGIF Essential Body Of Knowledge, the "Geographic Information: Need to Know", currently under development, and the Geospatial Technology Competency Model (GTCM). The latter provides a US American oriented list of the knowledge, skills, and abilities required of workers in the geospatial technology industry and essentially influenced the framework of certification. In addition to the theoretical analysis of existing resources, the geospatial community was integrated twofold.
An online survey about the relevance of Open Source was performed and evaluated with 105 respondents worldwide. 15 interviews (face-to-face or by telephone) with experts in different countries provided additional insights into Open Source usage and certification. The findings led to the development of a certification framework of three main categories with in total eleven sub-categories, i.e., "Certified Open Source Geospatial Data Associate / Professional", "Certified Open Source Geospatial Analyst Remote Sensing & GIS", "Certified Open Source Geospatial Cartographer", "Certified Open Source Geospatial Expert", "Certified Open Source Geospatial Associate Developer / Professional Developer", "Certified Open Source Geospatial Architect". Each certification is described by pre-conditions, scope and objectives, course content, recommended software packages, target group, expected benefits, and the methods of examination. Examinations can be flanked by proofs of professional career paths and achievements which need a peer qualification evaluation. After a couple of years a recertification is required. The concept seeks the accreditation by the OSGeo Foundation (and other bodies) and international support by a group of geospatial scientific institutions to achieve wide and international acceptance for this Open Source geospatial certification model. A business case for Open Source certification and a corresponding SWOT model is examined to support the goals of the Geo-For-All initiative of the ICA-OSGeo pact.
Code of Federal Regulations, 2010 CFR
2010-10-01
....137 Cargo ports. (a) Unless otherwise authorized by the Commandant, the lower edge of any opening for... is drawn parallel to the freeboard deck at side and has as its lowest point the upper edge of the...
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example 2K×2K×16bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part comprising inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.
SiGN-SSM: open source parallel software for estimating gene networks with state space models.
Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru
2011-04-15
SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), that is a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint effective to stabilize the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM is available on our web site. tamada@ims.u-tokyo.ac.jp.
NASA Astrophysics Data System (ADS)
Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.
2017-07-01
Multi-thread programming using OpenMP on a shared-memory architecture with hyperthreading technology allows a resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, the speedup depends on the processor's ability to execute only a limited number of threads, especially for a sequential algorithm that contains a nested loop in which the number of outer-loop iterations is greater than the maximum number of threads that a processor can execute. The thread distribution technique found previously can be applied only by high-level programmers. This paper presents a parallelization procedure for low-level programmers dealing with 2-level nested loop problems in which the maximum number of threads that a processor can execute is smaller than the number of outer-loop iterations. Data preprocessing related to the number of outer-loop and inner-loop iterations, the computational time required to execute each iteration, and the maximum number of threads that a processor can execute is used as a strategy to determine which parallel region will produce the optimal speedup.
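When the outer loop has more iterations than there are threads, each thread has to take several iterations. The following Python sketch shows one generic way to split the iterations into contiguous, near-equal blocks; it is an illustration of the situation the abstract describes, not the paper's procedure, and the function name `static_blocks` is hypothetical:

```python
def static_blocks(n_iters, n_threads):
    """Split range(n_iters) into at most n_threads contiguous, near-equal blocks.

    The first (n_iters % n_threads) blocks get one extra iteration, so block
    sizes differ by at most one. Threads that would get no work are skipped.
    """
    base, extra = divmod(n_iters, n_threads)
    blocks = []
    start = 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        if size:
            blocks.append(range(start, start + size))
        start += size
    return blocks
```

For example, 10 outer iterations on 4 threads give blocks of 3, 3, 2 and 2 iterations, so the slowest thread runs 3 iterations of the inner loop rather than 10.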
Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.
2011-01-01
We parallelized the GeoClaw code on a one-level grid using OpenMP in March 2011 to meet the urgent need of simulating near-shore tsunami waves from Tohoku 2011, and achieved over 75% of the potential speed-up on an eight-core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantage of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. © 2011 IEEE.
PREMER: a Tool to Infer Biological Networks.
Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R
2017-10-04
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
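The information-theoretic quantity underlying this kind of network inference is the mutual information between pairs of variables. A minimal Python sketch of the plug-in (histogram) estimate for discrete samples is given here for orientation; the abstract does not say which estimator PREMER uses, so this is an assumption, and the function name is illustrative:

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

Two identical fair binary sequences share 1 bit, while two sequences whose joint counts factorize share 0 bits. For a network with many nodes this function has to be evaluated for every pair (and, for the multidimensional variants, every triple), which is the cost that parallelization addresses.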
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
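Stability for large time steps combined with first-order accuracy in time and second-order accuracy in space is what an implicit (backward Euler) step with centered differences provides. The Python sketch below shows such a step for a 1D diffusion-decay equation, solved with the Thomas algorithm in linear time. It is a one-dimensional illustration under assumed zero-flux boundaries, not BioFVM's implementation, and all names are hypothetical:

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n); lower[0] and upper[-1] do not affect the result."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_step(u, D, decay, dx, dt):
    """One backward-Euler step of du/dt = D * u_xx - decay * u with zero-flux ends."""
    n = len(u)
    r = D * dt / dx**2
    lower = [-r] * n
    upper = [-r] * n
    diag = [1.0 + dt * decay + 2.0 * r] * n
    diag[0] -= r   # zero-flux boundary: ghost cell equals boundary cell
    diag[-1] -= r
    return thomas_solve(lower, diag, upper, u)
```

With `decay = 0` the step conserves total mass and stays bounded even for `dt` far above the explicit stability limit `dx**2 / (2 * D)`, which is the property the abstract refers to.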
ZettaBricks: A Language Compiler and Runtime System for Anyscale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amarasinghe, Saman
This grant supported the ZettaBricks and OpenTuner projects. ZettaBricks is a new implicitly parallel language and compiler where defining multiple implementations of multiple algorithms to solve a problem is the natural way of programming. ZettaBricks makes algorithmic choice a first class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The ZettaBricks compiler autotunes programs by making both fine-grained as well as algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, ZettaBricks introduces novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the ZettaBricks compiler is able to tune a program in such a way that delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility of utilizing different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice. OpenTuner is a generalization of the experience gained in building an autotuner for ZettaBricks. OpenTuner is a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully-customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy to use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously; techniques that perform well will dynamically be allocated a larger proportion of tests.
Parati, Gianfranco; Giglio, Alessia; Lonati, Laura; Destro, Maurizio; Ricci, Alessandra Rossi; Cagnoni, Francesca; Pini, Claudio; Venco, Achille; Maresca, Andrea Maria; Monza, Michela; Grandi, Anna Maria; Omboni, Stefano
2010-07-01
Increasing the dose or adding a second antihypertensive agent are 2 possible therapeutic choices when blood pressure (BP) is poorly controlled with monotherapy. This study investigated the effectiveness and tolerability of barnidipine 10 or 20 mg added to losartan 50 mg versus losartan 100 mg alone in patients with mild to moderate essential hypertension whose BP was uncontrolled by losartan 50-mg monotherapy. This was a 12-week, multicenter, randomized, open-label, parallel-group study. Eligible patients (aged 30-74 years) had uncontrolled hypertension, defined as office sitting diastolic BP (DBP) > or =90 mm Hg and/or systolic BP (SBP) > or =140 mm Hg, and mean daytime DBP > or =85 mm Hg and/or SBP > or =135 mm Hg. All were being treated with losartan 50 mg at enrollment. After a 1-week run-in period while taking losartan 50 mg, patients were randomly assigned to 6 weeks of treatment with open-label barnidipine 10 mg plus losartan 50 mg or losartan 100-mg monotherapy. At the end of this period, patients with uncontrolled BP had barnidipine doubled to 20 mg and continued for an additional 6 weeks, whereas patients not achieving control on treatment with losartan 100 mg were discontinued. Office BP was measured at each visit, whereas 24-hour ambulatory BP monitoring (ABPM) was performed at randomization and at the final visit (ie, after 12 weeks of treatment, or at 6 weeks for patients not controlled on losartan 100 mg). The intent-to-treat population included all randomized patients who received at least one dose of study treatment and had valid ABPM recordings at baseline and the final visit. The primary end point was the change in daytime DBP between baseline and 12 weeks of treatment, compared between the combination treatment and monotherapy. Adverse events (AEs) were evaluated during each study visit. A total of 93 patients were enrolled (age range, 30-75 years; 60% [56/93] men). 
After the 1-week run-in period, 68 patients were randomly assigned to 6 weeks of treatment with open-label barnidipine 10 mg plus losartan 50 mg (n = 34) or losartan 100-mg monotherapy (n = 34). A total of 53 patients were evaluable (barnidipine plus losartan, n = 28; losartan, n = 25). After 6 weeks of treatment, 18 patients in the combination treatment group (64.3%) had their dose of barnidipine doubled from 10 to 20 mg because BP was not normalized by treatment, whereas 8 patients in the losartan group (32.0%) were discontinued for the same reason. The between-treatment difference (losartan alone - combination treatment) for changes from baseline in daytime DBP was -1.7 mm Hg (95% CI, -5.8 to 2.4 mm Hg; P = NS). A similar result was observed for daytime SBP (-3.2 mm Hg; 95% CI, -8.1 to 1.7 mm Hg; P = NS). Likewise, no significant differences were found for nighttime values (mean [95% CI] DBP, 0.5 mm Hg [-3.7 to 4.7 mm Hg]; SBP, 1.5 mm Hg [-4.1 to 7.1 mm Hg]) or 24-hour values (DBP, -0.9 mm Hg [-4.8 to 2.9 mm Hg]; SBP, -1.6 mm Hg [-5.9 to 2.7 mm Hg]). Combination treatment was associated with a significantly higher rate of SBP responder patients (ie, <140 mm Hg or a reduction of > or =20 mm Hg) compared with monotherapy (82.1% [23/28] vs 56.0% [14/25]; P = 0.044). Drug-related AEs were reported in 4 patients taking combination treatment (total of 7 AEs, including 2 cases of peripheral edema and 1 each of tachycardia, atrial flutter, tinnitus, confusion, and polyuria) and in 2 patients taking losartan alone (total of 2 AEs, both tachycardia). This open-label, parallel-group study found that there was no significant difference in the BP-lowering effect of barnidipine 10 or 20 mg in combination with losartan 50 mg compared with losartan 100-mg monotherapy in these patients with essential hypertension previously uncontrolled by losartan 50-mg monotherapy. However, the percentage of responders for SBP was significantly higher with the combination. 
Both treatments were generally well tolerated. European Union Drug Regulating Authorities Clinical Trials (EudraCT) no. 2006-001469-41. 2010 Excerpta Medica Inc. All rights reserved.
Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease
Shamonin, Denis P.; Bron, Esther E.; Lelieveldt, Boudewijn P. F.; Smits, Marion; Klein, Stefan; Staring, Marius
2013-01-01
Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4–5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15–60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license. 
PMID:24474917
van der Sluis, Pieter C; Ruurda, Jelle P; van der Horst, Sylvia; Verhage, Roy J J; Besselink, Marc G H; Prins, Margriet J D; Haverkamp, Leonie; Schippers, Carlo; Rinkes, Inne H M Borel; Joore, Hans C A; Ten Kate, Fiebo Jw; Koffijberg, Hendrik; Kroese, Christiaan C; van Leeuwen, Maarten S; Lolkema, Martijn P J K; Reerink, Onne; Schipper, Marguerite E I; Steenhagen, Elles; Vleggaar, Frank P; Voest, Emile E; Siersema, Peter D; van Hillegersberg, Richard
2012-11-30
For esophageal cancer patients, radical esophagolymphadenectomy is the cornerstone of multimodality treatment with curative intent. Transthoracic esophagectomy is the preferred surgical approach worldwide allowing for en-bloc resection of the tumor with the surrounding lymph nodes. However, the percentage of cardiopulmonary complications associated with the transthoracic approach is high (50 to 70%). Recent studies have shown that robot-assisted minimally invasive thoraco-laparoscopic esophagectomy (RATE) is at least equivalent to the open transthoracic approach for esophageal cancer in terms of short-term oncological outcomes. RATE was accompanied with reduced blood loss, shorter ICU stay and improved lymph node retrieval compared with open esophagectomy, and the pulmonary complication rate, hospital stay and perioperative mortality were comparable. The objective is to evaluate the efficacy, risks, quality of life and cost-effectiveness of RATE as an alternative to open transthoracic esophagectomy for treatment of esophageal cancer. This is an investigator-initiated and investigator-driven monocenter randomized controlled parallel-group, superiority trial. All adult patients (age ≥ 18 and ≤ 80 years) with histologically proven, surgically resectable (cT1-4a, N0-3, M0) esophageal carcinoma of the intrathoracic esophagus and with European Clinical Oncology Group performance status 0, 1 or 2 will be assessed for eligibility and included after obtaining informed consent. Patients (n = 112) with resectable esophageal cancer are randomized in the outpatient department to either RATE (n = 56) or open three-stage transthoracic esophageal resection (n = 56). The primary outcome of this study is the percentage of overall complications (grade 2 and higher) as stated by the modified Clavien-Dindo classification of surgical complications. 
This is the first randomized controlled trial designed to compare RATE with open transthoracic esophagectomy as surgical treatment for resectable esophageal cancer. If our hypothesis is proven correct, RATE will result in a lower percentage of postoperative complications, lower blood loss, and shorter hospital stay, but with at least similar oncologic outcomes and better postoperative quality of life compared with open transthoracic esophagectomy. The study started in January 2012. Follow-up will be 5 years. Short-term results will be analyzed and published after discharge of the last randomized patient. Dutch trial register: NTR3291 ClinicalTrial.gov: NCT01544790.
Ukar, Estibalitz; Laubach, Stephen E.; Marrett, Randall
2016-03-09
Here, we evaluate a published model for crystal growth patterns in quartz cement in sandstone fractures by comparing crystal fracture-spanning predictions to quartz c-axis orientation distributions measured by electron backscatter diffraction (EBSD) of spanning quartz deposits. Samples from eight subvertical opening-mode fractures in four sandstone formations, the Jurassic–Cretaceous Nikanassin Formation, northwestern Alberta Foothills (Canada), Cretaceous Mesaverde Group (USA; Cozzette Sandstone Member of the Iles Formation), Piceance Basin, Colorado (USA), and upper Jurassic–lower Cretaceous Cotton Valley Group (Taylor sandstone) and overlying Travis Peak Formation, east Texas, have similar quartzose composition and grain size but contain fractures with different temperature histories and opening rates based on fluid inclusion assemblages and burial history. Spherical statistical analysis shows that, in agreement with model predictions, bridging crystals have a preferred orientation with c-axis orientations at a high angle to fracture walls. The second form of validation is for spanning potential that depends on the size of cut substrate grains. Using measured cut substrate grain sizes and c-axis orientations of spanning bridges, we calculated the required orientation for the smallest cut grain to span the maximum gap size and the required orientation of the crystal with the least spanning potential to form overgrowths that span across maximum measured gap sizes. We find that within a 10° error all spanning crystals conform to model predictions. Using crystals with the lowest spanning potential based on crystallographic orientation (c-axis parallel to fracture wall) and a temperature range for fracture opening measured from fluid inclusion assemblages, we calculate maximum fracture opening rates that allow crystals to span. 
These rates are comparable to those derived independently from fracture temperature histories based on burial history and multiple sequential fluid inclusion assemblages. Results support the R. Lander and S. Laubach model, which predicts that for quartz deposited synchronously with fracture opening, spanning potential, or likelihood of quartz deposits that are thick enough to span between fracture walls, depends on temperature history, fracture opening rate, size of opening increments, and size, mineralogy, and crystallographic orientation of substrates in the fracture wall (transected grains). Results suggest that EBSD maps, which can be more rapidly acquired than measurement of tens to hundreds of fluid inclusion assemblages, can provide a useful measure of relative opening rates within populations of quartz-filled fractures formed under sedimentary basin conditions. Such data are useful for evaluating fracture pattern development models.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
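Cyclic reduction eliminates every other row at each level, leaving a tridiagonal system of half the size; the eliminations within a level are independent of one another, which is what makes the method easy to parallelize. The Python sketch below is a serial, scalar analogue (numbers in place of matrix blocks, division in place of block inversion) that accepts an arbitrary number of rows. It illustrates the technique only and is not the BCYCLIC code; it assumes the pivots stay nonzero, as they do for diagonally dominant systems:

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    a[0] and c[-1] are never read. Even-indexed unknowns are eliminated,
    the odd-indexed system is solved recursively, then the even unknowns
    are recovered by back substitution.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    odd = range(1, n, 2)
    ra, rb, rc, rd = [], [], [], []
    for i in odd:  # each odd row can be reduced independently
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1] if i - 1 > 0 else 0.0
        new_b = b[i] - alpha * c[i - 1]
        new_c = 0.0
        new_d = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_d -= gamma * d[i + 1]
            if i + 2 < n:
                new_c = -gamma * c[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)
    x_odd = cyclic_reduction(ra, rb, rc, rd)
    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    for j in range(0, n, 2):  # each even unknown can be recovered independently
        left = a[j] * x[j - 1] if j > 0 else 0.0
        right = c[j] * x[j + 1] if j + 1 < n else 0.0
        x[j] = (d[j] - left - right) / b[j]
    return x
```

The recursion depth grows as the logarithm of the number of rows, and no power-of-2 row count is required because a missing neighbour simply contributes nothing to the reduced row.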
Parallelization of the preconditioned IDR solver for modern multicore computer systems
NASA Astrophysics Data System (ADS)
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, processor architectures and computer system organization have changed dramatically in recent years. Because of this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
NASA Astrophysics Data System (ADS)
Ramirez, Andres; Rahnemoonfar, Maryam
2017-04-01
A hyperspectral image is a data-rich multidimensional image consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computational time. To overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that helps analyze a hyperspectral image through parallel hardware and a parallel programming model that is simpler to handle than low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems and tested them against the following test cases: a CPU and GPU test case, a CPU test case, and a test case where no dimensional reduction was applied.
King, James E; Weiss, Alexander; Sisco, Melissa M
2008-11-01
Ratings of 202 chimpanzees on 43 personality descriptor adjectives were used to calculate scores on five domains analogous to the human Five-Factor Model and a chimpanzee-specific Dominance domain. Male and female chimpanzees were divided into five age groups ranging from juvenile to old adult. Internal consistencies and interrater reliabilities of factors were stable across age groups and approximately 6.8 year retest reliabilities were high. Age-related declines in Extraversion and Openness and increases in Agreeableness and Conscientiousness paralleled human age differences. The mean change in absolute standardized units for all five factors was virtually identical in humans and chimpanzees after adjustment for different developmental rates. Consistent with their aggressive behavior in the wild, male chimpanzees were rated as more aggressive, emotional, and impulsive than females. Chimpanzee sex differences in personality were greater than comparable human gender differences. These findings suggest that chimpanzee and human personality develop via an unfolding maturational process. (PsycINFO Database Record (c) 2008 APA, all rights reserved).
Chlumský, J; Striz, I; Terl, M; Vondracek, J
2006-01-01
Under Global Initiative for Asthma guidelines, the clinical control of disease activity and the adjustment of treatment in patients with asthma are based on symptoms, use of rescue medication, lung function and peak expiratory flow measurement (standard strategy). We investigated whether a strategy to reduce the number of sputum eosinophils (EOS strategy) gives better clinical control and a lower exacerbation rate compared with the standard strategy. Fifty-five patients with moderate to severe asthma entered this open, randomized, parallel-group study and visited the out-patient department every 3 months for 18 months. The dose of corticosteroids was adjusted according to the standard strategy or the percentage of sputum eosinophils (EOS strategy). During the study period, the EOS strategy led to a significantly lower incidence of asthma exacerbations compared with the standard strategy group (0.22 and 0.78 exacerbations per year per patient, respectively). There were significant differences between the strategies in time to first exacerbation.
Gyrokinetic continuum simulation of turbulence in a straight open-field-line plasma
Shi, E. L.; Hammett, G. W.; Stoltzfus-Dueck, T.; ...
2017-05-29
Here, five-dimensional gyrokinetic continuum simulations of electrostatic plasma turbulence in a straight, open-field-line geometry have been performed using a full-f discontinuous-Galerkin approach implemented in the Gkeyll code. While various simplifications have been used for now, such as long-wavelength approximations in the gyrokinetic Poisson equation and the Hamiltonian, these simulations include the basic elements of a fusion-device scrape-off layer: localised sources to model plasma outflow from the core, cross-field turbulent transport, parallel flow along magnetic field lines, and parallel losses at the limiter or divertor with sheath-model boundary conditions. The set of sheath-model boundary conditions used in the model allows currents to flow through the walls. In addition to details of the numerical approach, results from numerical simulations of turbulence in the Large Plasma Device, a linear device featuring straight magnetic field lines, are presented.
LAMMPS strong scaling performance optimization on Blue Gene/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coffman, Paul; Jiang, Wei; Romero, Nichols A.
2014-11-12
LAMMPS "Large-scale Atomic/Molecular Massively Parallel Simulator" is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong-scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle-particle mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point. Performance testing was done using an 8.4-million atom simulation scaling up to 16 racks on the Mira system at Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two-level parallelization of a Computational Fluid Dynamics (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine are reported using medium and large scale test problems.
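The bookkeeping behind such a two-level decomposition can be sketched in plain Python: grid zones are dealt out to processes at the coarse level, and the index range inside each zone is split among threads at the fine level. This is only an illustration under assumed policies (round-robin zone assignment, contiguous chunks); it makes no MPI or OpenMP calls, and the function names are hypothetical rather than taken from the report.

```python
def assign_zones(num_zones, num_ranks):
    """Coarse level: deal zone ids to ranks round-robin (an assumed policy)."""
    return {rank: list(range(rank, num_zones, num_ranks))
            for rank in range(num_ranks)}


def split_range(n, num_threads):
    """Fine level: split indices 0..n-1 into contiguous, near-equal chunks."""
    base, extra = divmod(n, num_threads)
    chunks, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        chunks.append((start, start + size))   # half-open interval [start, end)
        start += size
    return chunks
```

In a real hybrid code the coarse level would correspond to MPI ranks exchanging overset boundary data, and the fine level to threads sharing a loop inside one zone.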
Open SHMEM Reference Implementation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pritchard, Howard; Curtis, Anthony; Welch, Aaron
2016-05-12
OpenSHMEM is an effort to create a specification for a standardized API for parallel programming in the Partitioned Global Address Space. Along with the specification the project is also creating a reference implementation of the API. This implementation attempts to be portable, to allow it to be deployed in multiple environments, and to be a starting point for implementations targeted to particular hardware platforms. It will also serve as a springboard for future development of the API.
Dickson, Richard K.
2010-09-07
A quick insert and release laser beam guard panel clamping apparatus having a base plate mountable on an optical table, a first jaw affixed to the base plate, and a spring-loaded second jaw slidably carried by the base plate to exert a clamping force. The first and second jaws each having a face acutely angled relative to the other face to form a V-shaped, open channel mouth, which enables wedge-action jaw separation by and subsequent clamping of a laser beam guard panel inserted through the open channel mouth. Preferably, the clamping apparatus also includes a support structure having an open slot aperture which is positioned over and parallel with the open channel mouth.
Deep crustal deformation by sheath folding in the Adirondack Mountains, USA
NASA Technical Reports Server (NTRS)
Mclelland, J. M.
1988-01-01
As described by McLelland and Isachsen, the southern half of the Adirondacks is underlain by major isoclinal (F sub 1) and open-upright (F sub 2) folds whose axes are parallel, trend approximately E-W, and plunge gently about the horizontal. These large structures are themselves folded by open upright folds trending NNE (F sub 3). It is pointed out that elongation lineations in these rocks are parallel to X of the finite strain ellipsoid developed during progressive rotational strain. The parallelism between F sub 1 and F sub 2 fold axes and elongation lineations led to the hypothesis that progressive rotational strain, with a west-directed tectonic transport, rotated earlier F sub 1-folds into parallelism with the evolving elongation lineation. Rotation is accomplished by ductile, passive flow of F sub 1-axes into extremely arcuate, E-W hinges. In order to test these hypotheses, a number of large folds were mapped in the eastern Adirondacks. Other evidence supporting the existence of sheath folds in the Adirondacks is the presence, on a map scale, of synforms whose limbs pass through the vertical and into antiforms. This type of outcrop pattern is best explained by intersecting a horizontal plane with the double curvature of sheath folds. It is proposed that sheath folding is a common response of hot, ductile rocks to rotational strain at deep crustal levels. The recognition of sheath folds in the Adirondacks reconciles the E-W orientation of fold axes with an E-W elongation lineation.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is open source and free software, available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Malmberg Gavelin, Hanna; Eskilsson, Therese; Boraxbekk, Carl-Johan; Josefsson, Maria; Stigsdotter Neely, Anna; Slunga Järvholm, Lisbeth
2018-04-25
Stress-related exhaustion has been associated with selective and enduring cognitive impairments. However, little is known about how to address cognitive deficits in stress rehabilitation and how this influences stress recovery over time. The aim of this open-label, parallel randomized controlled trial (ClinicalTrials.gov: NCT03073772) was to investigate the long-term effects of 12 weeks cognitive or aerobic training on cognitive function, psychological health, and work ability for patients diagnosed with exhaustion disorder (ED). One-hundred-and-thirty-two patients (111 women) participating in multimodal stress rehabilitation were randomized to receive additional cognitive training (n = 44), additional aerobic training (n = 47), or no additional training (n = 41). Treatment effects were assessed before, immediately after and one-year post intervention. The primary outcome was global cognitive function. Secondary outcomes included domain-specific cognition, self-reported burnout, depression, anxiety, fatigue and work ability, aerobic capacity, and sick-leave levels. Intention-to-treat analysis revealed a small but lasting improvement in global cognitive functioning for the cognitive training group, paralleled by a large improvement on a trained updating task. The aerobic training group showed improvements in aerobic capacity and episodic memory immediately after training, but no long-term benefits. General improvements in psychological health and work ability were observed, with no difference between interventional groups. Our findings suggest that cognitive training may be a viable method to address cognitive impairments for patients with ED, whereas the effects of aerobic exercise on cognition may be more limited when performed during a restricted time period. The implications for clinical practice in supporting patients with ED to adhere to treatment are discussed.
Scalar collapse in AdS with an OpenCL open source code
NASA Astrophysics Data System (ADS)
Liebling, Steven L.; Khanna, Gaurav
2017-10-01
We study the spherically symmetric collapse of a scalar field in anti-de Sitter spacetime using a newly constructed, open-source code which parallelizes over heterogeneous architectures using the open standard OpenCL. An open question for this scenario concerns how to tell, a priori, whether some form of initial data will be stable or will instead develop under the turbulent instability into a black hole in the limit of vanishing amplitude. Previous work suggested the existence of islands of stability around quasi-periodic solutions, and we use this new code to examine the stability properties of approximately quasi-periodic solutions which balance energy transfer to higher modes with energy transfer to lower modes. The evolutions provide some evidence, though not conclusively, for stability of initial data sufficiently close to quasiperiodic solutions.
Mulenga, Veronica; Musiime, Victor; Kekitiinwa, Adeodata; Cook, Adrian D; Abongomera, George; Kenny, Julia; Chabala, Chisala; Mirembe, Grace; Asiimwe, Alice; Owen-Powell, Ellen; Burger, David; McIlleron, Helen; Klein, Nigel; Chintu, Chifumbe; Thomason, Margaret J; Kityo, Cissy; Walker, A Sarah; Gibb, Diana M
2016-01-01
Background: WHO 2013 guidelines recommend universal treatment for HIV-infected children younger than 5 years. No paediatric trials have compared nucleoside reverse-transcriptase inhibitors (NRTIs) in first-line antiretroviral therapy (ART) in Africa, where most HIV-infected children live. We aimed to compare stavudine, zidovudine, or abacavir as dual or triple fixed-dose-combination paediatric tablets with lamivudine and nevirapine or efavirenz. Methods: In this open-label, parallel-group, randomised trial (CHAPAS-3), we enrolled children from one centre in Zambia and three in Uganda who were previously untreated (ART naive) or on stavudine for more than 2 years with viral load less than 50 copies per mL (ART experienced). Computer-generated randomisation tables were incorporated securely within the database. The primary endpoint was grade 2–4 clinical or grade 3/4 laboratory adverse events. Analysis was intention to treat. This trial is registered with the ISRCTN Registry, number 69078957. Findings: Between Nov 8, 2010, and Dec 28, 2011, 480 children were randomised: 156 to stavudine, 159 to zidovudine, and 165 to abacavir. After two were excluded due to randomisation error, 156 children were analysed in the stavudine group, 158 in the zidovudine group, and 164 in the abacavir group, and followed for median 2·3 years (5% lost to follow-up). 365 (76%) were ART naive (median age 2·6 years vs 6·2 years in ART experienced). 917 grade 2–4 clinical or grade 3/4 laboratory adverse events (835 clinical [634 grade 2]; 40 laboratory) occurred in 104 (67%) children on stavudine, 103 (65%) on zidovudine, and 105 (64%) on abacavir (p=0·63; zidovudine vs stavudine: hazard ratio [HR] 0·99 [95% CI 0·75–1·29]; abacavir vs stavudine: HR 0·88 [0·67–1·15]).
At 48 weeks, 98 (85%), 81 (80%) and 95 (81%) ART-naive children in the stavudine, zidovudine, and abacavir groups, respectively, had viral load less than 400 copies per mL (p=0·58); most ART-experienced children maintained suppression (p=1·00). Interpretation: All NRTIs had low toxicity and good clinical, immunological, and virological responses. Clinical and subclinical lipodystrophy was not noted in those younger than 5 years and anaemia was no more frequent with zidovudine than with the other drugs. Absence of hypersensitivity reactions, superior resistance profile and once-daily dosing favours abacavir for African children, supporting WHO 2013 guidelines. Funding: European Developing Countries Clinical Trials Partnership. PMID:26481928
[Compliancy of pre-exposure prophylaxis for HIV infection in men who have sex with men in Chengdu].
Xu, J Y; Mou, Y C; Ma, Y L; Zhang, J Y
2017-05-10
Objective: To evaluate compliance with HIV pre-exposure prophylaxis (PrEP) among men who have sex with men (MSM) in Chengdu, Sichuan province, and to explore the influencing factors. Methods: From 1 July 2013 to 30 September 2015, a randomized, open-label, multicenter, parallel-controlled intervention study was conducted in 328 MSM enrolled by non-probability sampling in Chengdu. The MSM were randomly divided into 3 groups: a daily group, an intermittent group (before and after exposure), and a control group. Clinical follow-up and a questionnaire survey were carried out every 3 months. PrEP compliance was evaluated in each group, and multivariate logistic regression analysis was conducted to identify related factors. Results: A total of 141 MSM were surveyed, of whom 59 (41.8%) had good PrEP compliance. The compliance rate was 69.0% in the daily group, higher than in the intermittent group (14.3%); the difference was significant (χ² = 45.29, P < 0.001). Multivariate logistic analysis indicated that the type of PrEP was an influencing factor for compliance. Compared with the daily group, the intermittent group had worse PrEP compliance (OR = 0.07, 95% CI: 0.03-0.16). Conclusion: PrEP compliance among the MSM in this study was poor, and compliance was influenced by the type of PrEP.
Pirard, Céline; Loumaye, Ernest; Wyns, Christine
2015-01-01
Background. The aim of this pilot study was to evaluate intranasal buserelin for luteal phase support and compare its efficacy with standard vaginal progesterone in IVF/ICSI antagonist cycles. Methods. This is a prospective, randomized, open, parallel group study. Forty patients underwent ovarian hyperstimulation with human menopausal gonadotropin under pituitary inhibition with gonadotropin-releasing hormone antagonist, while ovulation trigger and luteal support were achieved using intranasal GnRH agonist (group A). Twenty patients had their cycle downregulated with buserelin and stimulated with hMG, while ovulation trigger was achieved using 10,000 IU human chorionic gonadotropin with luteal support by intravaginal progesterone (group B). Results. No difference was observed in estradiol levels. Progesterone levels on day 5 were significantly lower in group A. However, significantly higher levels of luteinizing hormone were observed in group A during the entire luteal phase. Pregnancy rates (31.4% versus 22.2%), implantation rates (22% versus 15.4%), and clinical pregnancy rates (25.7% versus 16.7%) were not statistically different between groups, although a trend towards higher rates was observed in group A. No luteal phase lasting less than 10 days was recorded in either group. Conclusion. Intranasal administration of buserelin is effective for providing luteal phase support in IVF/ICSI antagonist protocols. PMID:25945092
ERIC Educational Resources Information Center
Herrenkohl, Ellen C.
1978-01-01
Group therapy participation and religious conversion have been cited as sources of personal growth by a number of formerly abusive parents. The parallels in the dynamics of change for the two kinds of experiences are discussed in the context of the factors thought to lead to abuse. (Author)
Kołtowski, Łukasz; Aradi, Daniel; Huczek, Zenon; Tomaniak, Mariusz; Sibbing, Dirk; Filipiak, Krzysztof J; Kochman, Janusz; Balsam, Paweł; Opolski, Grzegorz
2016-01-01
High platelet reactivity (HPR) and presence of CYP2C19 loss-of-function alleles are associated with higher risk for periprocedural myocardial infarction in clopidogrel-treated patients undergoing percutaneous coronary intervention (PCI). It is unknown whether personalised treatment based on platelet function testing or genotyping can prevent such complications. The ONSIDE-TEST is a multicentre, prospective, open-label, randomised controlled clinical trial aiming to assess if optimisation of antiplatelet therapy based on either phenotyping or genotyping is superior to conventional care. Patients will be randomised into phenotyping, genotyping, or control arms. In the phenotyping group, patients will be tested with the VerifyNow P2Y12 assay before PCI, and patients with a platelet reactivity unit greater than 208 will be switched over to prasugrel, while others will continue on clopidogrel therapy. In the genotyping group, carriers of the *2 loss-of-function allele will receive prasugrel for PCI, while wild-type subjects will be treated with clopidogrel. Patients in the control arm will be treated with standard-dose clopidogrel. The primary endpoint of the study is the prevalence of periprocedural myocardial injury within 24 h after PCI in the controls as compared to the phenotyping and genotyping group. Secondary endpoints include cardiac death, myocardial infarction, definite or probable stent thrombosis, or urgent repeat revascularisation within 30 days of PCI. Primary safety outcome is Bleeding Academic Research Consortium (BARC) type 3 and 5 bleeding during 30 days of PCI. The ONSIDE TEST trial is expected to verify the clinical utility of an individualised antiplatelet strategy in preventing periprocedural myocardial injury by either phenotyping or genotyping. ClinicalTrials.gov: NCT01930773.
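The allocation rule described in this protocol summary can be written down as a small decision function. The sketch below encodes only what the abstract states (a platelet reactivity unit value greater than 208 or carriage of the *2 allele leads to prasugrel, otherwise clopidogrel); the function name, the arm labels, and the argument names are hypothetical, and this is not the trial's actual software.

```python
PRU_THRESHOLD = 208  # platelet reactivity units; threshold stated in the abstract


def assign_antiplatelet(arm, pru=None, carries_star2=None):
    """Return the drug a patient would receive under the protocol as summarized.

    arm: "phenotyping", "genotyping", or "control" (labels are assumptions).
    pru: VerifyNow P2Y12 result, required for the phenotyping arm.
    carries_star2: True for carriers of the *2 loss-of-function allele,
        required for the genotyping arm.
    """
    if arm == "phenotyping":
        if pru is None:
            raise ValueError("phenotyping arm needs a PRU value")
        return "prasugrel" if pru > PRU_THRESHOLD else "clopidogrel"
    if arm == "genotyping":
        if carries_star2 is None:
            raise ValueError("genotyping arm needs a genotype")
        return "prasugrel" if carries_star2 else "clopidogrel"
    if arm == "control":
        return "clopidogrel"  # standard-dose clopidogrel
    raise ValueError(f"unknown arm: {arm!r}")
```

Note that the comparison is strict: a value of exactly 208 stays on clopidogrel, following the abstract's wording "greater than 208".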
Kim, Soo Jin; Lee, Young-Ki; Oh, Jieun; Cho, AJin; Noh, Jung Woo
2017-09-15
The association between the dialysate calcium level and coronary artery calcification (CAC) has not yet been evaluated in hemodialysis patients. The objective of this study was to determine whether lowering the dialysate calcium level would slow the progression of CAC compared with using standard calcium dialysate. We conducted an open-label randomized trial with parallel groups. The patients were randomly assigned to 12 months of treatment with either low calcium dialysate (LCD; 1.25 mmol/L, n=36) or standard calcium dialysate (SCD; 1.5 mmol/L, n=40). The primary outcome was the change in the CAC scores assessed by 64-slice multidetector computed tomography after 12 months. During the treatment period, CAC scores increased in both groups, with the increase particularly significant in the LCD group (402.5±776.8, 580.5±1011.9, P=0.004). When we defined progressors as patients in the second and third tertiles of CAC changes, the progressor group had a higher proportion of LCD-treated patients than SCD-treated patients (P=0.0229). In multivariate analysis, LCD treatment was a significant risk factor for an increase in CAC scores (odds ratio=5.720, 95% CI: 1.219-26.843, P=0.027). Use of LCD may accelerate the progression of CAC in patients on chronic hemodialysis over a 12-month period. Clinical Research Information Service [Internet]; Osong (Chungcheongbuk-do): Korea Centers for Disease Control and Prevention, Ministry of Health and Welfare (Republic of Korea), 2010: KCT0000942. Available from: https://cris.nih.go.kr/cris/search/search_result_st01_kren.jsp?seq=3572&sLeft=2&type=my.
Memantine and constraint-induced aphasia therapy in chronic poststroke aphasia.
Berthier, Marcelo L; Green, Cristina; Lara, J Pablo; Higueras, Carolina; Barbancho, Miguel A; Dávila, Guadalupe; Pulvermüller, Friedemann
2009-05-01
We conducted a randomized, double-blind, placebo-controlled, parallel-group study of both memantine and constraint-induced aphasia therapy (CIAT) on chronic poststroke aphasia followed by an open-label extension phase. Patients were randomized to memantine (20 mg/day) or placebo alone during 16 weeks, followed by combined drug treatment with CIAT (weeks 16-18), drug treatment alone (weeks 18-20), and washout (weeks 20-24), and finally, an open-label extension phase of memantine (weeks 24-48). After baseline evaluations, clinical assessments were done at two end points (weeks 16 and 18), and at weeks 20, 24, and 48. Outcome measures were changes in the Western Aphasia Battery-Aphasia Quotient and the Communicative Activity Log. Twenty-eight patients were included, and 27 completed both treatment phases. The memantine group showed significantly better improvement on Western Aphasia Battery-Aphasia Quotient compared with the placebo group while the drug was taken (week 16, p = 0.002; week 18, p = 0.0001; week 20, p = 0.005) and at the washout assessment (p = 0.041). A significant increase in Communicative Activity Log was found in favor of memantine-CIAT relative to placebo-CIAT (week 18, p = 0.040). CIAT treatment led to significant improvement in both groups (p = 0.001), which was even greater under additional memantine treatment (p = 0.038). Beneficial effects of memantine were maintained in the long-term follow-up evaluation, and patients who switched to memantine from placebo experienced a benefit (p = 0.02). Both memantine and CIAT alone improved aphasia severity, but best outcomes were achieved combining memantine with CIAT. Beneficial effects of memantine and CIAT persisted on long-term follow-up.
NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations
NASA Astrophysics Data System (ADS)
Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.
2010-09-01
The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.
Program summary
Program title: NWChem
Catalogue identifier: AEGI_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Open Source Educational Community License
No. of lines in distributed program, including test data, etc.: 11 709 543
No. of bytes in distributed program, including test data, etc.: 680 696 106
Distribution format: tar.gz
Programming language: Fortran 77, C
Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines
Operating system: Linux, OS X, Windows
Has the code been vectorised or parallelized?: Code is parallelized
Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13
Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of many-electron Hamiltonian, analysis of the potential energy surface, and dynamics.
Solution method: Ground and excited solutions of many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze potential energy surface and perform dynamical simulations.
Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead an HTML file giving details of how the program can be obtained is sent.
Running time: Running time depends on the size of the chemical system, complexity of the method, number of CPUs and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; ...
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.
Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Jong, Wibe de
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
Drane, Daniel L.; Loring, David W.; Voets, Natalie L.; Price, Michele; Ojemann, Jeffrey G.; Willie, Jon T.; Saindane, Amit M.; Phatak, Vaishali; Ivanisevic, Mirjana; Millis, Scott; Helmers, Sandra L.; Miller, John W.; Meador, Kimford J.; Gross, Robert E.
2015-01-01
SUMMARY OBJECTIVES Temporal lobe epilepsy (TLE) patients experience significant deficits in category-related object recognition and naming following standard surgical approaches. These deficits may result from a decoupling of core processing modules (e.g., language, visual processing, semantic memory), due to “collateral damage” to temporal regions outside the hippocampus following open surgical approaches. We predicted stereotactic laser amygdalohippocampotomy (SLAH) would minimize such deficits because it preserves white matter pathways and neocortical regions critical for these cognitive processes. METHODS Tests of naming and recognition of common nouns (Boston Naming Test) and famous persons were compared with nonparametric analyses using exact tests between a group of nineteen patients with medically-intractable mesial TLE undergoing SLAH (10 dominant, 9 nondominant), and a comparable series of TLE patients undergoing standard surgical approaches (n=39) using a prospective, non-randomized, non-blinded, parallel group design. RESULTS Performance declines were significantly greater for the dominant TLE patients undergoing open resection versus SLAH for naming famous faces and common nouns (F=24.3, p<.0001, η2=.57, & F=11.2, p<.001, η2=.39, respectively), and for the nondominant TLE patients undergoing open resection versus SLAH for recognizing famous faces (F=3.9, p<.02, η2=.19). When examined on an individual subject basis, no SLAH patients experienced any performance declines on these measures. In contrast, 32 of the 39 undergoing standard surgical approaches declined on one or more measures for both object types (p<.001, Fisher’s exact test). Twenty-one of 22 left (dominant) TLE patients declined on one or both naming tasks after open resection, while 11 of 17 right (non-dominant) TLE patients declined on face recognition. 
SIGNIFICANCE Preliminary results suggest 1) naming and recognition functions can be spared in TLE patients undergoing SLAH, and 2) the hippocampus does not appear to be an essential component of neural networks underlying name retrieval or recognition of common objects or famous faces. PMID:25489630
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.
Lo, Jessica W; Bunce, Catey; Charteris, David; Banerjee, Philip; Phillips, Rachel; Cornelius, Victoria R
2016-08-02
Open globe ocular trauma complicated by intraocular scarring (proliferative vitreoretinopathy) is a relatively rare, blinding, but potentially treatable condition for which, at present, surgery is often unsatisfactory and visual results frequently poor. To date, no pharmacological adjuncts to surgery have been proven to be effective. The aim of the Adjunctive Steroid Combination in Ocular Trauma (ASCOT) randomised controlled trial is to determine whether adjunctive steroid (triamcinolone acetonide), given at the time of surgery, can improve the outcome of vitreoretinal surgery in patients with open globe ocular trauma. This article presents the statistical analysis plan for the main publication as approved and signed off by the Trial Steering Committee prior to the first data extraction for the Data Monitoring Committee meeting report. ASCOT is a pragmatic, multi-centre, parallel-group, double-masked randomised controlled trial. The aim of the study is to recruit from 20-25 centres in the United Kingdom and randomise 300 eyes (from 300 patients) into two treatment arms. Both groups will receive standard surgical treatment and care; the intervention arm will additionally receive a pre-operative steroid combination (triamcinolone acetonide) into the vitreous cavity consisting of 4 mg/0.1 ml and 40 mg/1 ml sub-Tenon's. Participants will be followed for 6 months post-surgery. The primary outcome is the proportion of patients achieving a clinically meaningful improvement in visual acuity in the study eye at 6 months after initial surgery, defined as a 10 letter score improvement in the ETDRS (the standard scale to test visual acuity). ISRCTN30012492. Registered on 5 September 2014. EudraCT2014-002193-37. Registered on 5 September 2014.
Sustaining Open Source Communities through Hackathons - An Example from the ASPECT Community
NASA Astrophysics Data System (ADS)
Heister, T.; Hwang, L.; Bangerth, W.; Kellogg, L. H.
2016-12-01
The ecosystem surrounding a successful scientific open source software package combines both social and technical aspects. Much thought has been given to the technology side of writing sustainable software for large infrastructure projects and software libraries, but less about building the human capacity to perpetuate scientific software used in computational modeling. One effective format for building capacity is regular multi-day hackathons. Scientific hackathons bring together a group of science domain users and scientific software contributors to make progress on a specific software package. Innovation comes through the chance to work with established and new collaborations. Especially in the domain sciences with small communities, hackathons give geographically distributed scientists an opportunity to connect face-to-face. They foster lively discussions amongst scientists with different expertise, promote new collaborations, and increase transparency in both the technical and scientific aspects of code development. ASPECT is an open source, parallel, extensible finite element code to simulate thermal convection, that began development in 2011 under the Computational Infrastructure for Geodynamics. ASPECT hackathons for the past 3 years have grown the number of authors to >50, training new code maintainers in the process. Hackathons begin with leaders establishing project-specific conventions for development, demonstrating the workflow for code contributions, and reviewing relevant technical skills. Each hackathon expands the developer community. Over 20 scientists add >6,000 lines of code during the >1 week event. Participants grow comfortable contributing to the repository and over half continue to contribute afterwards. A high return rate of participants ensures continuity and stability of the group as well as mentoring for novice members. We hope to build other software communities on this model, but anticipate each to bring their own unique challenges.
Scott, Jonathan L; Moxham, Bernard J; Rutherford, Stephen M
2014-01-01
Teaching and learning in anatomy is undertaken by a variety of methodologies, yet all of these pedagogies benefit from students discussing and reflecting upon their learning activities. An approach of particular potency is peer-mediated learning, through either peer-teaching or collaborative peer-learning. Collaborative, peer-mediated, learning activities help promote deep learning approaches and foster communities of practice in learning. Students generally flourish in collaborative learning settings but there are limitations to the benefits of collaborative learning undertaken solely within the confines of modular curricula. We describe the development of peer-mediated learning through student-focused and student-led study groups we have termed ‘Shadow Modules’. The ‘Shadow Module’ takes place parallel to the formal academically taught module and facilitates collaboration between students to support their learning for that module. In ‘Shadow Module’ activities, students collaborate towards curating existing online open resources as well as developing learning resources of their own to support their study. Through the use of communication technologies and web 2.0 tools these resources are able to be shared with their peers, thus enhancing the learning experience of all students following the module. The Shadow Module activities have the potential to lead to participants feeling a greater sense of engagement with the subject material, as well as improving their study and group-working skills and developing digital literacy. The outputs from Shadow Module collaborative work are open-source and may be utilised by subsequent student cohorts, thus building up a repository of learning resources designed by and for students. Shadow Module activities would benefit all pedagogies in the study of anatomy, and support students moving from being passive consumers to active participants in learning. PMID:24117249
Stocks, Jennifer Dugan; Taneja, Baldeo K; Baroldi, Paolo; Findling, Robert L
2012-04-01
To evaluate safety and tolerability of four doses of immediate-release molindone hydrochloride in children with attention-deficit/hyperactivity disorder (ADHD) and serious conduct problems. This open-label, parallel-group, dose-ranging, multicenter trial randomized children, aged 6-12 years, with ADHD and persistent, serious conduct problems to receive oral molindone thrice daily for 9-12 weeks in four treatment groups: group 1, 10 mg (5 mg if weight <30 kg); group 2, 20 mg (10 mg if <30 kg); group 3, 30 mg (15 mg if <30 kg); and group 4, 40 mg (20 mg if <30 kg). The primary outcome measure was to evaluate safety and tolerability of molindone in children with ADHD and serious conduct problems. Secondary outcome measures included change in Nisonger Child Behavior Rating Form-Typical Intelligence Quotient (NCBRF-TIQ) Conduct Problem subscale scores, change in Clinical Global Impressions-Severity (CGI-S) and -Improvement (CGI-I) subscale scores from baseline to end point, and Swanson, Nolan, and Pelham rating scale-revised (SNAP-IV) ADHD-related subscale scores. The study randomized 78 children; 55 completed the study. Treatment with molindone was generally well tolerated, with no clinically meaningful changes in laboratory or physical examination findings. The most common treatment-related adverse events (AEs) included somnolence (n=9), weight increase (n=8), akathisia (n=4), sedation (n=4), and abdominal pain (n=4). Mean weight increased by 0.54 kg, and mean body mass index by 0.24 kg/m². The incidence of AEs and treatment-related AEs increased with increasing dose. NCBRF-TIQ subscale scores improved in all four treatment groups, with 34%, 34%, 32%, and 55% decreases from baseline in groups 1, 2, 3, and 4, respectively. CGI-S and SNAP-IV scores improved over time in all treatment groups, and CGI-I scores improved to the greatest degree in group 4. 
Molindone at doses of 5-20 mg/day (children weighing <30 kg) and 20-40 mg (≥ 30 kg) was well tolerated, and preliminary efficacy results suggest that molindone produces dose-related behavioral improvements over 9-12 weeks. Additional double-blind, placebo-controlled trials are needed to further investigate molindone in this pediatric population.
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, for 3D data migration, the resource requirements of the algorithm grow with the data size. This challenges its practical implementation even on current-generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for poststack and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series supercomputer. Optimization, performance and scalability experiment results, along with the migration outcome, show the effectiveness of the parallel algorithm.
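The idea of choosing the number of depth iterations from the available node memory can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `plan_depth_iterations`, the notion of a fixed per-slice memory cost, and the example sizes are all assumptions made for illustration.

```python
def plan_depth_iterations(n_depth_samples, bytes_per_depth_slice, node_memory_bytes):
    """Split the depth axis into chunks whose (assumed) traveltime data fit in memory.

    Returns a list of half-open (start, stop) depth-index ranges, one per iteration.
    """
    if bytes_per_depth_slice > node_memory_bytes:
        raise ValueError("a single depth slice does not fit in node memory")
    # Largest number of depth slices that fits in memory at once.
    slices_per_iter = min(n_depth_samples, node_memory_bytes // bytes_per_depth_slice)
    return [(start, min(start + slices_per_iter, n_depth_samples))
            for start in range(0, n_depth_samples, slices_per_iter)]

# Hypothetical sizes: 1000 depth samples, 64 MiB per slice, 16 GiB usable per node.
plan = plan_depth_iterations(1000, 64 * 1024**2, 16 * 1024**3)
print(len(plan), plan)
```

With more memory per node the same data would be migrated in fewer iterations; with less memory the plan simply gains iterations instead of failing.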
NASA Technical Reports Server (NTRS)
Fijany, Amir
1993-01-01
In this paper, parallel O(log n) algorithms for computation of rigid multibody dynamics are developed. These parallel algorithms are derived by parallelization of new O(n) algorithms for the problem. The underlying feature of these O(n) algorithms is a drastically different strategy for decomposition of interbody force which leads to a new factorization of the mass matrix ($M$). Specifically, it is shown that a factorization of the inverse of the mass matrix in the form of the Schur Complement is derived as $M^{-1} = C - B^{*} A^{-1} B$, wherein matrices $C$, $A$, and $B$ are block tridiagonal matrices. The new O(n) algorithm is then derived as a recursive implementation of this factorization of $M^{-1}$. For the closed-chain systems, similar factorizations and O(n) algorithms for computation of the Operational Space Mass Matrix $\Lambda$ and its inverse $\Lambda^{-1}$ are also derived. It is shown that these O(n) algorithms are strictly parallel, that is, they are less efficient than other algorithms for serial computation of the problem. But, to our knowledge, they are the only known algorithms that can be parallelized and that lead to both time- and processor-optimal parallel algorithms for the problem, i.e., parallel O(log n) algorithms with O(n) processors. The developed parallel algorithms, in addition to their theoretical significance, are also practical from an implementation point of view due to their simple architectural requirements.
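For readers less familiar with the term, the factorization can be read as a Schur complement in the standard sense. The block matrix $K$ below is notation introduced here for illustration and does not appear in the abstract; the second line is one plausible way to apply the factorization to a vector, not necessarily the paper's recursion.

```latex
K = \begin{pmatrix} A & B \\ B^{*} & C \end{pmatrix},
\qquad
K / A \;=\; C - B^{*} A^{-1} B \;=\; M^{-1}.

% Applying M^{-1} to a vector x without forming A^{-1} explicitly:
%   1. solve  A z = B x   (block tridiagonal solve, linear in the number of blocks)
%   2. form   M^{-1} x = C x - B^{*} z
```

Because $A$ is block tridiagonal, step 1 costs time linear in the number of blocks, which is consistent with the O(n) serial cost stated above.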
van der Kop, Mia Liisa; Muhula, Samuel; Nagide, Patrick I; Thabane, Lehana; Gelmon, Lawrence; Awiti, Patricia Opondo; Abunah, Bonface; Kyomuhangi, Lennie Bazira; Budd, Matthew A; Marra, Carlo; Patel, Anik; Karanja, Sarah; Ojakaa, David I; Mills, Edward J; Ekström, Anna Mia; Lester, Richard Todd
2018-03-01
Retention of patients in HIV care is crucial to ensure timely treatment initiation, viral suppression, and to avert AIDS-related deaths. We did a randomised trial to determine whether a text-messaging intervention improved retention during the first year of HIV care. This unmasked, randomised parallel-group study was done at two clinics in informal settlements in Nairobi, Kenya. Eligible participants were aged 18 years or older, HIV-positive, had their own mobile phone or access to one, and were able to use simple text messaging (or have somebody who could text message on their behalf). Participants were randomly assigned (1:1), with random block sizes of 2, 4, and 6, to the intervention or control group. Participants in the intervention group received a weekly text message from the automated WelTel service for 1 year and were asked to respond within 48 h. Participants in the control group did not receive text messages. Participants in both groups received usual care, which comprised psychosocial support and counselling; patient education; CD4 cell count; treatment; screening for tuberculosis, opportunistic infections, and sexually transmitted infections; prevention of mother-to-child transmission and family planning services; and up to two telephone calls for missed appointments. The primary outcome was retention in care at 12 months (ie, clinic attendance 10-14 months after the first visit). Participants who did not attend this 12-month appointment were traced, and we considered as retained those who were confirmed to be active in care elsewhere. The data analyst and clinic staff were masked to the group assignment, whereas participants and research nurses were not. We analysed the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01630304. Between April 4, 2013, and June 4, 2015, we screened 1068 individuals, of whom 700 were recruited. 349 people were allocated to the intervention group and 351 to the control group. 
Participants were followed up for a median of 55 weeks (IQR 51-60). At 12 months, 277 (79%) of 349 participants in the intervention group were retained, compared with 285 (81%) of 351 participants in the control group (risk ratio 0·98, 95% CI 0·91-1·05; p=0·54). There was one mild adverse event related to the intervention, a domestic dispute that occurred when a participant's partner became suspicious of the weekly messages and follow-up calls. This weekly text-messaging service did not improve retention of people in early HIV care. The intervention might have a modest role in improving self-perceived health-related quality of life in individuals in HIV care in similar settings. National Institutes of Health and Canadian Institutes of Health Research Canadian HIV Trials Network. Copyright © 2018 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND 4.0 license.
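The reported risk ratio and confidence interval can be reproduced from the retention counts with a short sketch. It assumes the usual large-sample (log-scale Wald) interval; the abstract does not say which method the trial analysis used, and the function name `risk_ratio_ci` is illustrative.

```python
import math

def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale Wald interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Retained at 12 months: 277 of 349 (intervention) vs 285 of 351 (control).
rr, lower, upper = risk_ratio_ci(277, 349, 285, 351)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")  # RR = 0.98, 95% CI 0.91-1.05
```

The interval spans 1, which matches the conclusion that the intervention did not improve retention.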
Social Features of Online Networks: The Strength of Intermediary Ties in Online Social Media
Grabowicz, Przemyslaw A.; Ramasco, José J.; Moro, Esteban; Pujol, Josep M.; Eguiluz, Victor M.
2012-01-01
An increasing fraction of today's social interactions occur using online social media as communication channels. Recent worldwide events, such as social movements in Spain or revolts in the Middle East, highlight their capacity to boost people's coordination. Online networks display in general a rich internal structure where users can choose among different types and intensity of interactions. Despite this, there are still open questions regarding the social value of online interactions. For example, the existence of users with millions of online friends sheds doubts on the relevance of these relations. In this work, we focus on Twitter, one of the most popular online social networks, and find that the network formed by the basic type of connections is organized in groups. The activity of the users conforms to the landscape determined by such groups. Furthermore, Twitter's distinction between different types of interactions allows us to establish a parallelism between online and offline social networks: personal interactions are more likely to occur on internal links to the groups (the weakness of strong ties); events transmitting new information go preferentially through links connecting different groups (the strength of weak ties) or even more through links connecting to users belonging to several groups that act as brokers (the strength of intermediary ties). PMID:22247773
Sharma, Parichit; Mantri, Shrikant S
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation
NASA Astrophysics Data System (ADS)
Kussainov, A. S.
2017-12-01
The Wang-Landau Monte Carlo algorithm was implemented to calculate the density of states for several simple spin lattices. The energy space was split between the individual threads and balanced according to the expected runtime of the individual processes. A custom spin clustering mechanism, necessary for overcoming the critical slowdown in certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were applied to produce the expected smooth density of states.
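Splitting the energy space so that each worker gets a similar expected runtime can be sketched as follows. This is a minimal illustration, not the author's code: the per-level cost estimates, the function name `balance_energy_windows`, and the choice of contiguous, non-overlapping windows are all assumptions.

```python
def balance_energy_windows(costs, n_workers):
    """Split energy levels into contiguous windows of roughly equal estimated cost.

    costs[i] is an assumed runtime estimate for energy level i.
    Returns half-open (start, stop) index ranges, one per worker.
    """
    if not 1 <= n_workers <= len(costs):
        raise ValueError("need between 1 and len(costs) workers")
    total = sum(costs)
    windows, start, running = [], 0, 0.0
    for i, cost in enumerate(costs):
        running += cost
        levels_left = len(costs) - (i + 1)
        windows_left = n_workers - len(windows) - 1
        # Cumulative cost at which the current window should close.
        target = total * (len(windows) + 1) / n_workers
        # Also cut when only one level remains per remaining window.
        if windows_left > 0 and (running >= target or levels_left == windows_left):
            windows.append((start, i + 1))
            start = i + 1
    windows.append((start, len(costs)))
    return windows

# Hypothetical estimates: low-energy levels are assumed to be slower to sample.
print(balance_energy_windows([4, 4, 2, 2, 1, 1, 1, 1], 4))
```

In this example each of the four windows carries the same total estimated cost, so expensive levels get narrow windows and cheap levels get wide ones.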
Substructured multibody molecular dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grest, Gary Stephen; Stevens, Mark Jackson; Plimpton, Steven James
2006-11-01
We have enhanced our parallel molecular dynamics (MD) simulation software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator, lammps.sandia.gov) to include many new features for accelerated simulation including articulated rigid body dynamics via coupling to the Rensselaer Polytechnic Institute code POEMS (Parallelizable Open-source Efficient Multibody Software). We use new features of the LAMMPS software package to investigate rhodopsin photoisomerization, and water model surface tension and capillary waves at the vapor-liquid interface. Finally, we motivate the recipes of MD for practitioners and researchers in numerical analysis and computational mechanics.
An Investigation of the Posterior Component of Occlusal Force
1994-05-01
of the hemostat allowed subjects to consistently orient the bite force transducer parallel to the occlusal plane, thus allowing the bite force to be...the anterior component of occlusal force was influenced by the steepness of the occlusal plane. Southard et al (1989) was the first to quantify the...young adult males yielded higher mean maximum bite forces at 20 mm opening and at 40 mm opening. The authors suggested that orientation and function of
Griffin, Damian; Parsons, Nick; Shaw, Ewart; Kulikov, Yuri; Hutchinson, Charles; Thorogood, Margaret; Lamb, Sarah E
2014-07-24
To investigate whether surgery by open reduction and internal fixation provides benefit compared with non-operative treatment for displaced, intra-articular calcaneal fractures. Pragmatic, multicentre, two arm, parallel group, assessor blinded randomised controlled trial (UK Heel Fracture Trial). 22 tertiary referral hospitals, United Kingdom. 151 patients with acute displaced intra-articular calcaneal fractures randomly allocated to operative (n=73) or non-operative (n=78) treatment. The primary outcome measure was patient reported Kerr-Atkins score for pain and function (scale 0-100, 100 being the best possible score) at two years after injury. Secondary outcomes were complications; hindfoot pain and function (American Orthopaedic Foot and Ankle Society score); general health (SF-36); quality of life (EQ-5D); clinical examination; walking speed; and gait symmetry. Analysis was by intention to treat. 95% follow-up was achieved for the primary outcome (69 in operative group and 74 in non-operative group), and a complete set of secondary outcomes were available for 75% of participants. There was no significant difference in the primary outcome (mean Kerr-Atkins score 69.8 in operative group v 65.7 in non-operative group; adjusted 95% confidence interval of difference -7.1 to 7.0) or in any of the secondary outcomes between treatment groups. Complications and reoperations were more common in those who received operative care (estimated odds ratio 7.5, 95% confidence interval 2.0 to 41.8). Operative treatment compared with non-operative care showed no symptomatic or functional advantage after two years in patients with typical displaced intra-articular fractures of the calcaneus, and the risk of complications was higher after surgery. Based on these findings, operative treatment by open reduction and internal fixation is not recommended for these fractures. Trial registration: Current Controlled Trials ISRCTN37188541. © Griffin et al 2014.
Dural opening/removal for combined petrosal approach: technical note.
Terasaka, Shunsuke; Asaoka, Katsuyuki; Kobayashi, Hiroyuki; Sugiyama, Taku; Yamaguchi, Shigeru
2011-03-01
Detailed descriptions of stepwise dural opening/removal for combined petrosal approach are presented. Following maximum bone work, the first dural incision was made along the undersurface of the temporal lobe parallel to the superior petrosal sinus. Posterior extension of the dural incision was made in a curved fashion, keeping away from the transverse-sigmoid junction and taking care to preserve the vein of Labbé. A second incision was made perpendicular to the first incision. After sectioning the superior petrosal sinus around the porus trigeminus, the incision was extended toward the posterior fossa dura in the middle fossa region. The tentorium was incised toward the incisura at a point just posterior to the entrance of the trochlear nerve. A third incision was made longitudinally between the superior petrosal sinus and the jugular bulb. A final incision was initiated perpendicular to the third incision in the presigmoid region and extended parallel to the superior petrosal sinus connecting the second incision. The dural complex consisting of the temporal lobe dura, the posterior fossa dura, and the freed tentorium could then be removed. In addition to extensive bone resection, our strategic cranial base dural opening/removal can yield true advantages for the combined petrosal approach.
ParallABEL: an R library for generalized parallelization of genome-wide association studies.
Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S
2010-04-29
Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous to acquire the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. 
For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
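The data partitioning that the abstract describes can be illustrated with a small sketch. ParallABEL itself is an R library built on Rmpi; the Python below is only an illustration of the idea, and the helper `split_evenly` and the toy sizes are assumptions, not part of the library.

```python
from itertools import combinations

def split_evenly(items, n_workers):
    """Partition items into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

# First group: per-SNP statistics, so the work units are SNP indices.
snp_chunks = split_evenly(list(range(10)), 4)

# Third group: pair-wise statistics between individuals (e.g. identity-by-state),
# so the work units are pairs of individual indices.
pair_chunks = split_evenly(list(combinations(range(5), 2)), 4)

print([len(c) for c in snp_chunks], [len(c) for c in pair_chunks])
```

Each worker would compute its statistics on its own chunk and the results would be merged afterwards; with eight equally loaded workers, the reported drop from about eight hours to one hour corresponds to a speed-up of roughly 8.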
Neurocognitive sparing of desktop microbeam irradiation.
Bazyar, Soha; Inscoe, Christina R; Benefield, Thad; Zhang, Lei; Lu, Jianping; Zhou, Otto; Lee, Yueh Z
2017-08-11
Normal tissue toxicity is the dose-limiting side effect of radiotherapy. Spatial fractionation irradiation techniques, like microbeam radiotherapy (MRT), have shown promising results in sparing the normal brain tissue. Most MRT studies have been conducted at synchrotron facilities. With the aim of making this promising treatment more available, we have built the first desktop image-guided MRT device based on carbon nanotube x-ray technology. In the current study, our purpose was to evaluate the effects of MRT on rodent normal brain tissue using our device and compare them with the effect of the integrated equivalent homogeneous dose. Twenty-four 8-week-old male C57BL/6J mice were randomly assigned to three groups: MRT, broad-beam (BB) and sham. The hippocampal region was irradiated with two parallel microbeams in the MRT group (beam width = 300 μm, center-to-center = 900 μm, 160 kVp). The BB group received the equivalent integral dose in the same area of the brain. Rotarod, marble burying and open-field activity tests were done before irradiation and every month afterwards for up to 8 months to evaluate cognitive changes and potential irradiation side effects on normal brain tissue. The open-field activity test was replaced by the Barnes maze test at the 8th month. A multilevel model with a random coefficients approach was used to evaluate the longitudinal and temporal differences among treatment groups. We found significant differences between the BB group and both the microbeam-treated and sham mice in the number of marbles buried and the duration of locomotion around the open-field arena. The Barnes maze revealed that BB mice had a lower capacity for spatial learning than MRT and sham mice. Mice in the BB group tended to gain weight at a slower pace than shams. No meaningful differences were found between MRT and sham mice up to the 8-month follow-up using our measurements. 
Applying MRT with our newly developed prototype compact CNT-based image-guided MRT system, using the current irradiation protocol, can better preserve the integrity of normal brain tissue. Consequently, it enables the use of a higher irradiation dose, which promises better tumor control. Further studies are required to evaluate the full extent of the effects of this novel modality.
Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory
2014-01-01
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 of the time that would be required by "classic" open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. 
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared-memory-based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Highly efficient spatial data filtering in parallel using the opensource library CPPPO
NASA Astrophysics Data System (ADS)
Municchi, Federico; Goniva, Christoph; Radl, Stefan
2016-10-01
CPPPO is a compilation of parallel data processing routines developed with the aim to create a library for "scale bridging" (i.e. connecting different scales by means of closure models) in a multi-scale approach. CPPPO features a number of parallel filtering algorithms designed for use with structured and unstructured Eulerian meshes, as well as Lagrangian data sets. In addition, data can be processed on the fly, allowing the collection of relevant statistics without saving individual snapshots of the simulation state. Our library is provided with an interface to the widely-used CFD solver OpenFOAM®, and can be easily connected to any other software package via interface modules. Also, we introduce a novel, extremely efficient approach to parallel data filtering, and show that our algorithms scale super-linearly on multi-core clusters. Furthermore, we provide a guideline for choosing the optimal Eulerian cell selection algorithm depending on the number of CPU cores used. Finally, we demonstrate the accuracy and the parallel scalability of CPPPO in a showcase focusing on heat and mass transfer from a dense bed of particles.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
A Debugger for Computational Grid Applications
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation gives an overview of a debugger for computational grid applications. Details are given on NAS parallel tools groups (including parallelization support tools, evaluation of various parallelization strategies, and distributed and aggregated computing), debugger dependencies, scalability, initial implementation, the process grid, and information on Globus.
Lee, Sang Ki; Kim, Kap Jung; Park, Kyung Hoon; Choy, Won Sik
2014-10-01
With the continuing improvements in implants for distal humerus fractures, it is expected that newer types of plates, which are anatomically precontoured, thinner and less irritating to soft tissue, would have comparable outcomes when used in a clinical study. The purpose of this study was to compare the clinical and radiographic outcomes in patients with distal humerus fractures who were treated with orthogonal and parallel plating methods using precontoured distal humerus plates. Sixty-seven patients with a mean age of 55.4 years (range 22-90 years) were included in this prospective study. The subjects were randomly assigned to receive 1 of 2 treatments: orthogonal or parallel plating. The following results were assessed: operating time, time to fracture union, presence of a step or gap at the articular margin, varus-valgus angulation, functional recovery, and complications. No differences between the groups were observed in radiological or clinical results. In our practice, no significant differences were found between the orthogonal and parallel plating methods in terms of clinical outcomes, mean operation time, union time, or complication rates. There were no cases of fracture nonunion in either group; heterotopic ossification was found in 3 patients in the orthogonal plating group and 2 patients in the parallel plating group. In our practice, no significant differences were found between the orthogonal and parallel plating methods in terms of clinical outcomes or complication rates. However, the orthogonal plating method may be preferred in cases of coronal shear fractures, where posterior to anterior fixation may provide additional stability to the intraarticular fractures. Additionally, the parallel plating method may be the preferred technique for fractures that occur at the most distal end of the humerus.
Innerhofer, Petra; Fries, Dietmar; Mittermayr, Markus; Innerhofer, Nicole; von Langen, Daniel; Hell, Tobias; Gruber, Gottfried; Schmid, Stefan; Friesenecker, Barbara; Lorenz, Ingo H; Ströhle, Mathias; Rastner, Verena; Trübsbach, Susanne; Raab, Helmut; Treml, Benedikt; Wally, Dieter; Treichl, Benjamin; Mayr, Agnes; Kranewitter, Christof; Oswald, Elgar
2017-06-01
Effective treatment of trauma-induced coagulopathy is important; however, the optimal therapy is still not known. We aimed to compare the efficacy of first-line therapy using fresh frozen plasma (FFP) or coagulation factor concentrates (CFC) for the reversal of trauma-induced coagulopathy, the arising transfusion requirements, and consequently the development of multiple organ failure. This single-centre, parallel-group, open-label, randomised trial was done at the Level 1 Trauma Center in Innsbruck Medical University Hospital (Innsbruck, Austria). Patients with trauma aged 18-80 years, with an Injury Severity Score (ISS) greater than 15, bleeding signs, and plasmatic coagulopathy identified by abnormal fibrin polymerisation or prolonged coagulation time using rotational thromboelastometry (ROTEM) were eligible. Patients with injuries that were judged incompatible with survival, cardiopulmonary resuscitation on the scene, isolated brain injury, burn injury, avalanche injury, or prehospital coagulation therapy other than tranexamic acid were excluded. We used a computer-generated randomisation list, stratification for brain injury and ISS, and closed opaque envelopes to randomly allocate patients to treatment with FFP (15 mL/kg of bodyweight) or CFC (primarily fibrinogen concentrate [50 mg/kg of bodyweight]). Bleeding management began immediately after randomisation and continued until 24 h after admission to the intensive care unit. The primary clinical endpoint was multiple organ failure in the modified intention-to-treat population (excluding patients who discontinued treatment). Reversal of coagulopathy and need for massive transfusions were important secondary efficacy endpoints that were the reason for deciding the continuation or termination of the trial. This trial is registered with ClinicalTrials.gov, number NCT01545635. 
Between March 3, 2012, and Feb 20, 2016, 100 out of 292 screened patients were included and randomly allocated to FFP (n=48) and CFC (n=52). Six patients (four in the FFP group and two in the CFC group) discontinued treatment because of overlooked exclusion criteria or a major protocol deviation with loss of follow-up. 44 patients in the FFP group and 50 patients in the CFC group were included in the final interim analysis. The study was terminated early for futility and safety reasons because of the high proportion of patients in the FFP group who required rescue therapy compared with those in the CFC group (23 [52%] in the FFP group vs two [4%] in the CFC group; odds ratio [OR] 25·34 [95% CI 5·47-240·03], p<0·0001) and the increased need for massive transfusion in the FFP group (13 [30%] in the FFP group vs six [12%] in the CFC group; OR 3·04 [0·95-10·87], p=0·042). Multiple organ failure occurred in 29 (66%) patients in the FFP group and in 25 (50%) patients in the CFC group (OR 1·92 [95% CI 0·78-4·86], p=0·15). Our results underline the importance of early and effective fibrinogen supplementation for severe clotting failure in multiple trauma. The available sample size in our study appears sufficient to make some conclusions that first-line CFC is superior to FFP. Copyright © 2017 Elsevier Ltd. All rights reserved.
Data Race Benchmark Collection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua
2017-03-21
This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.
Seino, Yutaka; Yabe, Daisuke; Takami, Akane; Niemoeller, Elisabeth; Takagi, Hiroki
2015-01-01
This 76-week, open-label, parallel-group study assessed the long-term safety of once-daily lixisenatide monotherapy in Japanese patients with type 2 diabetes mellitus. Patients were randomized to receive lixisenatide in a 2-step or a 1-step dose-increase regimen. The primary objective was to assess the safety of lixisenatide at week 24 by a descriptive comparison of the 2- and 1-step groups. As expected with treatment with a glucagon-like peptide-1 agonist, nausea was the most common treatment-emergent adverse event (2-step group: n=12/33 [36.4%] vs 1-step group: n=18/36 [50.0%] up to week 24). In total, 5/33 patients (15.2%; 2-step group) and 2/36 patients (5.6%; 1-step group) prematurely discontinued treatment up to week 24, mainly due to adverse events. Serious treatment-emergent adverse events occurred in 2/33 patients (6.1%; 2-step group) versus 0/36 patients (0%; 1-step group) up to week 24. Symptomatic hypoglycemia occurred in 2/33 patients (6.1%; 2-step group) versus 1/36 patients (2.8%; 1-step group) up to week 24, with no severe events reported. Glycated hemoglobin, fasting plasma glucose, and body weight were reduced from baseline at weeks 24 and 76. In Japanese patients with type 2 diabetes mellitus, once-daily lixisenatide monotherapy was well tolerated, with less nausea with the 2-step regimen. Copyright © 2015. Published by Elsevier Inc.
Pernar, Luise I M; Ashley, Stanley W; Smink, Douglas S; Zinner, Michael J; Peyre, Sarah E
2012-01-01
Practicing within the Halstedian model of surgical education, academic surgeons serve dual roles as physicians to their patients and educators of their trainees. Despite this significant responsibility, few surgeons receive formal training in educational theory to inform their practice. The goal of this work was to gain an understanding of how master surgeons approach teaching uncommon and highly complex operations and to determine the educational constructs that frame their teaching philosophies and approaches. Individuals included in the study were queried using electronically distributed open-ended, structured surveys. Responses to the surveys were analyzed and grouped using grounded theory and were examined for parallels to concepts of learning theory. Academic teaching hospital. Twenty-two individuals identified as master surgeons. Twenty-one (95.5%) individuals responded to the survey. Two primary thematic clusters were identified: global approach to teaching (90.5% of respondents) and approach to intraoperative teaching (76.2%). Many of the emergent themes paralleled principles of transfer learning theory outlined in the psychology and education literature. Key elements included: conferring graduated responsibility (57.1%), encouraging development of a mental set (47.6%), fostering or expecting deliberate practice (42.9%), deconstructing complex tasks (38.1%), vertical transfer of information (33.3%), and identifying general principles to structure knowledge (9.5%). Master surgeons employ many of the principles of learning theory when teaching uncommon and highly complex operations. The findings may hold significant implications for faculty development in surgical education. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele
2018-03-01
Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
Engineered plant biomass feedstock particles
Dooley, James H [Federal Way, WA; Lanning, David N [Federal Way, WA; Broderick, Thomas F [Lake Forest Park, WA
2011-10-18
A novel class of flowable biomass feedstock particles with unusually large surface areas that can be manufactured in remarkably uniform sizes using low-energy comminution techniques. The feedstock particles are roughly parallelepiped in shape and characterized by a length dimension (L) aligned substantially with the grain direction and defining a substantially uniform distance along the grain, a width dimension (W) normal to L and aligned cross grain, and a height dimension (H) normal to W and L. The particles exhibit a disrupted grain structure with prominent end and surface checks that greatly enhances their skeletal surface area as compared to their envelope surface area. The L × H dimensions define a pair of substantially parallel side surfaces characterized by substantially intact longitudinally arrayed fibers. The W × H dimensions define a pair of substantially parallel end surfaces characterized by crosscut fibers and end checking between fibers. The L × W dimensions define a pair of substantially parallel top surfaces characterized by some surface checking between longitudinally arrayed fibers. At least 80% of the particles pass through a 1/4 inch screen having a 6.3 mm nominal sieve opening but are retained by a No. 10 screen having a 2 mm nominal sieve opening. The feedstock particles are manufactured from a variety of plant biomass materials including wood, crop residues, plantation grasses, hemp, bagasse, and bamboo.
Real science at the petascale.
Saksena, Radhika S; Boghosian, Bruce; Fazendeiro, Luis; Kenway, Owain A; Manos, Steven; Mazzeo, Marco D; Sadiq, S Kashif; Suter, James L; Wright, David; Coveney, Peter V
2009-06-28
We describe computational science research that uses petascale resources to achieve scientific results at unprecedented scales and resolution. The applications span a wide range of domains, from investigation of fundamental problems in turbulence through computational materials science research to biomedical applications at the forefront of HIV/AIDS research and cerebrovascular haemodynamics. This work was mainly performed on the US TeraGrid 'petascale' resource, Ranger, at Texas Advanced Computing Center, in the first half of 2008 when it was the largest computing system in the world available for open scientific research. We have sought to use this petascale supercomputer optimally across application domains and scales, exploiting the excellent parallel scaling performance found on up to at least 32 768 cores for certain of our codes in the so-called 'capability computing' category as well as high-throughput intermediate-scale jobs for ensemble simulations in the 32-512 core range. Furthermore, this activity provides evidence that conventional parallel programming with MPI should be successful at the petascale in the short to medium term. We also report on the parallel performance of some of our codes on up to 65 636 cores on the IBM Blue Gene/P system at the Argonne Leadership Computing Facility, which has recently been named the fastest supercomputer in the world for open science.
Porting plasma physics simulation codes to modern computing architectures using the
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Abbott, Stephen
2015-11-01
Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source
Cardoso, José; Oliveira, Filipe F; Proenca, Mariana P; Ventura, João
2018-05-22
With the consistent shrinking of devices, micro-systems are, nowadays, widely used in areas such as biomedics, electronics, automobiles, and measurement devices. As devices shrunk, so too did their energy consumptions, opening the way for the use of nanogenerators (NGs) as power sources. In particular, to harvest energy from an object's motion (mechanical vibrations, torsional forces, or pressure), present NGs are mainly composed of piezoelectric materials in which, upon an applied compressive or strain force, an electrical field is produced that can be used to power a device. The focus of this work is to simulate the piezoelectric effect in different ZnO nanostructures to optimize the output potential generated by a nanodevice. In these simulations, cylindrical nanowires, nanomushrooms, and nanotrees were created, and the influence of the nanostructures' shape on the output potential was studied as a function of applied parallel and perpendicular forces. The obtained results demonstrated that the output potential is linearly proportional to the applied force and that perpendicular forces are more efficient in all structures. However, nanotrees were found to have an increased sensitivity to parallel applied forces, which resulted in a large enhancement of the output efficiency. These results could then open a new path to increase the efficiency of piezoelectric nanogenerators.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
The Transition to a Many-core World
NASA Astrophysics Data System (ADS)
Mattson, T. G.
2012-12-01
The need to increase performance within a fixed energy budget has pushed the computer industry to many-core processors. This is grounded in the physics of computing and is not a trend that will just go away. It is hard to overestimate the profound impact of many-core processors on software developers. Virtually every facet of the software development process will need to change to adapt to these new processors. In this talk, we will look at many-core hardware and consider its evolution from a perspective grounded in the CPU. We will show that the number of cores will inevitably increase, but in addition, a quest to maximize performance per watt will push these cores to be heterogeneous. We will show that the inevitable result of these changes is a computing landscape where the distinction between the CPU and the GPU is blurred. We will then consider the much more pressing problem of software in a many-core world. Writing software for heterogeneous many-core processors is well beyond the ability of current programmers. One solution is to support a software development process where programmer teams are split into two distinct groups: a large group of domain-expert productivity programmers and a much smaller team of computer-scientist efficiency programmers. The productivity programmers work in terms of high level frameworks to express the concurrency in their problems while avoiding any details for how that concurrency is exploited. The second group, the efficiency programmers, map applications expressed in terms of these frameworks onto the target many-core system. In other words, we can solve the many-core software problem by creating a software infrastructure that only requires a small subset of programmers to become master parallel programmers. This is different from the discredited dream of automatic parallelism. 
Note that productivity programmers still need to define the architecture of their software in a way that exposes the concurrency inherent in their problem. We submit that domain-expert programmers understand "what is concurrent". The parallel programming problem emerges from the complexity of "how that concurrency is utilized" on real hardware. The research described in this talk was carried out in collaboration with the ParLab at UC Berkeley. We use a design pattern language to define the high level frameworks exposed to domain-expert, productivity programmers. We then use tools from the SEJITS project (Selective embedded Just In time Specializers) to build the software transformation tool chains that turn these framework-oriented designs into highly efficient code. The final ingredient is a software platform to serve as a target for these tools. One such platform is the OpenCL industry standard for programming heterogeneous systems. We will briefly describe OpenCL and show how it provides a vendor-neutral software target for current and future many-core systems: CPU-based, GPU-based, and heterogeneous combinations of the two.
Parallel Logic Programming and Parallel Systems Software and Hardware
1989-07-29
Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Utilisation", Groupe d’Intelligence Artificielle, Universite d...completed. Tools were provided for software development using artificial intelligence techniques. AI software for massively parallel architectures was...using artificial intelligence techniques. AI software for massively parallel architectures was started. 1. Introduction We describe research conducted
NASA Technical Reports Server (NTRS)
Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry
1998-01-01
Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools, an interactive computer-aided parallelization tool that generates message passing code, 2) the Portland Group's HPF compiler, and 3) compiler directives with the native FORTRAN77 compiler on the SGI Origin2000.
Santer, Miriam; Rumsby, Kate; Ridd, Matthew J; Francis, Nick A; Stuart, Beth; Chorozoglou, Maria; Wood, Wendy; Roberts, Amanda; Thomas, Kim S; Williams, Hywel C; Little, Paul
2015-11-01
Bath emollients are widely prescribed for childhood eczema, yet evidence of their benefits over direct application of emollients is lacking. Objectives: To determine the clinical and cost-effectiveness of adding bath emollient to the standard management of eczema in children. Pragmatic open 2-armed parallel group randomised controlled trial. General practitioner (GP) practices in England and Wales. Children aged over 12 months and less than 12 years with eczema, excluding inactive or very mild eczema (5 or less on Nottingham Eczema Severity Scale). Children will be randomised to either bath emollients plus standard eczema care or standard eczema care only. Primary outcome is long-term eczema severity, measured by the Patient-Oriented Eczema Measure (POEM) repeated weekly for 16 weeks. Secondary outcomes include: number of eczema exacerbations resulting in healthcare consultations over 1 year; eczema severity over 1 year; disease-specific and generic quality of life; medication use and healthcare resource use; cost-effectiveness. Aiming to detect a mean difference between groups of 2.0 (SD 7.0) in weekly POEM scores over 16 weeks (significance 0.05, power 0.9), allowing for 20% loss to follow-up, gives a total sample size of 423 children. We will use repeated measures analysis of covariance, or a mixed model, to analyse weekly POEM scores. We will control for possible confounders, including baseline eczema severity and child's age. Cost-effectiveness analysis will be carried out from a National Health Service (NHS) perspective. This protocol was approved by Newcastle and North Tyneside 1 NRES committee 14/NE/0098. Follow-up will be completed in 2017. Findings will be disseminated to participants and carers, the public, dermatology and primary care journals, guideline developers and decision-makers. ISRCTN84102309. Published by the BMJ Publishing Group Limited. 
Veleba, Jiri; Matoulek, Martin; Hill, Martin; Pelikanova, Terezie; Kahleova, Hana
2016-01-01
It has been shown that it is possible to modify macronutrient oxidation, physical fitness and resting energy expenditure (REE) by changes in diet composition. Furthermore, mitochondrial oxidation can be significantly increased by a diet with a low glycemic index. The purpose of our trial was to compare the effects of a vegetarian (V) and conventional diet (C) with the same caloric restriction (−500 kcal/day) on physical fitness and REE after 12 weeks of diet plus aerobic exercise in 74 patients with type 2 diabetes (T2D). An open, parallel, randomized study design was used. All meals were provided for the whole study duration. An individualized exercise program was prescribed to the participants and was conducted under supervision. Physical fitness was measured by spiroergometry, and indirect calorimetry was performed at the start and after 12 weeks. Repeated-measures ANOVA (analysis of variance) models with between-subject (group) and within-subject (time) factors and interactions were used for evaluation of the relationships between continuous variables and factors. Maximal oxygen consumption (VO2max) increased by 12% in the vegetarian group (V) (F = 13.1, p < 0.001, partial η2 = 0.171), whereas no significant change was observed in C (F = 0.7, p = 0.667; group × time F = 9.3, p = 0.004, partial η2 = 0.209). Maximal performance (Watt max) increased by 21% in V (F = 8.3, p < 0.001, partial η2 = 0.192), whereas it did not change in C (F = 1.0, p = 0.334; group × time F = 4.2, p = 0.048, partial η2 = 0.116). Our results indicate that V leads more effectively to improvement in physical fitness than C after an aerobic exercise program. PMID:27792174
Gertsik, Lev; Favreau, Joya T.; Smith, Shawnee I.; Mirocha, James M.; Rao, Uma; Daar, Eric S.
2013-01-01
Objectives: The study objectives were to determine whether massage therapy reduces symptoms of depression in subjects with human immunodeficiency virus (HIV) disease. Design: Subjects were randomized non-blinded into one of three parallel groups to receive Swedish massage or to one of two control groups, touch or no intervention for eight weeks. Settings/location: The study was conducted at the Department of Psychiatry and Behavioral Neurosciences at Cedars-Sinai Medical Center in Los Angeles, California, which provided primary clinical care in an institutional setting. Subjects: Study inclusion required being at least 16 years of age, HIV-seropositive, with a diagnosis of major depressive disorder. Subjects had to be on a stable neuropsychiatric, analgesic, and antiretroviral regimen for >30 days with no plans to modify therapy for the duration of the study. Approximately 40% of the subjects were currently taking antidepressants. All subjects were medically stable. Fifty-four (54) subjects were randomized, 50 completed at least 1 week (intent-to-treat; ITT), and 37 completed the study (completers). Interventions: Swedish massage and touch subjects visited the massage therapist for 1 hour twice per week. The touch group had a massage therapist place both hands on the subject with slight pressure, but no massage, in a uniform distribution in the same pattern used for the massage subjects. Outcome measures: The primary outcome measure was the Hamilton Rating Scale for Depression score, with the secondary outcome measure being the Beck Depression Inventory. Results: For both the ITT and completers analyses, massage significantly reduced the severity of depression beginning at week 4 (p≤0.04) and continuing at weeks 6 (p≤0.03) and 8 (p≤0.005) compared to no intervention and/or touch. Conclusions: The results indicate that massage therapy can reduce symptoms of depression in subjects with HIV disease.
The durability of the response, optimal “dose” of massage, and mechanisms by which massage exerts its antidepressant effects remain to be determined. PMID:23098696
High-quality eddy-covariance CO2 budgets under cold climate conditions
NASA Astrophysics Data System (ADS)
Kittler, Fanny; Eugster, Werner; Foken, Thomas; Heimann, Martin; Kolle, Olaf; Göckede, Mathias
2017-08-01
This study aimed at quantifying potential negative effects of instrument heating to improve eddy-covariance flux data quality in cold environments. Our overarching objective was to minimize heating-related bias in annual CO2 budgets from an Arctic permafrost system. We used continuous eddy-covariance measurements covering three full years within an Arctic permafrost ecosystem, with parallel operation of sonic anemometers with and without activated heating, as well as parallel operation of open- and closed-path gas analyzers, the latter serving as a reference. Our results demonstrate that the sonic anemometer heating has a direct effect on temperature measurements while the turbulent wind field is not affected. As a consequence, fluxes of sensible heat are increased by an average of 5 W m-2 with activated heating, while no direct effect on other scalar fluxes was observed. However, the biased measurements in sensible heat fluxes can have an indirect effect on the CO2 fluxes if they are used as input for a density-flux WPL correction of an open-path gas analyzer. Evaluating the self-heating effect of the open-path gas analyzer by comparing CO2 flux measurements between open- and closed-path gas analyzers, we found systematically higher CO2 uptake recorded with the open-path sensor, leading to a cumulative annual offset of 96 gC m-2, which was not only the result of the cold winter season but also due to substantial self-heating effects during summer. With an inclined sensor mounting, only a fraction of the self-heating correction for vertically mounted instruments is required.
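The indirect effect described above can be illustrated with a minimal sketch of the temperature term of the WPL (Webb-Pearman-Leuning) density correction, (1 + μσ)(ρc/T)·w'T'. This is not the authors' processing code. The function name, the default air density and heat capacity, and the example CO2 density are assumptions chosen only for illustration; the 5 W m-2 bias is the figure quoted in the abstract.

```python
def wpl_temperature_term(h_bias_w_m2, co2_density_mmol_m3, air_temp_k,
                         air_density_kg_m3=1.29, cp_j_kg_k=1005.0,
                         mu_sigma=0.0):
    """Shift in the WPL temperature term (mmol CO2 m-2 s-1) caused by a
    bias in the sensible heat flux.

    All defaults are illustrative assumptions, not values from the study.
    mu_sigma is the product of the dry-air/water-vapour molar mass ratio
    and the vapour/dry-air density ratio; it is small in cold, dry air.
    """
    # Kinematic heat flux w'T' (K m s-1) implied by the heat-flux bias.
    kinematic_heat_flux = h_bias_w_m2 / (air_density_kg_m3 * cp_j_kg_k)
    # Temperature term of the WPL correction: (1 + mu*sigma) * (rho_c / T) * w'T'.
    return (1.0 + mu_sigma) * (co2_density_mmol_m3 / air_temp_k) * kinematic_heat_flux


# Assumed example: about 400 ppm CO2 near 0 degC (roughly 17.85 mmol m-3).
shift_umol = 1000.0 * wpl_temperature_term(5.0, 17.85, 273.15)
print(f"CO2 flux shift: {shift_umol:.3f} umol m-2 s-1")
```

For these assumed inputs the shift is about 0.25 µmol m-2 s-1 in the positive (emission) direction, which is small per half hour but can accumulate over an annual budget.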
Geospatial Applications on Different Parallel and Distributed Systems in enviroGRIDS Project
NASA Astrophysics Data System (ADS)
Rodila, D.; Bacu, V.; Gorgan, D.
2012-04-01
The execution of Earth Science applications and services on parallel and distributed systems has become a necessity, especially due to the large amounts of Geospatial data these applications require and the large geographical areas they cover. The parallelization of these applications addresses important performance issues and can range from task parallelism to data parallelism. Parallel and distributed architectures such as Grid, Cloud, Multicore, etc. seem to offer the necessary functionalities to solve important problems in the Earth Science domain: storing, distribution, management, processing and security of Geospatial data, execution of complex processing through task and data parallelism, etc. A main goal of the FP7-funded project enviroGRIDS (Black Sea Catchment Observation and Assessment System supporting Sustainable Development) [1] is the development of a Spatial Data Infrastructure targeting this catchment region but also the development of standardized and specialized tools for storing, analyzing, processing and visualizing the Geospatial data concerning this area. To achieve these objectives, enviroGRIDS deals with the execution of different Earth Science applications, such as hydrological models, Geospatial Web services standardized by the Open Geospatial Consortium (OGC) and others, on parallel and distributed architectures to maximize the obtained performance. This presentation analyzes the integration and execution of Geospatial applications on different parallel and distributed architectures and the possibility of choosing among these architectures based on application characteristics and user requirements through a specialized component. Versions of the proposed platform have been used in the enviroGRIDS project on different use cases such as: the execution of Geospatial Web services both on Web and Grid infrastructures [2] and the execution of SWAT hydrological models both on Grid and Multicore architectures [3].
The current focus is to integrate the Cloud infrastructure into the proposed platform; Cloud computing is still a paradigm with critical problems to be solved despite the great efforts and investments. Cloud computing comes as a new way of delivering resources while using a large set of old as well as new technologies and tools for providing the necessary functionalities. The main challenges in Cloud computing, most of them also identified in the Open Cloud Manifesto 2009, address resource management and monitoring, data and application interoperability and portability, security, scalability, software licensing, etc. We propose a platform able to execute different Geospatial applications on different parallel and distributed architectures such as Grid, Cloud, Multicore, etc. with the possibility of choosing among these architectures based on application characteristics and complexity, user requirements, necessary performance, cost support, etc. The execution redirection to a selected architecture is realized through a specialized component and has the purpose of offering a flexible way of achieving the best performance considering the existing restrictions.
Wiegratz, Inka; Elliesen, Jörg; Paoletti, Anna Maria; Walzer, Anja; Kirsch, Bodo
2015-01-01
Objective To evaluate the effect of a digital dispenser’s acoustic alarm function on adherence to ethinylestradiol (EE) 20 μg/drospirenone 3 mg in a flexible extended regimen (EE/drospirenoneFlex) among women in five European countries (France, Germany, Italy, Spain, UK) seeking oral contraception. Study design Randomized, parallel-group open-label study. Methods Women aged 18–35 years received EE/drospirenoneFlex administered in a regimen with cycle lengths of their choice with the aid of a digital pill dispenser over 1 year. In group A (N=250), the dispenser’s acoustic alarm was activated (ie, acoustic alarm + visual reminder). In group B (N=249), the acoustic alarm was deactivated (ie, visual reminder only). In addition, the women recorded pill intake daily in diary cards. The primary efficacy variable was the mean delay of daily pill release after the dispenser reminded the woman to take a pill (reference time). Secondary efficacy variables included number of missed pills, contraceptive efficacy, bleeding pattern, tolerability, and user satisfaction. Results Dispenser data showed a mean (standard deviation [SD]) daily delay in pill release of 88 (126) minutes in group A vs 178 (140) minutes in group B (P<0.0001). Median (lower quartile, Q1; upper quartile, Q3) number of missed pills was 0 (0; 1) in group A vs 4 (1; 9) in group B (P<0.0001). Diary card results revealed similar trends; however, underreporting of missed pills was evident in both groups. No pregnancies were reported during 424 women-years of exposure. Across the two groups, the mean (SD) EE/drospirenoneFlex cycle length was 51.0 (31.8) days with strong regional differences, and the mean (SD) number of bleeding/spotting days was 50.4 (33.0) days. EE/drospirenoneFlex was well tolerated, and 80% of women were satisfied with treatment. Conclusion The dispenser’s activated acoustic alarm improved adherence with daily tablet intake of EE/drospirenoneFlex, reducing missed pills. 
EE/drospirenoneFlex provided effective contraception and a good tolerability profile. PMID:25609999
Arslan, Zakir; Çalık, Eyup Serhat; Kaplan, Bekir; Ahiskalioglu, Elif Oral
2016-01-01
Many studies have been conducted on reducing the frequency and severity of fentanyl-induced cough during anesthesia induction. We propose that pheniramine maleate, an antihistaminic, may suppress this cough. We aim to observe the effect of pheniramine on fentanyl-induced cough during anesthesia induction. This is a double-blinded, prospective, three-arm parallel, randomized clinical trial of 120 patients with ASA (American Society of Anesthesiologists) physical status III and IV who were aged ≥18 and scheduled for elective open heart surgery under general anesthesia. Patients were randomly assigned to three groups of 40 patients, using computer-generated random numbers: placebo group, pheniramine group, and lidocaine group. Cough incidence differed significantly between groups. In the placebo group, 37.5% of patients had cough, whereas the frequency was significantly decreased in the pheniramine group (5%) and lidocaine group (15%) (Fisher exact test, p=0.0007 and p=0.0188, respectively). There was no significant difference in cough incidence between the pheniramine group (5%) and lidocaine group (15%) (Fisher exact test, p=0.4325). Cough severity also differed between groups. Post hoc tests with Bonferroni correction showed that mean cough severity in the placebo group differed significantly from that of the pheniramine group and lidocaine group (p<0.0001 and p=0.009, respectively). There was no significant difference in cough severity between the pheniramine group and lidocaine group (p=0.856). Intravenous pheniramine is as effective as lidocaine in preventing fentanyl-induced cough. Our results emphasize that pheniramine is a convenient drug to decrease this cough. Copyright © 2015 Sociedade Brasileira de Anestesiologia. Published by Elsevier Editora Ltda. All rights reserved.
Branson: A Mini-App for Studying Parallel IMC, Version 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Long, Alex
This code solves the gray thermal radiative transfer (TRT) equations in parallel using simple opacities and Cartesian meshes. Although Branson solves the TRT equations, it is not designed to model radiation transport: Branson contains simple physics and does not have a multigroup treatment, nor can it use physical material data. The opacities are simple polynomials in temperature, and there is only a limited ability to specify complex geometries and sources. Branson was designed only to capture the computational demands of production IMC codes, especially in large parallel runs. It was also intended to foster collaboration with vendors, universities and other DOE partners. Branson is similar in character to the neutron transport proxy-app Quicksilver from LLNL, which was recently open-sourced.
Remotely replaceable tokamak plasma limiter tiles
Tsuo, Simon; Langford, Alison A.
1989-01-01
U-shaped limiter tiles placed end-to-end over a pair of parallel runners secured to a wall have two rods which engage L-shaped slots in the runners. The short receiving legs of the L-shaped slots are perpendicular to the wall and open away from the wall, while long retaining legs are parallel to and adjacent the wall. A sliding bar between the runners has grooves with clips to retain the rods pressed into receiving legs of the L-shaped slots in the runners. Sliding the bar in the direction of retaining legs of the L-shaped slots latches the tiles in place over the runners. Resilient contact strips between the parallel arms of the U-shaped tiles and the wall assure thermal and electrical contact with the wall.
NASA Technical Reports Server (NTRS)
Sanz, J.; Pischel, K.; Hubler, D.
1992-01-01
An application for parallel computation on a combined cluster of powerful workstations and supercomputers was developed. Parallel Virtual Machine (PVM) is used as the message-passing layer for a macro-tasking parallelization of the Aerodynamic Inverse Design and Analysis for a Full Engine computer code. The heterogeneous nature of the cluster is handled by the controlling host machine. Communication is established via Ethernet with the TCP/IP protocol over an open network. The overhead imposed by internode communication is reasonable, allowing efficient utilization of the engaged processors. Perhaps one of the most interesting features of the system is its versatility, which permits the use of whichever available computational resources are least loaded at a given point in time.
SWMM5 Application Programming Interface and PySWMM: A ...
In support of the OpenWaterAnalytics open source initiative, the PySWMM project encompasses the development of a Python interfacing wrapper to SWMM5 with parallel ongoing development of the USEPA Stormwater Management Model (SWMM5) application programming interface (API). ... The purpose of this work is to increase the utility of the SWMM dll by creating a Toolkit API for accessing its functionality. The utility of the Toolkit is further enhanced with a wrapper to allow access from the Python scripting language. This work is being prosecuted as part of an Open Source development strategy and is being performed by volunteer software developers.
XaNSoNS: GPU-accelerated simulator of diffraction patterns of nanoparticles
NASA Astrophysics Data System (ADS)
Neverov, V. S.
XaNSoNS is open source software with GPU support that simulates X-ray and neutron 1D (or 2D) diffraction patterns and pair-distribution functions (PDF) for amorphous or crystalline nanoparticles (up to ∼10^7 atoms) of heterogeneous structural content. Among the multiple parameters of the structure, the user may specify atomic displacements, site occupancies, molecular displacements and molecular rotations. The software uses general equations nonspecific to crystalline structures to calculate the scattering intensity. It supports four major standards of parallel computing: MPI, OpenMP, Nvidia CUDA and OpenCL, enabling it to run on various architectures, from CPU-based HPCs to consumer-level GPUs.
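A well-known example of a scattering equation that does not assume crystallinity is the Debye scattering equation, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij)/(q r_ij). The sketch below evaluates it directly for a small set of atoms. It is an illustration only: whether XaNSoNS uses exactly this form is an assumption here, and the function name, the constant form factors, and the serial double loop are all hypothetical simplifications of what a GPU code would do.

```python
import math


def debye_intensity(positions, form_factors, q):
    """Powder-averaged scattering intensity from the Debye equation.

    positions: list of (x, y, z) atomic coordinates.
    form_factors: one scattering factor per atom (taken as q-independent
        here, which is a simplification).
    q: magnitude of the scattering vector, in inverse units of the coordinates.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        # Self term: sin(x)/x tends to 1 as the distance goes to zero.
        total += form_factors[i] ** 2
        for j in range(i + 1, n):
            x = q * math.dist(positions[i], positions[j])
            sinc = math.sin(x) / x if x > 1e-12 else 1.0
            # Each unordered pair appears twice in the full double sum.
            total += 2.0 * form_factors[i] * form_factors[j] * sinc
    return total


# Two identical atoms one unit apart, evaluated at a few q values.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
for q in (0.0, 1.0, math.pi):
    print(q, debye_intensity(atoms, [1.0, 1.0], q))
```

The pair loop costs O(N²) operations, which is why codes of this kind map the sum onto many threads or bin the pair distances into a histogram first.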
[Effect of urapidil combined with phentolamine on hypertension during extracorporeal circulation].
Wang, Fangjun; Chen, Bin; Liu, Yang; Tu, Faping
2014-08-01
To study the effect of urapidil combined with phentolamine in the management of hypertension during extracorporeal circulation. Ninety patients undergoing aortic and mitral valve replacement were randomly divided into 3 equal groups to receive treatment with phentolamine (group A), urapidil (group B), or both (group C) during extracorporeal circulation. The mean arterial pressure (MAP) before and after drug administration, time interval of two administrations, spontaneous recovery of heart beat after aorta unclamping, ventricular arrhythmia, changes of ST-segment 1 min after the recovery of heart beat, ante-parallel cycle time, aorta clamping time, post-parallel cycle time, dopamine dose after cardiac resuscitation, and perioperative changes of plasma TNF-α and IL-6 levels were recorded. There was no significant difference in MAP between the 3 groups before or after hypotensive drug administration (P>0.05). The time interval of two hypotensive drug administrations was longer in group C than in groups A and B (P<0.05). The incidence of spontaneous recovery of heart beat after aorta unclamping, incidence of ventricular arrhythmia, changes of ST-segment 1 min after the recovery of heart beat, ante-parallel cycle time, aorta clamping time, and post-parallel cycle time were all comparable between the 3 groups. The dose of dopamine administered after cardiac resuscitation was significantly larger in group B than in groups A or C (P<0.05). The plasma levels of TNF-α and IL-6 were significantly increased after CPB and after the operation in all the groups, but were lower in group C than in groups A and B at the end of CPB and at 2 h and 12 h after the operation. Urapidil combined with phentolamine can control hypertension during extracorporeal circulation without causing hypotension.
Leung, Chung Ming; Wang, Ya; Chen, Wusi
2016-11-01
In this letter, the airfoil-based electromagnetic energy harvester containing parallel array motion between moving coil and trajectory matching multi-pole magnets was investigated. The magnets were aligned in an alternatively magnetized formation of 6 magnets to explore enhanced power density. In particular, the magnet array was positioned in parallel to the trajectory of the tip coil within its tip deflection span. The finite element simulations of the magnetic flux density and induced voltages at an open circuit condition were studied to find the maximum number of alternatively magnetized magnets that was required for the proposed energy harvester. Experimental results showed that the energy harvester with a pair of 6 alternatively magnetized linear magnet arrays was able to generate an induced voltage (Vo) of 20 V, with an open circuit condition, and 475 mW, under a 30 Ω optimal resistance load operating with the wind speed (U) at 7 m/s and a natural bending frequency of 3.54 Hz. Compared to the traditional electromagnetic energy harvester with a single magnet moving through a coil, the proposed energy harvester, containing multi-pole magnets and parallel array motion, enables the moving coil to accumulate a stronger magnetic flux in each period of the swinging motion. In addition to the comparison made with the airfoil-based piezoelectric energy harvester of the same size, our proposed electromagnetic energy harvester generates 11 times more power output, which is more suitable for high-power-density energy harvesting applications at regions with low environmental frequency.
Use of general purpose graphics processing units with MODFLOW
Hughes, Joseph D.; White, Jeremy T.
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
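To make the combination of compressed sparse row (CSR) storage and a Jacobi preconditioner concrete, here is a minimal serial sketch of a Jacobi-preconditioned conjugate gradient iteration. It is not the UPCG source code: the function names are hypothetical, the stopping test is a plain residual tolerance rather than MODFLOW's head and flow criteria, and the matrix is assumed symmetric positive definite with a nonzero diagonal.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a matrix stored in compressed sparse row form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def csr_diagonal(data, indices, indptr):
    """Extract the main diagonal of a CSR matrix."""
    diag = [0.0] * (len(indptr) - 1)
    for row in range(len(diag)):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    return diag


def jacobi_pcg(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with conjugate gradients and a Jacobi preconditioner.

    Returns (x, iterations). Assumes A is symmetric positive definite.
    """
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(data, indices, indptr)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for iteration in range(max_iter):
        if max(abs(v) for v in r) < tol:
            return x, iteration
        ap = csr_matvec(data, indices, indptr, p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# 3x3 tridiagonal test matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
indices = [0, 1, 0, 1, 2, 1, 2]
indptr = [0, 2, 5, 7]
solution, iterations = jacobi_pcg(data, indices, indptr, [2.0, 4.0, 10.0])
print(solution, iterations)
```

Every step in the loop is a sparse matrix-vector product, a dot product, or a vector update, and the Jacobi step is an element-wise scaling. That is why this preconditioner parallelizes easily, whereas the incomplete LU variants require triangular solves with sequential dependencies.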
Open Ocean Internal Waves, South China Sea
NASA Technical Reports Server (NTRS)
1989-01-01
These open ocean internal waves were seen in the South China Sea (19.5N, 114.5E). These sets of internal waves most likely coincide with tidal periods about 12 hours apart. The wavelength (distance from crest to crest) varies between 1.5 and 5.0 miles and the crest lengths stretch across and beyond this photo for over 75 miles. At lower right, the surface waves are moving at a 30% angle to the internal waves, with parallel low level clouds.
OpenGl Visualization Tool and Library Version: 1.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
2010-06-22
GLVis is an OpenGL tool for visualization of finite element meshes and functions. When started without any options, GLVis starts a server, which waits for socket connections and visualizes any received data. This way the results of simulations on a remote (parallel) machine can be visualized on the local user desktop. GLVis can also be used to visualize a mesh with or without a finite element function (solution). It can run a batch sequence of commands (GLVis scripts), or display previously saved socket streams.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wagner, Michael; Ma, Zhiwen; Martinek, Janna
An aspect of the present disclosure is a receiver for receiving radiation from a heliostat array that includes at least one external panel configured to form an internal cavity and an open face. The open face is positioned substantially perpendicular to a longitudinal axis and forms an entrance to the internal cavity. The receiver also includes at least one internal panel positioned within the cavity and aligned substantially parallel to the longitudinal axis, and the at least one internal panel includes at least one channel configured to distribute a heat transfer medium.
NASA Astrophysics Data System (ADS)
Reinovsky, R. E.; Levi, P. S.; Bueck, J. C.; Goforth, J. H.
The Air Force Weapons Laboratory, working jointly with Los Alamos National Laboratory, has conducted a series of experiments directed at exploring composite, or staged, switching techniques for use in opening switches in applications which require the conduction of very high currents (or current densities) with very low losses for relatively long times (several tens of microseconds), and the interruption of these currents in much shorter times (ultimately a few hundred nanoseconds). The results of those experiments are reported.
Earthshots: Satellite images of environmental change – Riyadh, Saudi Arabia
2013-01-01
Located about 35 kilometers north of Riyadh, King Khalid International Airport opened in 1983, so it only appears in the images after that date. The two parallel runways are each 4,200 meters long. The airport occupies about 225 square kilometers.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-07
... Production Act of 1993--Open Axis Group, Inc. Notice is hereby given that, on May 31, 2011, pursuant to.... (``the Act''), Open Axis Group, Inc. (``Open Axis'') has filed written notifications simultaneously with... planned activity of the group research project. Membership in this group research project remains open...
Stringfield, V.T.; Rapp, J.R.; Anders, R.B.
1979-01-01
The result of the natural processes caused by solution and leaching of limestone, dolomite, gypsum, salt and other soluble rocks is known as karst. Development of karst is commonly known as karstification, which may have a pronounced effect on the topography, hydrology and environment, especially where such karst features as sinkholes and vertical solution shafts extend below the land surface and intersect lateral solution passages, cavities, caverns and other karst features in carbonate rocks. Karst features may be divided into two groups: (1) surficial features that do not extend far below the surface; and (2) karst features such as sinkholes that extend below the surface and affect the circulation of water below. The permeability of the most productive carbonate aquifers is due chiefly to enlargement of fractures and other openings by circulation of water. Important controlling factors responsible for the development of karst and permeability in carbonate aquifers include: (1) climate, topography, and presence of soluble rocks; (2) geologic structure; (3) nature of underground circulation; and (4) base level. Another important factor is the condition of the surface of the carbonate rocks at the time they are exposed to meteoric water. A carbonate rock surface, with soil or relatively permeable, less soluble cover, is more favorable for initiation of karstification and solution than bare rocks. Water percolates downward through the cover to the underlying carbonate rocks instead of running off on the surface. Also, the water becomes more corrosive as it percolates through the permeable cover to the underlying carbonate rocks. Where there is no cover or the cover has been removed, the carbonate rocks become case hardened and resistant to erosion.
However, in regions underlain not only by carbonate rocks but also by beds of anhydrite, gypsum and salt, such as the Hueco Plateau in southeastern New Mexico, subsurface solution may occur where water without natural acids moves down from bare rock surfaces through cracks to the beds that are more soluble than carbonate rocks. For example, in the area of Carlsbad Caverns in southeastern New Mexico, much of the water responsible for solution that formed the caverns apparently entered the groundwater system through large open fractures and did not form sinkhole topography. East of the Carlsbad Caverns, however, in the Pecos River Valley where the carbonate rocks are overlain by the less soluble Ogallala Formation of Late Tertiary age, solution began along escarpments as the Pecos River and its tributaries cut through the less soluble cover. As these escarpments retreated, sinkholes and other karst features developed. Joints or fractures are essential for initiation of downward percolation of water in compact carbonate rocks such as some Paleozoic limestone in which there is no intergranular permeability. Also, joints or fractures and bedding planes may be essential in the initiation of lateral movement of water in the zone of saturation. Where conditions of recharge and discharge are favorable, groundwater may move parallel to the dip. However, the direction of movement of water in most carbonate rocks is not necessarily down dip or parallel to the dip. The general direction of movement of both surface and groundwater may be parallel to the strike in a breached anticline. Faults may restrict the lateral movement of water, especially if water-bearing beds are faulted against relatively impervious beds. Conversely, some faults may serve as avenues through which water may move as, for example, in the Cretaceous Edwards aquifer in the San Antonio area, Texas. Karst aquifers, chiefly carbonate rocks, may be placed in three groups according to water-bearing capacity.
Water in aquifers of group 1 occurs chiefly in joints, fractures, and other openings that have not been enlarged by solution. The yield of wells is small. Aquifers in group 2, with low to intermediate yields, are those in which
Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grama, Ananth
2013-12-18
A number of major accomplishments resulted from the project. These include:
• Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration.
• Parallel Formulations of Reactive MD (Purdue Reactive Molecular Dynamics Package, PuReMD, PuReMD-GPU, and PG-PuReMD) for Messaging, GPU, and GPU Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (CUDA), and GPU Cluster (MPI/CUDA) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability.
• Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water and silicon-germanium nanorods, and, as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress).
• Open Source Software Packages for Reactive Molecular Dynamics. All versions of our software have been released into the public domain. There are over 100 major research groups worldwide using our software.
• Implementation into the Department of Energy LAMMPS Software Package. We have also integrated our software into the Department of Energy LAMMPS software package.
Mobile Romberg test assessment (mRomberg).
Galán-Mercant, Alejandro; Cuesta-Vargas, Antonio I
2014-09-12
The diagnosis of frailty is based on physical impairments, and clinicians have indicated that early detection is one of the most effective methods for reducing the severity of physical frailty. An alternative to the classical diagnosis could be the instrumentation of classical functional tests, such as the Romberg test or the Timed Get Up and Go Test. The aim of this study was (I) to measure and describe the magnitude of accelerometry values in the Romberg test in two groups of frail and non-frail elderly people through instrumentation with the iPhone 4®, (II) to analyse the performances and differences between the study groups, and (III) to analyse the performances and differences within study groups to characterise accelerometer responses to increasingly difficult challenges to balance. This is a cross-sectional study of 18 subjects over 70 years old, 9 frail subjects and 9 non-frail subjects. The non-parametric Mann-Whitney U test was used for between-group comparisons of mean values derived from different tasks. The Wilcoxon Signed-Rank test was used to analyse differences between different variants of the test in both independent study groups. The highest difference between groups was found in the accelerometer values with eyes closed and feet parallel: maximum peak acceleration in the lateral axis (p < 0.01), minimum peak acceleration in the lateral axis (p < 0.01) and minimum peak acceleration from the resultant vector (p < 0.01). With eyes open and feet parallel, the greatest differences found between the groups were in the maximum peak acceleration in the lateral axis (p < 0.01), minimum peak acceleration in the lateral axis (p < 0.01) and minimum peak acceleration from the resultant vector (p < 0.001). With eyes closed and feet in tandem, the greatest differences found between the groups were in the minimum peak acceleration in the lateral axis (p < 0.01).
The accelerometer fitted in the iPhone 4® is able to study and analyse the kinematics of the Romberg test between frail and non-frail elderly people. In addition, the results indicate that the accelerometry values were also significantly different between the frail and non-frail groups, and that values from the accelerometer increased as the test was made more complicated.
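The outcome variables named in this abstract (peak accelerations on the lateral axis and of the resultant vector) and the between-group Mann-Whitney comparison can be sketched as follows. This is a hypothetical illustration, not the study's analysis code: the function names are invented, treating the first component of each sample as the lateral axis is an assumption, and only the U statistic is computed, not its p-value.

```python
import math


def peak_summary(samples):
    """Peak accelerations for one trial.

    samples: list of (lateral, vertical, anteroposterior) readings; the
    axis ordering is an assumption made for this sketch.
    """
    lateral = [s[0] for s in samples]
    # Resultant vector magnitude of each three-axis sample.
    resultant = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return {
        "lateral_max": max(lateral),
        "lateral_min": min(lateral),
        "resultant_max": max(resultant),
        "resultant_min": min(resultant),
    }


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b,
    counting ties as one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up readings, for illustration only.
trial = [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
print(peak_summary(trial))
print(mann_whitney_u([4.0, 5.0, 6.0], [1.0, 2.0, 3.0]))
```

With 9 subjects per group, U ranges from 0 to 81, and values near either end indicate that one group's peaks are consistently larger than the other's.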
Barriers to the Treatment of Mental Illness in Primary Care Clinics in Israel.
Ayalon, Liat; Karkabi, Khaled; Bleichman, Igor; Fleischmann, Silvia; Goldfracht, Margalit
2016-03-01
The present study examined physicians' perceived barriers to the management of mental illness in primary care settings in Israel. Seven focus groups that included a total of 52 primary care Israeli physicians were conducted. Open coding analysis was employed, consisting of constant comparisons within and across interviews. Three major themes emerged: (a) barriers to the management of mental illness at the individual-level, (b) barriers to the management of mental illness at the system-level, and (c) the emotional ramifications that these barriers have on physicians. The findings highlight the parallelism between the experiences of primary care physicians and their patients. The findings also stress the need to attend to physicians' emotional reactions when working with patients who suffer from mental illness and to better structure mental health treatment in primary care.
Fenoterol stimulates human erythropoietin production via activation of the renin angiotensin system
Freudenthaler, S M; Schenck, T; Lucht, I; Gleiter, C H
1999-01-01
Aims The present study assessed the hypothesis that the β2 sympathomimetic fenoterol influences the production of erythropoietin (EPO) by activation of the renin angiotensin system (RAS), i.e. angiotensin II. Methods In an open, parallel, randomized study healthy volunteers received i.v. either placebo (electrolyte solution), fenoterol or fenoterol in combination with an oral dose of the AT1-receptor antagonist losartan. Results Compared with placebo treatment AUCEPO(0,24 h) was significantly increased after fenoterol application by 48% whereas no increase in the group receiving fenoterol and losartan could be detected. The rise of PRA was statistically significant under fenoterol and fenoterol plus losartan. Conclusions Stimulation of EPO production during fenoterol infusion appears to be angiotensin II-mediated. Thus, angiotensin II may be considered as one important physiological modulator of EPO production in humans. PMID:10583037
Fenoterol stimulates human erythropoietin production via activation of the renin angiotensin system.
Freudenthaler, S M; Schenck, T; Lucht, I; Gleiter, C H
1999-10-01
The present study assessed the hypothesis that the beta2 sympathomimetic fenoterol influences the production of erythropoietin (EPO) by activation of the renin angiotensin system (RAS), i.e. angiotensin II. In an open, parallel, randomized study healthy volunteers received i.v. either placebo (electrolyte solution), fenoterol or fenoterol in combination with an oral dose of the AT1-receptor antagonist losartan. Compared with placebo treatment AUCEPO(0,24 h) was significantly increased after fenoterol application by 48% whereas no increase in the group receiving fenoterol and losartan could be detected. The rise of PRA was statistically significant under fenoterol and fenoterol plus losartan. Stimulation of EPO production during fenoterol infusion appears to be angiotensin II-mediated. Thus, angiotensin II may be considered as one important physiological modulator of EPO production in humans.
Soft Pneumatic Actuator Fascicles for High Force and Reliability
Robertson, Matthew A.; Sadeghi, Hamed; Florez, Juan Manuel
2017-01-01
Abstract Soft pneumatic actuators (SPAs) are found in mobile robots, assistive wearable devices, and rehabilitative technologies. While soft actuators have been one of the most crucial elements of technology leading the development of the soft robotics field, they fall short of force output and bandwidth requirements for many tasks. In addition, other general problems remain open, including robustness, controllability, and repeatability. The SPA-pack architecture presented here aims to satisfy these standards of reliability crucial to the field of soft robotics, while also improving the basic performance capabilities of SPAs by borrowing advantages leveraged ubiquitously in biology; namely, the structured parallel arrangement of lower power actuators to form the basis of a larger and more powerful actuator module. An SPA-pack module consisting of a number of smaller SPAs will be studied using an analytical model and physical prototype. Experimental measurements show an SPA pack to generate over 112 N linear force, while the model indicates the benefit of parallel actuator grouping over a geometrically equivalent single SPA scale as an increasing function of the number of individual actuators in the group. For a module of four actuators, a 23% increase in force production over a volumetrically equivalent single SPA is predicted and validated, while further gains appear possible up to 50%. These findings affirm the advantage of utilizing a fascicle structure for high-performance soft robotic applications over existing monolithic SPA designs. An example of high-performance soft robotic platform will be presented to demonstrate the capability of SPA-pack modules in a complete and functional system. PMID:28289573
Soft Pneumatic Actuator Fascicles for High Force and Reliability.
Robertson, Matthew A; Sadeghi, Hamed; Florez, Juan Manuel; Paik, Jamie
2017-03-01
Soft pneumatic actuators (SPAs) are found in mobile robots, assistive wearable devices, and rehabilitative technologies. While soft actuators have been one of the most crucial elements of technology leading the development of the soft robotics field, they fall short of force output and bandwidth requirements for many tasks. In addition, other general problems remain open, including robustness, controllability, and repeatability. The SPA-pack architecture presented here aims to satisfy these standards of reliability crucial to the field of soft robotics, while also improving the basic performance capabilities of SPAs by borrowing advantages leveraged ubiquitously in biology; namely, the structured parallel arrangement of lower power actuators to form the basis of a larger and more powerful actuator module. An SPA-pack module consisting of a number of smaller SPAs will be studied using an analytical model and physical prototype. Experimental measurements show an SPA pack to generate over 112 N linear force, while the model indicates the benefit of parallel actuator grouping over a geometrically equivalent single SPA scale as an increasing function of the number of individual actuators in the group. For a module of four actuators, a 23% increase in force production over a volumetrically equivalent single SPA is predicted and validated, while further gains appear possible up to 50%. These findings affirm the advantage of utilizing a fascicle structure for high-performance soft robotic applications over existing monolithic SPA designs. An example of high-performance soft robotic platform will be presented to demonstrate the capability of SPA-pack modules in a complete and functional system.
Urdl, Wolfgang; Apter, Dan; Alperstein, Alan; Koll, Peter; Schönian, Siegfried; Bringer, Jacques; Fisher, Alan C; Preik, Michael
2005-08-01
To investigate contraceptive efficacy, compliance and user's satisfaction with transdermal versus oral contraception (OC). Randomized, open-label, parallel-group trial conducted at 65 centers in Europe and South Africa. One thousand four hundred and eighty-nine women received a contraceptive patch (n = 846) or an OC (n = 643) for 6 or 13 cycles. Overall/method-failure Pearl Indices were 0.88/0.66 with the patch and 0.56/0.28 with the OC (p = n.s.). Compliance was higher at all age groups with the patch compared to the OC. Significantly more users were very satisfied with the contraceptive patch than with the OC. The percentage of patch users being very satisfied increased with age whereas it did not in the OC group. Likewise, improvements of premenstrual symptoms as well as emotional and physical well-being increased with age in the patch-group in contrast to the OC group. Ratings of satisfaction with the study medication correlated weakly with emotional (r = 0.33) and physical well-being (r = 0.39) as well as premenstrual symptoms (r = 0.30; p < 0.001). Contraceptive efficacy of the patch is comparable to OC, but compliance is consistently better at all age groups. Higher satisfaction with the patch at increasing age may be attributed to improvements in emotional and physical well-being as well as reduction of premenstrual symptoms.
OpenSeesPy: Python library for the OpenSees finite element framework
NASA Astrophysics Data System (ADS)
Zhu, Minjie; McKenna, Frank; Scott, Michael H.
2018-01-01
OpenSees, an open source finite element software framework, has been used broadly in the earthquake engineering community for simulating the seismic response of structural and geotechnical systems. The framework allows users to perform finite element analysis with a scripting language and for developers to create both serial and parallel finite element computer applications as interpreters. For the last 15 years, Tcl has been the primary scripting language to which the model building and analysis modules of OpenSees are linked. To provide users with different scripting language options, particularly Python, the OpenSees interpreter interface was refactored to provide multi-interpreter capabilities. This refactoring, resulting in the creation of OpenSeesPy as a Python module, is accomplished through an abstract interface for interpreter calls with concrete implementations for different scripting languages. Through this approach, users are able to develop applications that utilize the unique features of several scripting languages while taking advantage of advanced finite element analysis models and algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Timothy C.; Hammond, Glenn E.; Chen, Xingyuan
Time-lapse electrical resistivity tomography (ERT) is finding increased application for remotely monitoring processes occurring in the near subsurface in three-dimensions (i.e. 4D monitoring). However, there are few codes capable of simulating the evolution of subsurface resistivity and corresponding tomographic measurements arising from a particular process, particularly in parallel and with an open source license. Herein we describe and demonstrate an electrical resistivity tomography module for the PFLOTRAN subsurface simulation code, named PFLOTRAN-E4D. The PFLOTRAN-E4D module operates in parallel using a dedicated set of compute cores in a master-slave configuration. At each time step, the master process receives subsurface states from PFLOTRAN, converts those states to bulk electrical conductivity, and instructs the slave processes to simulate a tomographic data set. The resulting multi-physics simulation capability enables accurate feasibility studies for ERT imaging, the identification of the ERT signatures that are unique to a given process, and facilitates the joint inversion of ERT data with hydrogeological data for subsurface characterization. PFLOTRAN-E4D is demonstrated herein using a field study of stage-driven groundwater/river water interaction ERT monitoring along the Columbia River, Washington, USA. Results demonstrate the complex nature of changes in subsurface electrical conductivity, in both the saturated and unsaturated zones, arising from water table changes and from river water intrusion into the aquifer. The results also demonstrate the sensitivity of surface based ERT measurements to those changes over time. PFLOTRAN-E4D is available with the PFLOTRAN development version with an open-source license at https://bitbucket.org/pflotran/pflotran-dev.
NASA Astrophysics Data System (ADS)
Johnson, Timothy C.; Hammond, Glenn E.; Chen, Xingyuan
2017-02-01
Time-lapse electrical resistivity tomography (ERT) is finding increased application for remotely monitoring processes occurring in the near subsurface in three-dimensions (i.e. 4D monitoring). However, there are few codes capable of simulating the evolution of subsurface resistivity and corresponding tomographic measurements arising from a particular process, particularly in parallel and with an open source license. Herein we describe and demonstrate an electrical resistivity tomography module for the PFLOTRAN subsurface flow and reactive transport simulation code, named PFLOTRAN-E4D. The PFLOTRAN-E4D module operates in parallel using a dedicated set of compute cores in a master-slave configuration. At each time step, the master process receives subsurface states from PFLOTRAN, converts those states to bulk electrical conductivity, and instructs the slave processes to simulate a tomographic data set. The resulting multi-physics simulation capability enables accurate feasibility studies for ERT imaging, the identification of the ERT signatures that are unique to a given process, and facilitates the joint inversion of ERT data with hydrogeological data for subsurface characterization. PFLOTRAN-E4D is demonstrated herein using a field study of stage-driven groundwater/river water interaction ERT monitoring along the Columbia River, Washington, USA. Results demonstrate the complex nature of subsurface electrical conductivity changes, in both the saturated and unsaturated zones, arising from river stage fluctuations and associated river water intrusion into the aquifer. The results also demonstrate the sensitivity of surface based ERT measurements to those changes over time. PFLOTRAN-E4D is available with the PFLOTRAN development version with an open-source license at https://bitbucket.org/pflotran/pflotran-dev.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hornung, Richard D.; Hones, Holger E.
The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic kernels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline Sequential, RAJA Sequential, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. There are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. Actual computer instructions executed and how they run in parallel differs depending on the parallel programming model backend used and which optimizations are performed by the compiler used to build the Performance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor partners for defining requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAL/Sierra machine (initial LLNL delivery expected in late-2017/early 2018) and the CORAL-2 procurement. The Suite will also be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.
Molla, Mijanur R; Böser, Alexander; Rana, Akshita; Schwarz, Karina; Levkin, Pavel A
2018-04-18
Efficient delivery of nucleic acids into cells is of great interest in the field of cell biology and gene therapy. Despite a lot of research, transfection efficiency and structural diversity of gene-delivery vectors are still limited. A better understanding of the structure-function relationship of gene delivery vectors is also essential for the design of novel and intelligent delivery vectors, efficient in "difficult-to-transfect" cells and in vivo clinical applications. Most of the existing strategies for the synthesis of gene-delivery vectors require multiple steps and lengthy procedures. Here, we demonstrate a facile, three-component one-pot synthesis of a combinatorial library of 288 structurally diverse lipid-like molecules termed "lipidoids" via a thiolactone ring opening reaction. This strategy introduces the possibility to synthesize lipidoids with hydrophobic tails containing both unsaturated bonds and reducible disulfide groups. The whole synthesis and purification are convenient, extremely fast, and can be accomplished within a few hours. Screening of the produced lipidoids using HEK293T cells without addition of helper lipids resulted in identification of highly stable liposomes demonstrating ∼95% transfection efficiency with low toxicity.
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
NASA Astrophysics Data System (ADS)
Marx, Alain; Lütjens, Hinrich
2017-03-01
A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows simulations to be performed at higher resolutions that were previously out of reach because of memory limitations.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...
2017-11-14
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
NASA Astrophysics Data System (ADS)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.
2017-11-01
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
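To give a sense of the scale behind these active spaces, the following minimal sketch counts Slater determinants for a complete active space with equal numbers of alpha and beta electrons and no symmetry restrictions. The function name is hypothetical, and the counting rule (a product of two binomial coefficients) is a textbook estimate, not the procedure used in the paper.

```python
from math import comb

def cas_determinants(n_electrons, n_orbitals):
    """Count determinants for a full CI in the active space.

    Assumes the electrons are split as evenly as possible between
    alpha and beta spin and that no spatial symmetry is imposed.
    """
    n_alpha = n_electrons // 2
    n_beta = n_electrons - n_alpha
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

for n in (20, 22, 24):
    print(n, cas_determinants(n, n))

# Figure quoted in the abstract for the chromium tetramer (24 in 24).
reported = 914_058_513_424
ratio = cas_determinants(24, 24) / reported
print(round(ratio, 2))
```

The unrestricted count for 24 electrons in 24 orbitals comes out roughly eight times larger than the figure quoted in the abstract. A reduction of that size would be consistent with restricting the expansion by spatial symmetry, but that reading is an assumption here and is not stated in the abstract.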
Li, B B; Lin, F; Cai, L H; Chen, Y; Lin, Z J
2017-08-01
Objective: To evaluate the effects of parallel versus perpendicular double plating for type C distal humerus fractures. Methods: A standardized comprehensive literature search was performed in the PubMed, Embase, Cochrane Library, CMB, CNKI and Medline databases. Randomized controlled studies comparing parallel and perpendicular double plating for type C distal humerus fractures before December 2015 were enrolled in the study. All data were analyzed with the RevMan 5.2 software. Results: Six studies, including 284 patients, met the inclusion criteria. There were 155 patients in the perpendicular double plating group and 129 patients in the parallel double plating group. The meta-analysis indicated a statistically significant difference between the two groups in complications (OR = 2.59, 95% CI: 1.03 to 6.53, P = 0.04). There was no significant difference between the two groups in surgical duration (MD = -1.84, 95% CI: -9.06 to 5.39, P = 0.62), bone union time (MD = 0.09, 95% CI: -0.06 to 0.24, P = 0.22), Mayo Elbow Performance Score (MD = 0.09, 95% CI: -0.06 to 0.24, P = 0.22), range of motion (MD = -0.92, 95% CI: -4.65 to 2.81, P = 0.63) or the rate of excellent and good results (OR = 0.64, 95% CI: 0.27 to 1.52, P = 0.31). Conclusion: Both perpendicular and parallel double plating are effective for type C distal humerus fractures; parallel double plating has fewer complications.
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. 
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
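The "poor man's parallelization" described in this abstract, running whole programs side by side as separate processes, can be sketched in a few lines. This is a minimal local illustration that uses a thread pool to launch independent processes; it stands in for the job schedulers the abstract mentions and is not BioNode's own tooling. The function names and the placeholder jobs are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_job(args):
    """Run one external program to completion and return its stdout."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()

def run_all(jobs, max_workers=4):
    """Launch up to max_workers programs at once; results keep job order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))

# Placeholder jobs: each starts a separate Python interpreter.
# In practice each entry would be the command line of an unmodified tool.
jobs = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(4)]
results = run_all(jobs)
print(results)
```

Threads are sufficient here because each worker only waits on a child process; the actual computation runs in the separate programs, which is what makes the approach attractive for legacy software.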
Abe, Masanori; Higuchi, Terumi; Moriuchi, Masari; Okamura, Masahiro; Tei, Ritsukou; Nagura, Chinami; Takashima, Hiroyuki; Kikuchi, Fumito; Tomita, Hyoe; Okada, Kazuyoshi
2016-06-01
Saxagliptin is a dipeptidyl peptidase-4 inhibitor that was approved in Japan for the treatment of type 2 diabetes in 2013. We examined its efficacy and safety in Japanese hemodialysis patients with diabetic nephropathy. In this prospective, open-label, parallel-group study, Japanese hemodialysis patients were randomized to receive either oral saxagliptin (2.5mg/day) or usual care (control group) for 24weeks. Before randomization, patients received fixed doses of conventional antidiabetic drugs (oral drugs and/or insulin) for 8weeks; these drugs were continued during the study. Endpoints included changes in glycated albumin (GA), hemoglobin A1c (HbA1c), postprandial plasma glucose (PPG), and adverse events. Both groups included 41 patients. Mean GA, HbA1c, and PPG decreased significantly in the saxagliptin group (-3.4%, -0.6% [-7mmol/mol], and -38.3mg/dL, respectively; all P<0.0001) but not in the control group (0%, -0.1% [-1mmol/mol], and -3.7mg/dL, respectively) (P<0.0001, P<0.001, and P<0.0001, respectively). In saxagliptin-treated patients, the reduction in GA was significantly greater when saxagliptin was administered as monotherapy than in combination therapy (-4.2% vs. -3.0%, P=0.012) despite similar baseline values (24.5% vs. 23.3%). Reductions in GA, HbA1c, and PPG were greater in patients whose baseline values exceeded the median (23.8% for GA, 6.6% for HbA1c, and 180mg/dL for PPG). There were no adverse events associated with saxagliptin. Saxagliptin (2.5mg/day) was effective and well tolerated when used as monotherapy or combined with other antidiabetic drugs in Japanese hemodialysis patients with type 2 diabetes. UMIN000018445. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Crystal structures of two mixed-valence copper cyanide complexes with N-methylethylenediamine
Sabatino, Alexander
2017-01-01
The crystal structures of two mixed-valence copper cyanide compounds involving N-methylethylenediamine (meen), are described. In compound (I), poly[bis(μ3-cyanido-κ3 C:C:N)tris(μ2-cyanido-κ2 C:N)bis(N-methylethane-1,2-diamine-κ2 N,N′)tricopper(I)copper(II)], [Cu4(CN)5(C3H10N2)2] or Cu4(CN)5meen2, cyanide groups link CuI atoms into a three-dimensional network containing open channels parallel to the b axis. In the network, two tetrahedrally bound CuI atoms are bonded by the C atoms of two end-on bridging CN groups to form Cu2(CN)6 moieties with the Cu atoms in close contact at 2.560 (1) Å. Other trigonally bound CuI atoms link these units together to form the network. The CuII atoms, coordinated by two meen units, are covalently linked to the network via a cyanide bridge, and project into the open network channels. In the molecular compound (II), [(N-methylethylenediamine-κ2 N,N′)copper(II)]-μ2-cyanido-κ2 C:N-[bis(cyanido-κC)copper(I)] monohydrate, [Cu2(CN)3(C3H10N2)2]·H2O or Cu2(CN)3meen2·H2O, a CN group connects a CuII atom coordinated by two meen groups with a trigonal–planar CuI atom coordinated by CN groups. The molecules are linked into centrosymmetric dimers via hydrogen bonds to two water molecules. In both compounds, the bridging cyanide between the CuII and CuI atoms has the N atom bonded to CuII and the C atom bonded to CuI, and the CuII atoms are in a square-pyramidal coordination. PMID:28217329
Kadouch, D J; Elshot, Y S; Zupan-Kajcovski, B; van Haersma de With, A S E; van der Wal, A C; Leeflang, M; Jóźwiak, K; Wolkerstorfer, A; Bekkenk, M W; Spuls, P I; de Rie, M A
2017-09-01
Routine punch biopsies are considered to be standard care for diagnosing and subtyping basal cell carcinoma (BCC) when clinically suspected. We assessed the efficacy of a one-stop-shop concept using in vivo reflectance confocal microscopy (RCM) imaging as a diagnostic tool vs. standard care for surgical treatment in patients with clinically suspected BCC. In this open-label, parallel-group, noninferiority, randomized controlled multicentre trial we enrolled patients with clinically suspected BCC at two tertiary referral centres in Amsterdam, the Netherlands. Patients were randomly assigned to the RCM one-stop-shop (diagnosing and subtyping using RCM followed by direct surgical excision) or standard care (planned excision based on the histological diagnosis and subtype of a punch biopsy). The primary outcome was the proportion of patients with tumour-free margins after surgical excision of BCC. Of the 95 patients included, 73 (77%) had a BCC histologically confirmed using a surgical excision specimen. All patients (40 of 40, 100%) in the one-stop-shop group had tumour-free margins. In the standard-care group tumour-free margins were found in all but two patients (31 of 33, 94%). The difference in the proportion of patients with tumour-free margins after BCC excision between the one-stop-shop group and the standard-care group was -0·06 (90% confidence interval -0·17-0·01), establishing noninferiority. The proposed new treatment strategy seems suitable in facilitating early diagnosis and direct treatment for patients with BCC, depending on factors such as availability of RCM, size and site of the lesion, patient preference and whether direct surgical excision is feasible. © 2017 The Authors. British Journal of Dermatology published by John Wiley & Sons Ltd on behalf of British Association of Dermatologists.
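The reported difference in proportions can be reproduced from the counts given in the abstract. The sketch below assumes the difference is taken as standard care minus one-stop-shop, a sign convention chosen only because it matches the reported value; the confidence interval is not recomputed because the abstract does not state the method used.

```python
def proportion_difference(events_a, n_a, events_b, n_b):
    """Return the proportion in group A minus the proportion in group B."""
    return events_a / n_a - events_b / n_b

# Tumour-free margins: standard care 31 of 33, one-stop-shop 40 of 40.
diff = proportion_difference(31, 33, 40, 40)
print(round(diff, 2))  # prints -0.06
```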
Garg, Satish K; Mathieu, Chantal; Rais, Nadeem; Gao, Haitao; Tobian, Janet A; Gates, Jeffrey R; Ferguson, Jeffrey A; Webb, David M; Berclaz, Pierre-Yves
2009-09-01
Patients with type 1 diabetes require intensive insulin therapy for optimal glycemic control. AIR® inhaled insulin (system from Eli Lilly and Company, Indianapolis, IN; AIR is a registered trademark of Alkermes, Inc., Cambridge, MA) may be an efficacious and safe alternative to subcutaneously injected (SC) mealtime insulin. This was a Phase 3, 2-year, randomized, open-label, active-comparator, parallel-group study in 385 patients with type 1 diabetes who were randomly assigned to receive AIR insulin or SC insulin (regular human insulin or insulin lispro) at mealtimes. Both groups received insulin glargine once daily. Efficacy measures included mean change in hemoglobin A1C (A1C) from baseline to end point, eight-point self-monitored blood glucose profiles, and insulin dosage. Safety assessments included hypoglycemic events, pulmonary function tests, adverse events, and insulin antibody levels. In both treatment groups, only 20% of subjects reached the target of A1C <7.0%. A significant A1C difference of 0.44% was seen favoring SC insulin, with no difference between the groups in insulin doses or hypoglycemic events at end point. Patients in both treatment groups experienced progressive decreases in lung function, but larger (reversible) decrements in diffusing capacity of the lung for carbon monoxide (DL(CO)) were associated with AIR insulin treatment. Greater weight gain was seen with SC insulin treatment. The AIR inhaled insulin program was terminated by the sponsor prior to availability of any Phase 3 data for reasons unrelated to safety or efficacy. Despite early termination, this trial provides evidence that AIR insulin was less efficacious in lowering A1C and was associated with a greater decrease in DL(CO) and increased incidence of cough than SC insulin in patients with type 1 diabetes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A. T.
2014-04-20
This code adds an implementation of PMIX_Ring to the existing PMI2 library in the SLURM open source software package (Simple Linux Utility for Resource Management). PMIX_Ring executes a particular communication pattern that is used to bootstrap connections between MPI processes in a parallel job.
Software Applications on the Peregrine System | High-Performance Computing
programming and optimization. Gaussian Chemistry Program for calculating molecular electronic structure and Materials Science Open-source classical molecular dynamics program designed for massively parallel systems framework Q-Chem Chemistry ab initio quantum chemistry package for predicting molecular structures
The Literacy Component of Mathematical and Scientific Literacy
ERIC Educational Resources Information Center
Yore, Larry D.; Pimm, David; Tuan, Hsiao-Lin
2007-01-01
This opening article of the Special Issue makes an argument for parallel definitions of scientific literacy and mathematical literacy that have shared features: importance of general cognitive and metacognitive abilities and reasoning/thinking and discipline-specific language, habits-of-mind/emotional dispositions, and information communication…
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
Pythran: enabling static optimization of scientific Python programs
NASA Astrophysics Data System (ADS)
Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan
2015-01-01
Pythran is an open source static compiler that turns modules written in a subset of Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more, but without the backward compatibility loss.
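The point about input code remaining compatible with the Python interpreter can be illustrated with a small sketch. The function name `dot` and its argument types are hypothetical choices for illustration; the `#pythran export` and `#omp` lines follow the comment-based annotation style that Pythran documents, so the plain interpreter simply ignores them. Compiling the module with Pythran is outside the scope of this sketch.

```python
# A module written in the Python subset that Pythran targets.
# The annotation below is an ordinary comment to CPython; it tells Pythran
# which function to export and with which argument types (assumed here).
#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product of two equal-length lists of floats."""
    total = 0.0
    # OpenMP annotation, also a comment for the interpreter: it requests a
    # parallel loop with a sum reduction on `total` when compiled.
    #omp parallel for reduction(+:total)
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


# Runs unchanged under the standard interpreter, without Pythran installed.
result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)
```

Because both annotations are comments, the same file serves as the reference implementation and as the compiler input, which is the property the abstract contrasts with Cython.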
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992
NASA Astrophysics Data System (ADS)
Alves Júnior, A. A.; Sokoloff, M. D.
2017-10-01
MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster can perform most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on Linux systems. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of performance in a variety of environments.
Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...
2013-07-18
The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.
Multicenter clinical study on the treatment of children's tic disorder with Qufeng Zhidong Recipe.
Wu, Min; Xiao, Guang-hua; Yao, Min; Zhang, Jian-ming; Zhang, Xin; Zhou, Ya-bing; Zhang, Jing-yan; Wang, Shu-xia; Ma, Bo; Chen, Yan-ping
2009-08-01
To assess the effect and adverse reactions of Qufeng Zhidong Recipe (QZR) in treating children's tic disorder (TD). Using a multicenter, randomized, parallel, open, controlled design, the enrolled patients were assigned to two groups: 41 cases in the Chinese medicine (CM) group and 40 in the Western medicine (WM) group. They were treated with QZR and with haloperidol plus trihexyphenidyl, respectively, for 12 weeks as one course. In total, two courses of treatment were given. The curative effect and adverse reactions were evaluated by scoring with the Yale Global Tic Severity Scale (YGTSS), Traditional Chinese Medicine Syndrome Scale (TCMSS), and Treatment Emergent Symptom Scale (TESS), as well as the results of laboratory examinations. After one course of treatment, the markedly effective rate in the CM and WM groups was 14.6% and 17.5%, respectively, and the total effective rate was 43.9% and 47.5%, respectively; the difference between groups was not significant (P>0.05). However, after two courses of treatment, the markedly effective rate was 73.2% and 7.5%, and the total effective rate was 100.0% and 57.5%, both showing significant differences between groups (P<0.05). In addition, adverse reactions were clearly less frequent in the CM group than in the WM group. QZR has a definite curative effect with no apparent adverse reaction in treating TD, and it can clearly improve symptoms and signs and raise the quality of life and learning capacity of such patients.
A numerical differentiation library exploiting parallel architectures
NASA Astrophysics Data System (ADS)
Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.
2009-08-01
We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and, due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summary. Program title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
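The trade-off between truncation and round-off error mentioned under "Solution method" can be sketched for the two simplest formulas. This is not the NDL code: the function names are hypothetical, and the step sizes use the common textbook heuristics (square root of machine epsilon for the forward difference, cube root for the central difference), which are an assumption here rather than the library's documented rule.

```python
import math
import sys

EPS = sys.float_info.epsilon  # double-precision machine epsilon


def forward_diff(f, x):
    """First derivative by forward difference; truncation error is O(h)."""
    h = math.sqrt(EPS) * max(1.0, abs(x))  # assumed heuristic step
    return (f(x + h) - f(x)) / h


def central_diff(f, x):
    """First derivative by central difference; truncation error is O(h^2)."""
    h = EPS ** (1.0 / 3.0) * max(1.0, abs(x))  # assumed heuristic step
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Compare both estimates of d/dx sin(x) at x = 1 with the exact value cos(1).
x0 = 1.0
exact = math.cos(x0)
err_forward = abs(forward_diff(math.sin, x0) - exact)
err_central = abs(central_diff(math.sin, x0) - exact)
print(err_forward, err_central)
```

Each partial derivative of a multivariate function needs its own independent function evaluations, which is why the abstract can report almost linear speedup when those evaluations are spread over cores.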
DRY CUPPING IN CHILDREN WITH FUNCTIONAL CONSTIPATION: A RANDOMIZED OPEN LABEL CLINICAL TRIAL.
Shahamat, Mahmoud; Daneshfard, Babak; Najib, Khadijeh-Sadat; Dehghani, Seyed Mohsen; Tafazoli, Vahid; Kasalaei, Afshineh
2016-01-01
As a common disease in pediatrics, constipation poses a high burden to the community. In this study, we aimed to investigate the efficacy of dry cupping therapy (an Eastern traditional manipulative therapy) in children with functional constipation. One hundred and twenty children (4-18 years old) diagnosed with functional constipation according to ROME III criteria were assigned to receive a traditional dry cupping protocol on the abdominal wall for 8 minutes every other day or standard laxative therapy (polyethylene glycol (PEG) 40% solution without electrolyte, 0.4 g/kg once daily) for 4 weeks, in an open label randomized controlled clinical trial using a parallel design with a 1:1 allocation ratio. Patients were evaluated prior to the intervention and 2, 4, 8 and 12 weeks after its commencement in terms of the ROME III criteria for functional constipation. There were no significant differences between the two arms regarding demographic and clinical basic characteristics. After two weeks of the intervention, patients in the PEG group had significantly better results in most items of the ROME III criteria. In contrast, after four weeks of the intervention, the result was significantly better in the cupping group. There was no significant difference in the number of patients with constipation after 4 and 8 weeks of the follow-up period. This study showed that dry cupping of the abdominal wall, as a traditional manipulative therapy, can be as effective as standard laxative therapy in children with functional constipation.
Addressing Vermont's concerns on climate change on many levels
NASA Astrophysics Data System (ADS)
Betts, A. K.
2016-12-01
As a climate scientist, I realized about 12 years ago that one of my responsibilities was to help Vermont understand and adapt to climate change. My road-map has four components: 1) Newspaper articles, radio and TV interviews to explain climate issues and how to deal with them in plain English (about 100 so far) 2) Public talks across the state to schools, professional, business and citizens groups, the legislature and state government - in fact to anyone that asked - with a willingness to honestly and clearly address all issues raised (230 so far). 3) Specific research on how the climate and seasonal cycle of Vermont have changed in the past, and are likely to change in the future, exploring the unknowns. 4) A personal web-site to make all my writings, talks and research open access (http://alanbetts.com). Because Vermont is a small state with a rural and environmental ethos and a strong desire to understand, I have been able to reach across the state in a decade. In parallel, Vermont has put in place an ambitious renewable energy policy, which is well underway. My multi-faceted strategy is open, clear and transparently honest, aimed at helping society understand and deal with this critical issue. This is in sharp contrast with the secret, deceptive multifaceted strategy to discredit climate science by well-funded right-wing groups (see Dark Money by Jane Mayer), in support of their political and economic agenda, which has found little support in Vermont.
ParallABEL: an R library for generalized parallelization of genome-wide association studies
2010-01-01
Background: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of SNPs/traits. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of individuals. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped for 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance, and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL. PMID:20429914
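The partition, distribute and merge steps described for the first group (one statistic per SNP) can be sketched as follows. This is not ParallABEL, which is an R library built on Rmpi; the sketch swaps in Python's standard-library thread pool purely to show the data flow, and all function names and the toy genotype data are hypothetical. Python threads will not give the reported speed-up for CPU-bound work, so a process pool or MPI would be the realistic substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def allele_frequency(genotypes):
    """Frequency of the coded allele for one SNP; each genotype is 0, 1 or 2 copies."""
    return sum(genotypes) / (2.0 * len(genotypes))


def process_chunk(chunk):
    """Work done by one worker: a per-SNP statistic for every SNP in its chunk."""
    return [allele_frequency(genotypes) for genotypes in chunk]


def parallel_snp_stats(snps, n_workers):
    """Partition SNPs, compute statistics per chunk, then merge in the original order."""
    chunks = split_into_chunks(snps, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return [value for part in partial_results for value in part]


# Toy data: five SNPs genotyped in four individuals.
snps = [[0, 1, 2, 1], [2, 2, 2, 2], [0, 0, 0, 0], [1, 1, 0, 0], [2, 0, 2, 0]]
freqs = parallel_snp_stats(snps, n_workers=2)
print(freqs)
```

The same pattern applies to the other three groups by changing what is partitioned: individuals, pairs of individuals, or pairs of SNPs.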
A parallel form of the Gudjonsson Suggestibility Scale.
Gudjonsson, G H
1987-09-01
The purpose of this study is twofold: (1) to present a parallel form of the Gudjonsson Suggestibility Scale (GSS, Form 1); (2) to study test-retest reliabilities of interrogative suggestibility. Three groups of subjects were administered the two suggestibility scales in a counterbalanced order. Group 1 (28 normal subjects) and Group 2 (32 'forensic' patients) completed both scales within the same testing session, whereas Group 3 (30 'forensic' patients) completed the two scales between one week and eight months apart. All the correlations were highly significant, giving support for high 'temporal consistency' of interrogative suggestibility.
Ana, Monzó; Vicente, Montañana; María, Rubio José; Trinidad, García-Gimeno; Alberto, Romeu
2011-02-22
To compare the clinical results of four different protocols of COH for IVF-ICSI in normovulatory women, using in all cases pituitary suppression with GnRH antagonists. A single center, open label, parallel-controlled, prospective, post-authorization study under the approved conditions for use, in which 305 normal-responder women who were candidates for COH were assigned to r-FSH + hp-hMG (n = 51, Group I), hp-hMG (n = 61, Group II), fixed-dose r-FSH (n = 118, Group III), and r-FSH with potential dose adjustment (n = 75, Group IV) to subsequently undergo IVF-ICSI. During stimulation, Group IV needed significantly more days of stimulation as compared to Group II [8.09 ± 1.25 vs. 7.62 ± 1.17; P < 0.05], but was the group in which more oocytes were recovered [Group I: 9.43 ± 4.99 vs. Group II: 8.96 ± 4.82 vs. Group III: 8.78 ± 3.72 vs. Group IV: 11.62 ± 5.80; P < 0.05]. No significant differences were seen between the groups in terms of clinical and ongoing pregnancy, but among patients in whom two embryos with similar quality parameters (ASEBIR) were transferred, the group treated with hp-hMG alone achieved a significantly greater clinical pregnancy rate as compared to all other groups [Group I: 31.6%, Group II: 56.4%, Group III: 28.7%, Group IV: 32.7%; P < 0.05]. Although randomized clinical trials should be conducted to achieve a more reliable conclusion, these observations support the concept that stimulation with hp-hMG could be beneficial in normal-responder women undergoing pituitary suppression with GnRH antagonists.
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim of significantly shortening the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
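As a worked illustration of what a speedup of 15 on 16 cores implies, the sketch below inverts Amdahl's law. Applying Amdahl's law to the DIRA result is an assumption made here for illustration, not an analysis taken from the paper, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup when a fraction of the run time scales perfectly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def parallel_fraction_from_speedup(speedup, n_cores):
    """Invert Amdahl's law to estimate the parallel fraction from a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


# Reported figures: speedup of 15 on a 16-core computer.
p = parallel_fraction_from_speedup(15.0, 16)
efficiency = 15.0 / 16
print(p, efficiency)
```

Under this model the reported speedup corresponds to a parallel fraction of 224/225, roughly 99.6% of the serial run time, and a parallel efficiency of about 94%.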
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time-consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive-based parallel language, the MPI function-based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and, finally, a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
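The idea of a coordinate format adapted to real symmetric matrices, combined with a matrix-vector product, can be sketched as below. The abstract does not specify the authors' variation, so this shows one common choice as an assumption: store only the diagonal and upper-triangle entries and apply each off-diagonal entry twice. The function name and the small test matrix are hypothetical, and the serial loop stands in for the parallel product.

```python
def symmetric_coo_matvec(n, rows, cols, vals, x):
    """Compute y = A x for a real symmetric A stored as upper-triangle COO triplets.

    Only entries with rows[k] <= cols[k] are stored; each off-diagonal entry
    contributes to both y[i] and y[j], which halves the storage requirement.
    """
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            y[j] += a * x[i]  # mirrored entry A[j][i] == A[i][j]
    return y


# Symmetric matrix [[2, 1, 0], [1, 3, 4], [0, 4, 5]] stored by its upper triangle.
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 1, 2, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
y = symmetric_coo_matvec(3, rows, cols, vals, [1.0, 2.0, 3.0])
print(y)
```

Iterative eigensolvers such as ARPACK only need this product, not the matrix itself, which is why a compact sparse format pays off when just a small part of the spectrum is required.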
NASA Astrophysics Data System (ADS)
Fehr, M.; Navarro, V.; Martin, L.; Fletcher, E.
2013-08-01
Space Situational Awareness[8] (SSA) is defined as the comprehensive knowledge, understanding and maintained awareness of the population of space objects, the space environment and existing threats and risks. As ESA's SSA Conjunction Prediction Service (CPS) requires the repetitive application of a processing algorithm against a data set of man-made space objects, it is crucial to exploit the highly parallelizable nature of this problem. Currently the CPS system makes use of OpenMP[7] for parallelization purposes using CPU threads, but only a GPU with its hundreds of cores can fully benefit from such high levels of parallelism. This paper presents the adaptation of several core algorithms[5] of the CPS for general-purpose computing on graphics processing units (GPGPU) using NVIDIA's Compute Unified Device Architecture (CUDA).
Parallel fast multipole boundary element method applied to computational homogenization
NASA Astrophysics Data System (ADS)
Ptaszny, Jacek
2018-01-01
In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for the 3D elasticity problem are developed and applied to the computational homogenization of a solid containing spherical voids. The system of equations is solved by using the GMRES iterative solver. The boundary of the body is discretized by using quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of the number of threads is examined.
Update on Development of Mesh Generation Algorithms in MeshKit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, Rajeev; Vanderzee, Evan; Mahadevan, Vijay
2015-09-30
MeshKit uses a graph-based design for coding all its meshing algorithms, which include the Reactor Geometry (and mesh) Generation (RGG) algorithms. This report highlights the developmental updates of all the algorithms, results and future work. Parallel versions of algorithms, documentation and performance results are reported. RGG GUI design was updated to incorporate new features requested by the users; boundary layer generation and parallel RGG support were added to the GUI. Key contributions to the release, upgrade and maintenance of other SIGMA1 libraries (CGM and MOAB) were made. Several fundamental meshing algorithms for creating a robust parallel meshing pipeline in MeshKit are under development. Results and current status of automated, open-source and high quality nuclear reactor assembly mesh generation algorithms such as trimesher, quadmesher, interval matching and multi-sweeper are reported.
Cavity-photon contribution to the effective interaction of electrons in parallel quantum dots
NASA Astrophysics Data System (ADS)
Gudmundsson, Vidar; Sitek, Anna; Abdullah, Nzar Rauf; Tang, Chi-Shung; Manolescu, Andrei
2016-05-01
A single cavity photon mode is expected to modify the Coulomb interaction of an electron system in the cavity. Here we investigate this phenomenon in a parallel double quantum dot system. We explore properties of the closed system and the system after it has been opened up for electron transport. We show how results for both cases support the idea that the effective electron-electron interaction becomes more repulsive in the presence of a cavity photon field. This can be understood in terms of the cavity photons dressing the polarization terms in the effective mutual electron interaction leading to nontrivial delocalization or polarization of the charge in the double parallel dot potential. In addition, we find that the effective repulsion of the electrons can be reduced by quadrupolar collective oscillations excited by an external classical dipole electric field.
NASA Astrophysics Data System (ADS)
Ukar, Estibalitz; Lopez, Ramiro G.; Gale, Julia F. W.; Laubach, Stephen E.; Manceda, Rene
2017-11-01
In the Late Jurassic-Early Cretaceous Vaca Muerta Formation, previously unrecognized yet abundant structures constituting a new category of kinematic indicator occur within bed-parallel fibrous calcite veins (BPVs) in shale. Domal shapes result from localized shortening and thickening of BPVs and the intercalation of centimeter-thick, host-rock shale inclusions within fibrous calcite beef, forming thrust fault-bounded pop-up structures. Ellipsoidal and rounded structures show consistent orientations, lineaments of interlayered shale and fibrous calcite, and local centimeter-scale offset thrust faults that at least in some cases cut across the median line of the BPV and indicate E-W shortening. Continuity of crystal fibers shows the domal structures are contemporaneous with BPV formation and help establish timing of fibrous vein growth in the Late Cretaceous, when shortening directions were oriented E-W. Differences in the number of opening stages and the deformational style of the different BPVs indicate they may have opened at different times. The new domal kinematic indicators described in this study are small enough to be captured in core. When present in the subsurface, domal structures can be used to either infer paleostress orientation during the formation of BPVs or to orient core in cases where the paleostress is independently known.
Improved treatment of exact exchange in Quantum ESPRESSO
Barnes, Taylor A.; Kurth, Thorsten; Carrier, Pierre; ...
2017-01-18
Here, we present an algorithm and implementation for the parallel computation of exact exchange in Quantum ESPRESSO (QE) that exhibits greatly improved strong scaling. QE is an open-source software package for electronic structure calculations using plane wave density functional theory, and supports the use of local, semi-local, and hybrid DFT functionals. Wider application of hybrid functionals is desirable for the improved simulation of electronic band energy alignments and thermodynamic properties, but the computational complexity of evaluating the exact exchange potential limits the practical application of hybrid functionals to large systems and requires efficient implementations. We demonstrate that existing implementations of hybrid DFT that utilize a single data structure for both the local and exact exchange regions of the code are significantly limited in the degree of parallelization achievable. We present a band-pair parallelization approach, in which the calculation of exact exchange is parallelized and evaluated independently from the parallelization of the remainder of the calculation, with the wavefunction data being efficiently transformed on-the-fly into a form that is optimal for each part of the calculation. For a 64 water molecule supercell, our new algorithm reduces the overall time to solution by nearly an order of magnitude.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnes, Taylor A.; Kurth, Thorsten; Carrier, Pierre
Here, we present an algorithm and implementation for the parallel computation of exact exchange in Quantum ESPRESSO (QE) that exhibits greatly improved strong scaling. QE is an open-source software package for electronic structure calculations using plane wave density functional theory, and supports the use of local, semi-local, and hybrid DFT functionals. Wider application of hybrid functionals is desirable for the improved simulation of electronic band energy alignments and thermodynamic properties, but the computational complexity of evaluating the exact exchange potential limits the practical application of hybrid functionals to large systems and requires efficient implementations. We demonstrate that existing implementations of hybrid DFT that utilize a single data structure for both the local and exact exchange regions of the code are significantly limited in the degree of parallelization achievable. We present a band-pair parallelization approach, in which the calculation of exact exchange is parallelized and evaluated independently from the parallelization of the remainder of the calculation, with the wavefunction data being efficiently transformed on-the-fly into a form that is optimal for each part of the calculation. For a 64 water molecule supercell, our new algorithm reduces the overall time to solution by nearly an order of magnitude.
Parallel heuristics for scalable community detection
Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth
2015-08-14
Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.
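The quantity that the Louvain heuristic optimizes, modularity, can be illustrated with a minimal Python sketch. The function name `modularity`, the edge-list representation, and the toy graph are assumptions made for illustration only; nothing here is taken from the authors' OpenMP implementation, and the sketch scores a given partition rather than searching for one.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities c of  L_c / m  -  (d_c / 2m)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # one community per triangle
q_merged = modularity(edges, one_group)   # everything in one community
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of improvement each Louvain move looks for.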
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, taking full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate the integration of high-performance computing with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Solar wind interaction with Venus and Mars in a parallel hybrid code
NASA Astrophysics Data System (ADS)
Jarvinen, Riku; Sandroos, Arto
2013-04-01
We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. In particular, the model has been used to interpret in situ particle and magnetic field observations from plasma environments of Mars, Venus and Titan. Further, Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, aimed mainly at simulations of diffusive shock acceleration in the solar corona and interplanetary space, but which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.
Borius, Pierre-Yves; Garnier, Stéphanie Ranque; Baumstarck, Karine; Castinetti, Frédéric; Donnet, Anne; Guedj, Eric; Cornu, Philippe; Blond, Serge; Salas, Sébastien; Régis, Jean
2017-08-02
Hypophysectomy performed by craniotomy or percutaneous techniques leads to complete pain relief in more than 70% to 80% of cases for opioid refractory cancer pain. Radiosurgery could be an interesting alternative approach to reduce complications. To assess the analgesic efficacy compared with standard of care is the primary goal. The secondary objectives are to assess ophthalmic and endocrine tolerance, drug consumption, quality of life, and mechanisms of analgesic action. The trial is multicenter, randomized, prospective, and open-label with 2 parallel groups. This concerns patients in palliative care suffering from nociceptive or mixed cancer pain, refractory to standard opioid therapy. Participants will be randomly assigned to the control group receiving standards of care for pain according to recommendations, or to the experimental group receiving a pituitary GammaKnife (Elekta, Stockholm, Sweden) radiosurgery (160 Gy delivered in pituitary gland) associated with standards of care. Evaluation assessments will be taken at baseline, day0, day4, day7, day14, day28, day45, month3, and month6. We could expect pain improvement in 70% to 90% of cases at day4. In addition we will assess the safety of pituitary radiosurgery in a vulnerable population. The secondary endpoints could show decay of opioid consumption, good patient satisfaction, and improvement of the quality of life. The design of this study is potentially the most appropriate to demonstrate the efficacy and safety of radiosurgery for this new indication. New recommendations could be obtained in order to improve pain relief and quality of life. Copyright © 2017 by the Congress of Neurological Surgeons
14 CFR 25.810 - Emergency egress assist means and escape routes.
Code of Federal Regulations, 2014 CFR
2014-01-01
... TRANSPORTATION AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Emergency... carrying simultaneously two parallel lines of evacuees. In addition, the assisting means must be designed... during the interval between the time the exit opening means is actuated from inside the airplane and the...
14 CFR 25.810 - Emergency egress assist means and escape routes.
Code of Federal Regulations, 2012 CFR
2012-01-01
... TRANSPORTATION AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Emergency... carrying simultaneously two parallel lines of evacuees. In addition, the assisting means must be designed... during the interval between the time the exit opening means is actuated from inside the airplane and the...
14 CFR 25.810 - Emergency egress assist means and escape routes.
Code of Federal Regulations, 2011 CFR
2011-01-01
... TRANSPORTATION AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Emergency... carrying simultaneously two parallel lines of evacuees. In addition, the assisting means must be designed... during the interval between the time the exit opening means is actuated from inside the airplane and the...
14 CFR 25.810 - Emergency egress assist means and escape routes.
Code of Federal Regulations, 2010 CFR
2010-01-01
... TRANSPORTATION AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Emergency... carrying simultaneously two parallel lines of evacuees. In addition, the assisting means must be designed... during the interval between the time the exit opening means is actuated from inside the airplane and the...
14 CFR 25.810 - Emergency egress assist means and escape routes.
Code of Federal Regulations, 2013 CFR
2013-01-01
... TRANSPORTATION AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Emergency... carrying simultaneously two parallel lines of evacuees. In addition, the assisting means must be designed... during the interval between the time the exit opening means is actuated from inside the airplane and the...
Personal Construct Theory and Systemic Therapies: Parallel or Convergent Trends?
ERIC Educational Resources Information Center
Feixas, Guillem
1990-01-01
Explores similarities between Kelly's Personal Construct Theory (PCT) and systemic therapies. Asserts that (1) PCT and systemic therapies share common epistemological stance, constructivism; (2) personal construct systems possess properties of open systems; and (3) PCT and systemic therapies hold similar positions on relevant theoretical and…
Code of Federal Regulations, 2011 CFR
2011-01-01
... which there is recurrent movement, which is usually indicated by small, periodic displacements or... of fluids, expressed as the ratio of the volume of interconnected pores and openings to the volume of... displacement of the side relative to one another parallel to the fracture or zone of fractures. Faulting means...
Code of Federal Regulations, 2012 CFR
2012-01-01
... which there is recurrent movement, which is usually indicated by small, periodic displacements or... of fluids, expressed as the ratio of the volume of interconnected pores and openings to the volume of... displacement of the side relative to one another parallel to the fracture or zone of fractures. Faulting means...
Code of Federal Regulations, 2013 CFR
2013-01-01
... which there is recurrent movement, which is usually indicated by small, periodic displacements or... of fluids, expressed as the ratio of the volume of interconnected pores and openings to the volume of... displacement of the side relative to one another parallel to the fracture or zone of fractures. Faulting means...
Code of Federal Regulations, 2014 CFR
2014-01-01
... which there is recurrent movement, which is usually indicated by small, periodic displacements or... of fluids, expressed as the ratio of the volume of interconnected pores and openings to the volume of... displacement of the side relative to one another parallel to the fracture or zone of fractures. Faulting means...
Variability of hydrologic regimes and morphology in constructed open-ditch channels
Strock, J.S.; Magner, J.A.; Richardson, W.B.; Sadowsky, M.J.; Sands, G.R.; Venterea, R.T.; ,
2004-01-01
Open-ditch ecosystems are potential transporters of considerable loads of nutrients, sediment, pathogens and pesticides from direct inflow from agricultural land to small streams and larger rivers. Our objective was to compare hydrology and channel morphology between two experimental open-ditch channels. An open-ditch research facility incorporating a paired design was constructed during 2002 near Lamberton, MN. A 200-m reach of existing drainage channel was converted into a system of four parallel channels. The facility was equipped with water level control devices and instrumentation for flow monitoring and water sample collection on upstream and downstream ends of the system. Hydrographs from simulated flow during year one indicated that paired open-ditch channels responded similarly to changes in inflow. Variability in hydrologic response between open-ditches was attributed to differences in open-ditch channel bottom elevation and vegetation density. No chemical, biological, or atmospheric measurements were made during 2003. Potential future benefits of this research include improved biological diversity and integrity of open-ditch ecosystems, reduced flood peaks and increased flow during critical low-flow periods, improved and more efficient nitrogen retention within the open-ditch ecosystem, and decreased maintenance cost associated with reduced frequency of open-ditch maintenance.
NASA Technical Reports Server (NTRS)
Noh, H. M.; Pathak, P. H.
1986-01-01
An approximate but sufficiently accurate high frequency solution which combines the uniform geometrical theory of diffraction (UTD) and the aperture integration (AI) method is developed for analyzing the problem of electromagnetic (EM) plane wave scattering by an open-ended, perfectly-conducting, semi-infinite hollow rectangular waveguide (or duct) with a thin, uniform layer of lossy or absorbing material on its inner wall, and with a planar termination inside. In addition, a high frequency solution for the EM scattering by a two dimensional (2-D), semi-infinite parallel plate waveguide with an absorber coating on the inner walls is also developed as a first step before analyzing the open-ended semi-infinite three dimensional (3-D) rectangular waveguide geometry. The total field scattered by the semi-infinite waveguide consists firstly of the fields scattered from the edges of the aperture at the open-end, and secondly of the fields which are coupled into the waveguide from the open-end and then reflected back from the interior termination to radiate out of the open-end. The first contribution to the scattered field can be found directly via the UTD ray method. The second contribution is found via the AI method which employs rays to describe the fields in the aperture that arrive there after reflecting from the interior termination. It is assumed that the direction of the incident plane wave and the direction of observation lie well inside the forward half space that exists outside the half space containing the semi-infinite waveguide geometry. Also, the medium exterior to the waveguide is assumed to be free space.
Pteros: fast and easy to use open-source C++ library for molecular analysis.
Yesylevskyy, Semen O
2012-07-15
An open-source Pteros library for molecular modeling and analysis of molecular dynamics trajectories for the C++ programming language is introduced. Pteros provides a number of routine analysis operations ranging from reading and writing trajectory files and geometry transformations to structural alignment and computation of nonbonded interaction energies. The library features asynchronous trajectory reading and parallel execution of several analysis routines, which greatly simplifies development of computationally intensive trajectory analysis algorithms. The Pteros programming interface is very simple and intuitive while the source code is well documented and easily extendible. Pteros is available for free under the open-source Artistic License from http://sourceforge.net/projects/pteros/. Copyright © 2012 Wiley Periodicals, Inc.
Mathematical Abstraction: Constructing Concept of Parallel Coordinates
NASA Astrophysics Data System (ADS)
Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.
2017-09-01
Mathematical abstraction is an important process in teaching and learning mathematics, so pre-service mathematics teachers need to understand and experience this process. One of the theoretical-methodological frameworks for studying this process is Abstraction in Context (AiC). Based on this framework, the abstraction process comprises observable epistemic actions, Recognition, Building-With, Construction, and Consolidation, known as the RBC + C model. This study investigates and analyzes how pre-service mathematics teachers constructed and consolidated the concept of Parallel Coordinates in a group discussion. It uses the AiC framework to analyze the mathematical abstraction of a group of four pre-service teachers learning Parallel Coordinates concepts. The data were collected through video recording, students’ worksheet, test, and field notes. The results show that the students’ prior knowledge of the Cartesian coordinate concept plays a significant role in the process of constructing the Parallel Coordinates concept as new knowledge. The consolidation process is influenced by the social interaction between group members. The abstraction processes that took place in this group were dominated by empirical abstraction, which emphasizes identifying characteristics of manipulated or imagined objects during the processes of recognizing and building-with.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
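The visibility rule behind a Z buffer can be sketched serially in Python. This only illustrates the depth test (the nearest fragment at each pixel wins); the fragment tuples and the name `apply_fragments` are assumptions for illustration, and the sort-based virtual routing on SIMD hardware described in the abstract is not reproduced here.

```python
def apply_fragments(width, height, fragments, background=0):
    """Resolve visibility with a Z buffer: the nearest fragment per pixel wins.

    fragments: iterable of (x, y, depth, color); a smaller depth is closer.
    Returns (color_buffer, depth_buffer), each a list of rows.
    """
    inf = float("inf")
    depth = [[inf] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # Skip fragments outside the image; keep only the closest one per pixel.
        if 0 <= x < width and 0 <= y < height and z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color, depth


# Hypothetical fragments: three compete for pixel (0, 0), one lands on (1, 0).
frags = [(0, 0, 5.0, 1), (0, 0, 2.0, 2), (1, 0, 3.0, 3), (0, 0, 4.0, 4)]
color, depth = apply_fragments(2, 1, frags)
```

Because the result at each pixel depends only on the minimum depth, fragments can be grouped and applied in any order, which is what makes applying them to the buffer in large parallel batches possible.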
Eke, F U; Obamyonyi, A; Eke, N N; Oyewo, E A
2000-02-01
We compared the efficacy and tolerability of oral piroxicam 1 mg/kg/day with soluble aspirin given at 100 mg/kg/day taken four-hourly in 58 patients with sickle cell anaemia and severe osteoarticular painful attacks requiring hospitalization in a randomized, parallel study. Main investigational criteria were pain relief, limitation of movement, fever, and insomnia or agitation. Both groups were well-matched at the commencement of therapy but most patients on piroxicam showed remarkable and significant pain relief and improvement in other parameters within 24 h. Unwanted effects were absent in the piroxicam-treated group whereas those treated with aspirin experienced nausea and vomiting. There were no significant changes in liver function tests with either form of treatment. Oral piroxicam is an effective and safe treatment in the management of the osteoarticular painful crisis in sickle cell anaemia. It might prevent the use of parenteral analgesics and hospitalization and reduce the loss of school hours in patients who are being treated for bone pain crises that characterize sickle cell anaemia.
Zhong, Lei; Wang, Dengqiang; Gan, Xiaoni; Yang, Tong; He, Shunping
2011-01-01
Group B of the Sox transcription factor family is crucial in embryo development in the insects and vertebrates. Sox group B, unlike the other Sox groups, has an unusually enlarged functional repertoire in insects, but the timing and mechanism of the expansion of this group were unclear. We collected and analyzed data for Sox group B from 36 species of 12 phyla representing the major metazoan clades, with an emphasis on arthropods, to reconstruct the evolutionary history of SoxB in bilaterians and to date the expansion of Sox group B in insects. We found that the genome of the bilaterian last common ancestor probably contained one SoxB1 and one SoxB2 gene only and that tandem duplications of SoxB2 occurred before the arthropod diversification but after the arthropod-nematode divergence, resulting in the basal repertoire of Sox group B in diverse arthropod lineages. The arthropod Sox group B repertoire expanded differently from the vertebrate repertoire, which resulted from genome duplications. The parallel increases in the Sox group B repertoires of the arthropods and vertebrates are consistent with the parallel increases in the complexity and diversification of these two important organismal groups. PMID:21305035
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove many redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on a heterogeneous platform.
Large-scale virtual screening on public cloud resources with Apache Spark.
Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
2017-01-01
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text]2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
Plenty, Stephanie; Bejerot, Susanne
2014-01-01
Although adults with autism spectrum disorder are an increasingly identified patient population, few treatment options are available. This preliminary randomized controlled open trial with a parallel design developed two group interventions for adults with autism spectrum disorders and intelligence within the normal range: cognitive behavioural therapy and recreational activity. Both interventions comprised 36 weekly 3-h sessions led by two therapists in groups of 6–8 patients. A total of 68 psychiatric patients with autism spectrum disorders participated in the study. Outcome measures were Quality of Life Inventory, Sense of Coherence Scale, Rosenberg Self-Esteem Scale and an exploratory analysis on measures of psychiatric health. Participants in both treatment conditions reported an increased quality of life at post-treatment (d = 0.39, p < 0.001), with no difference between interventions. No amelioration of psychiatric symptoms was observed. The dropout rate was lower with cognitive behavioural therapy than with recreational activity, and participants in cognitive behavioural therapy rated themselves as more generally improved, as well as more improved regarding expression of needs and understanding of difficulties. Both interventions appear to be promising treatment options for adults with autism spectrum disorder. The interventions’ similar efficacy may be due to the common elements, structure and group setting. Cognitive behavioural therapy may be additionally beneficial in terms of increasing specific skills and minimizing dropout. PMID:24089423
Hesselmark, Eva; Plenty, Stephanie; Bejerot, Susanne
2014-08-01
Although adults with autism spectrum disorder are an increasingly identified patient population, few treatment options are available. This preliminary randomized controlled open trial with a parallel design developed two group interventions for adults with autism spectrum disorders and intelligence within the normal range: cognitive behavioural therapy and recreational activity. Both interventions comprised 36 weekly 3-h sessions led by two therapists in groups of 6-8 patients. A total of 68 psychiatric patients with autism spectrum disorders participated in the study. Outcome measures were Quality of Life Inventory, Sense of Coherence Scale, Rosenberg Self-Esteem Scale and an exploratory analysis on measures of psychiatric health. Participants in both treatment conditions reported an increased quality of life at post-treatment (d = 0.39, p < 0.001), with no difference between interventions. No amelioration of psychiatric symptoms was observed. The dropout rate was lower with cognitive behavioural therapy than with recreational activity, and participants in cognitive behavioural therapy rated themselves as more generally improved, as well as more improved regarding expression of needs and understanding of difficulties. Both interventions appear to be promising treatment options for adults with autism spectrum disorder. The interventions' similar efficacy may be due to the common elements, structure and group setting. Cognitive behavioural therapy may be additionally beneficial in terms of increasing specific skills and minimizing dropout. © The Author(s) 2013.
Wan Seman, W J; Kori, N; Rajoo, S; Othman, H; Mohd Noor, N; Wahab, N A; Sukor, N; Mustafa, N; Kamaruddin, N A
2016-06-01
The aim of the present study was to assess the hypoglycaemia risk and safety of dapagliflozin compared with sulphonylurea during the fasting month of Ramadan. In this 12-week, randomized, open-label, two-arm parallel group study, 110 patients with type 2 diabetes who were receiving sulphonylurea and metformin were randomized either to receive 10 mg (n = 58) of dapagliflozin daily or to continue receiving sulphonylurea (n = 52). The primary outcome was to compare the effects of dapagliflozin and sulphonylurea on the proportions of patients with at least one episode of hypoglycaemia during Ramadan, as well as to assess the safety of dapagliflozin when used to treat patients observing Ramadan. A lower proportion of patients had reported or documented hypoglycaemia in the dapagliflozin group than in the sulphonylurea group: 4 (6.9%) versus 15 (28.8%); p = 0.002. The relative risk of any reported or documented hypoglycaemia in the 4th week of Ramadan was significantly lower in the dapagliflozin group: RR = 0.24, 95% CI: 0.09, 0.68; p = 0.002. No significant differences were observed between the two groups regarding postural hypotension (13.8 vs 3.8%; p = 0.210) or urinary tract infections (10.3 vs 3.8%; p = 0.277). In conclusion, fewer patients exhibited hypoglycaemia in the dapagliflozin group than in the sulphonylurea group. © 2016 John Wiley & Sons Ltd.
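The proportions and the relative risk quoted above can be checked with a few lines of Python. This is only an arithmetic sketch using the counts given in the abstract (4 of 58 versus 15 of 52); the function name `relative_risk` is an assumption, and the abstract attributes its RR to the 4th week of Ramadan, so agreement with these overall counts is an observation rather than a reconstruction of the authors' analysis. The confidence interval is not recomputed.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Risk in group A divided by risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Counts as reported in the abstract.
risk_dapagliflozin = 4 / 58    # about 6.9%
risk_sulphonylurea = 15 / 52   # about 28.8%
rr = relative_risk(4, 58, 15, 52)
```

Rounded to two decimals, `rr` is 0.24, in line with the reported value.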
Wang, F; Fan, Q X; Wang, H H; Han, D M; Song, N S; Lu, H
2017-06-23
Objective: To evaluate the efficacy and safety of Xiaoaiping combined with chemotherapy in the treatment of advanced esophageal cancer. Methods: This is a multi-center, randomized, open-label, parallel controlled study. A total of 124 advanced esophageal cancer patients with a Karnofsky Performance Status (KPS) score ≥60 and an expected survival time ≥3 months were enrolled. Patients were randomly assigned to a study group or a control group. The patients in the study group received Xiaoaiping combined with S-1 and cisplatin. The control group received S-1 and cisplatin. Each group included 62 patients, and each treatment cycle lasted 21 days. The efficacy and adverse events in patients of the two groups were observed and compared. Results: A total of 57 patients in the study group and 55 in the control group were included in the efficacy assessment. The response rate was 54.4% and 34.5% in the study group and control group, respectively (P < 0.05). Disease control rates were 86.0% and 69.1%, respectively (P < 0.05). The median progression-free survival (PFS) was 7.97 months in the study group and 6.43 months in the control group (P < 0.05). The median overall survival (OS) was 12.93 months in the study group and 10.93 months in the control group (P < 0.05). The most common adverse events in the two groups were nausea and vomiting, thrombocytopenia, anemia, neutropenia, liver damage, pigmentation, oral mucositis, renal impairment and diarrhea. The incidences of nausea, vomiting, thrombocytopenia, leukopenia, neutropenia and diarrhea in the study group were significantly higher than those in the control group (P < 0.05). Conclusion: Xiaoaiping combined with S-1 and cisplatin significantly increased the response rate and prolonged survival in patients with advanced esophageal cancer.
On a model of three-dimensional bursting and its parallel implementation
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.
2008-04-01
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
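The Preconditioned Conjugate Gradient iteration named above can be sketched serially in plain Python. The abstract does not say which preconditioner was used, so the Jacobi (diagonal) preconditioner here is an assumption, as are the function name `pcg`, the dense list-of-rows matrix format, and the 2x2 test system; the sketch shows the iteration only, not the MPI, PETSc, or OpenMP parallelization.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner.

    A: symmetric positive-definite matrix as a list of rows.
    b: right-hand side as a list of floats.
    Returns an approximate solution x of A x = b.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # assumed preconditioner M^-1

    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # stop on a small residual norm
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


# Hypothetical 2x2 symmetric positive-definite system.
solution = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The matrix-vector product and the dot products dominate the cost of each iteration, and they are the operations a parallel implementation would typically distribute.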
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.
2016-01-01
In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep creating rapidly growing piles of raw data at increasing speed. The number of cores per processor has increased in an attempt to compensate for only slight increments in clock rates. This technological shift demands changes in software development, especially in the field of high performance computing, where parallelization techniques are gaining in importance due to the pressing issue of large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration for two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should aid the reader in deciding on a technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. However, GPU performance is only superior above a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for the CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.
Ruymgaart, A Peter; Elber, Ron
2012-11-13
We report Graphics Processing Unit (GPU) and Open-MP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints of all bonds, runs in parallel on multiple Open-MP cores or entirely on the GPU. It is based on Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W
2016-01-01
In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep creating rapidly growing piles of raw data at increasing speed. The number of cores per processor has increased in an attempt to compensate for only slight increments in clock rates. This technological shift demands changes in software development, especially in the field of high performance computing, where parallelization techniques are gaining in importance due to the pressing issue of large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration for two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should aid the reader in deciding on a technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. However, performance is superior only from a certain problem size onward, due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
Berton, Linda; Bano, Giulia; Carraro, Sara; Veronese, Nicola; Pizzato, Simona; Bolzetta, Francesco; De Rui, Marina; Valmorbida, Elena; De Ronch, Irene; Perissinotto, Egle; Coin, Alessandra; Manzato, Enzo; Sergi, Giuseppe
2015-01-01
Although older people are particularly liable to sarcopenia, limited research is available on beta-hydroxy-beta-methylbutyrate (HMB) supplementation in this population, particularly in healthy subjects. In this parallel-group, randomized, controlled, open-label trial, we aimed to evaluate whether an oral supplement containing 1.5 g of calcium HMB for 8 weeks could improve physical performance and muscle strength parameters in a group of community-dwelling healthy older women. Eighty healthy women attending a twice-weekly mild fitness program were divided into two equal groups of 40; 32 of the treated women and 33 controls completed the study. We considered a change in the Short Physical Performance Battery (SPPB) score as the primary outcome and changes in the peak torque (PT) isometric and isokinetic strength of the lower limbs, 6-minute walking test (6MWT), handgrip strength and endurance as secondary outcomes. Body composition was assessed with dual-energy X-ray absorptiometry (DXA) and peripheral quantitative computerized tomography (pQCT). The mean difference between the two groups in pre-post change (delta) was finally calculated for each outcome. After 8 weeks, there were no significant differences between the groups’ SPPB, handgrip strength or DXA parameters. The group treated with HMB scored significantly better than the control group for PT isokinetic flexion (delta = 1.56±1.56 Nm; p = 0.03) and extension (delta = 3.32±2.61 Nm; p = 0.03), PT isometric strength (delta = 9.74±3.90 Nm; p = 0.02), 6MWT (delta = 7.67±8.29 m; p = 0.04), handgrip endurance (delta = 21.41±16.28 s; p = 0.02), and muscle density assessed with pQCT. No serious adverse effects were reported in either group. In conclusion, a nutritional supplement containing 1.5 g of calcium HMB for 8 weeks in healthy elderly women had no significant effects on SPPB, but did significantly improve several muscle strength and physical performance parameters. ClinicalTrials.gov NCT02118181.
Establishing a group of endpoints in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.; Xue, Hanhong
2016-02-02
A parallel computer executes a number of tasks; each task includes a number of endpoints, and the endpoints are configured to support collective operations. In such a parallel computer, establishing a group of endpoints includes receiving a user specification of a set of endpoints included in a global collection of endpoints, where the user specification defines the set in accordance with a predefined virtual representation of the endpoints, the predefined virtual representation is a data structure setting forth an organization of tasks and endpoints included in the global collection of endpoints, and the user specification defines the set of endpoints without a user specification of a particular endpoint; and defining a group of endpoints in dependence upon the predefined virtual representation of the endpoints and the user specification.
NASA Astrophysics Data System (ADS)
Wendel, D. E.; Olson, D. K.; Hesse, M.; Karimabadi, H.; Daughton, W. S.
2013-12-01
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection of a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of topological features such as separators and null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a correspondence between the locus of changes in magnetic connectivity, or the quasi-separatrix layer, and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we compare the distribution of parallel electric fields along field lines with the reconnection rate. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field, while the contribution from high amplitude parallel fluctuations, such as electron holes, is negligible. The results impact the determination of reconnection sites within models of 3D turbulent reconnection as well as the inference of reconnection rates from in situ spacecraft measurements. It is difficult through direct observation to isolate the locus of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the partial sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
Subhas, S; Rupesh, P L; Devanna, R; Kumar, D R V; Paliwal, A; Solanki, P
2017-04-01
The aim of the study is to compare the relationship of the occlusal plane to 3 different ala-tragal lines, namely the superior, middle and inferior lines, in individuals having different head forms, and its relation to the Frankfort horizontal (FH) plane. A total of 75 lateral cephalometric radiographs of subjects aged 18-25 with natural dentition and a full complement of teeth were screened and selected. A lateral cephalogram was made for each subject in an open-mouth position. Prior to making the lateral cephalogram, radiopaque markers were placed on the superior, middle and inferior tragus points and on the inferior border of the ala of the nose. Cephalometric tracing was done over each cephalogram. In the mesiocephalic head form, the middle ala-tragal line was most parallel to the occlusal plane, with a mean angle of 1.96°. In the dolichocephalic head form, the superior ala-tragal line was most parallel to the occlusal plane, with a mean angle of 0.48°. In the brachycephalic head form, the middle ala-tragal line was most parallel to the occlusal plane, with a mean angle of 2.08°. The mean angulation of the occlusal plane to the FH plane is 11.04°, 10.16° and 10.60° in mesiocephalic, dolichocephalic and brachycephalic head forms, respectively. The study concludes that the middle ala-tragal line can be used as a reference for the mesiocephalic head form and the superior ala-tragal line for the dolichocephalic and brachycephalic head form as a reference to establish the occlusal plane. Copyright © 2016. Published by Elsevier Masson SAS.
Infection Rates in Arthroscopic Versus Open Rotator Cuff Repair.
Hughes, Jonathan D; Hughes, Jessica L; Bartley, Justin H; Hamilton, William P; Brennan, Kindyle L
2017-07-01
The prevalence of rotator cuff repair continues to rise, with a noted transition from open to arthroscopic techniques in recent years. One reported advantage of arthroscopic repair is a lower infection rate. However, to date, the infection rates of these 2 techniques have not been directly compared with large samples at a single institution with fully integrated medical records. To retrospectively compare postoperative infection rates between arthroscopic and open rotator cuff repair. Cohort study; Level of evidence, 3. From January 2003 until May 2011, a total of 1556 patients underwent rotator cuff repair at a single institution. These patients were divided into an arthroscopic repair group and an open group. A Pearson chi-square test and Fisher exact test were used, with a subgroup analysis to segment the open repair group into mini-open and open procedures. The odds ratio and 95% CI of developing a postoperative infection was calculated for the 2 groups. A multiple-regressions model was then utilized to identify predictors of the presence of infection. Infection was defined as only those treated with surgical intervention, thus excluding superficial infections treated with antibiotics alone. A total of 903 patients had an arthroscopic repair, while 653 had open repairs (600 mini-open, 53 open). There were 4 confirmed infections in the arthroscopic group and 16 in the open group (15 mini-open, 1 open), resulting in postoperative infection rates of 0.44% and 2.45%, respectively. Subgroup analysis of the mini-open and open groups demonstrated a postoperative infection rate of 2.50% and 1.89%, respectively. The open group had an odds ratio of 5.645 (95% CI, 1.9-17.0) to develop a postoperative infection compared with the arthroscopic group. Patients undergoing open rotator cuff repair had a significantly higher rate of postoperative infection compared with those undergoing arthroscopic rotator cuff repair.
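The rates and odds ratio quoted in the abstract above can be rechecked from the reported counts. The following is a minimal sketch, not the authors' analysis: the abstract does not say how the confidence interval was computed, so the Woolf (log odds ratio) method used here is an assumption, and the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf-type 95% CI for a 2x2 table.

    a: events in group 1,  b: non-events in group 1
    c: events in group 2,  d: non-events in group 2
    (The Woolf method is an assumption; the abstract does not name one.)
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts taken from the abstract.
open_infected, open_total = 16, 653
arthro_infected, arthro_total = 4, 903

open_rate = 100 * open_infected / open_total        # percent
arthro_rate = 100 * arthro_infected / arthro_total  # percent

odds_ratio, ci_lower, ci_upper = odds_ratio_ci(
    open_infected, open_total - open_infected,
    arthro_infected, arthro_total - arthro_infected,
)

print(f"open: {open_rate:.2f}%  arthroscopic: {arthro_rate:.2f}%")
print(f"OR = {odds_ratio:.3f} (95% CI {ci_lower:.1f}-{ci_upper:.1f})")
```

Under these assumptions the counts give the same rounded rates and odds ratio as the abstract; the interval depends on the method chosen.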
Li, Wenke; Wayne, Gregory S; Lallaman, John E; Chang, Sou-Jen; Wittenberger, Steven J
2006-02-17
Ketoester 1 is cyclized to give pyran-3,5-dione 2 in 78% yield using a parallel addition of ketoester 1 and base NaO(t)Bu in refluxing THF. Compared to the previously reported procedures, these optimized conditions have significantly increased the yield of this transformation and the quality of pyran 2 and prove to be suitable for large-scale preparation. An application of 2 to the synthesis of ABT-598, a potassium channel opener, is demonstrated.
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor
NASA Astrophysics Data System (ADS)
Hristov, Ivan; Goranov, Goran; Hristova, Radoslava
2018-02-01
We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We develop an OpenMP implementation which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing" (KNL)). The results show 2 times better performance on the KNL processor.
Rehfeldt, Ruth Anne; Jung, Heidi L; Aguirre, Angelica; Nichols, Jane L; Root, William B
2016-03-01
The e-Transformation in higher education, in which Massive Open Online Courses (MOOCs) are playing a pivotal role, has had an impact on the modality in which behavior analysis is taught. In this paper, we survey the history and implications of online education including MOOCs and describe the implementation and results for the discipline's first MOOC, delivered at Southern Illinois University in spring 2015. Implications for the globalization and free access of higher education are discussed, as well as the parallel between MOOCs and Skinner's teaching machines.
Jdpd: an open java simulation kernel for molecular fragment dissipative particle dynamics.
van den Broek, Karina; Kuhn, Hubert; Zielesny, Achim
2018-05-21
Jdpd is an open Java simulation kernel for Molecular Fragment Dissipative Particle Dynamics with parallelizable force calculation, efficient caching options and fast property calculations. It is characterized by an interface and factory-pattern driven design for simple code changes and may help to avoid problems of polyglot programming. Detailed input/output communication, parallelization and process control as well as internal logging capabilities for debugging purposes are supported. The new kernel may be utilized in different simulation environments ranging from flexible scripting solutions up to fully integrated "all-in-one" simulation systems.
Chen, Yingying; Liu, Peng; Chen, Xia; Li, Yanan; Zhang, Fengmei; Wang, Yangang
2018-05-01
There is a lack of research on the effect of low dose of angiotensin receptor blockers combined with spironolactone, and the effect of high dose of angiotensin receptor blockers alone on the urinary albumin excretion rate (UAER) in elderly patients with early type 2 diabetic nephropathy (DN). We conducted a prospective, randomized, open-label, parallel-controlled study that included 244 elderly patients with early DN and mild-to-moderate essential hypertension. Patients were randomly divided into 4 groups: low-dose irbesartan (group A), high-dose irbesartan (group B), low-dose irbesartan combined with spironolactone (group C) and high-dose irbesartan combined with spironolactone (group D). Changes in UAER, serum potassium and blood pressure were compared. There were no statistical differences in the baseline characteristics among groups. Furthermore, no significant difference in blood pressure before and after treatment was found among different groups. After 72-week treatment, UAER in group D was lower compared to group A and B (P < 0.05). Meanwhile, compared with group B, UAER in group C decreased significantly (P < 0.05). Additionally, significantly higher serum potassium was found in group D compared to other groups (P < 0.05). Also, group D had the highest count of patients who withdrew from the study due to hyperkalemia compared to other groups (P < 0.05). Our results indicate high-dose irbesartan combined with spironolactone may be more efficient in reducing UAER in elderly patients with early DN, but this treatment could cause hyperkalemia. Low-dose irbesartan combined with spironolactone was shown to be safer and more effective in decreasing UAER compared to high-dose irbesartan. Copyright © 2018 Southern Society for Clinical Investigation. Published by Elsevier Inc. All rights reserved.
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.
Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal
2015-06-01
Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
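The speedup figures in the abstract above follow from the usual definition, serial wall time divided by parallel wall time. As a rough check, the sketch below applies that definition to the rounded wall times quoted for the 501-protein benchmark; the function name `speedup` is hypothetical, and because the published hours are rounded the results only approximate the reported factors of 11.8 and 17.6.

```python
def speedup(serial_hours, parallel_hours):
    """Speedup = serial wall time / parallel wall time."""
    return serial_hours / parallel_hours

# Wall times (hours) quoted in the abstract for 501 benchmarking proteins.
SERIAL = 36.8        # serial version
CPU_ONLY = 3.1       # parallel, node without the accelerator
CPU_PLUS_PHI = 2.1   # parallel, node with the Xeon Phi card

cpu_speedup = speedup(SERIAL, CPU_ONLY)
combined_speedup = speedup(SERIAL, CPU_PLUS_PHI)

print(f"CPU only:       {cpu_speedup:.1f}x")
print(f"CPU + Xeon Phi: {combined_speedup:.1f}x")
```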
Good Questions: Great Ways to Differentiate Mathematics Instruction
ERIC Educational Resources Information Center
Small, Marian
2009-01-01
Using differentiated instruction in the classroom can be a challenge, especially when teaching mathematics. This book cuts through the difficulties with two powerful and universal strategies that teachers can use across all math content: Open Questions and Parallel Tasks. Specific strategies and examples for grades Kindergarten - 8 are organized…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bull, Jeffrey S.
This presentation describes how to build MCNP 6.2. MCNP®* 6.2 can be compiled on Macs, PCs, and most Linux systems. It can also be built for parallel execution using both OpenMP and Message Passing Interface (MPI) methods. MCNP6 requires Fortran, C, and C++ compilers to build the code.
Current Developments in DETER Cybersecurity Testbed Technology
2015-12-08
management on PlanetLab [12], such as Plush and Nebula [4], PlMan [19], Stork [20], pShell [21], Planetlab Application Manager [22], parallel open...SSH tools [23], plDist [24], Nixes [25], PLDeploy [26] and vxargs [27]. With the exception of Plush and Nebula, these tools are all low-level
The Role of Attention in Subitizing
ERIC Educational Resources Information Center
Railo, Henry; Koivisto, Mika; Revonsuo, Antti; Hannula, Minna M.
2008-01-01
The process of rapidly and accurately enumerating small numbers of items without counting, i.e. subitizing, is often believed to rest on parallel preattentive processes. However, the possibility that enumeration of small numbers of items would also require attentional processes has remained an open question. The present study is the first that…
49 CFR 238.123 - Emergency roof access.
Code of Federal Regulations, 2011 CFR
2011-10-01
... minimum opening of 26 inches longitudinally (i.e., parallel to the longitudinal axis of the car) by 24... be free of any rigid secondary structure (e.g., a diffuser or diffuser support, lighting back fixture... means of a structural weak point, it shall be permissible to cut through interior panels, liners, or...
Being Outside Learning about Science Is Amazing: A Mixed Methods Study
ERIC Educational Resources Information Center
Weibel, Michelle L.
2011-01-01
This study used a convergent parallel mixed methods design to examine teachers' environmental attitudes and concerns about an outdoor educational field trip. Converging both quantitative data (Environmental Attitudes Scale and teacher demographics) and qualitative data (Open-Ended Statements of Concern and interviews) facilitated interpretation.…
46 CFR 112.05-3 - Main-emergency bus-tie.
Code of Federal Regulations, 2010 CFR
2010-10-01
... emergency switchboard; (b) Be arranged to prevent parallel operation of an emergency power source with any other source of electric power, except for interlock systems for momentary transfer of loads; and (c) If arranged for feedback operation, open automatically upon overload of the emergency power source before the...
Accessing and visualizing scientific spatiotemporal data
NASA Technical Reports Server (NTRS)
Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia;
2004-01-01
This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.
Testing for carryover effects after cessation of treatments: a design approach.
Sturdevant, S Gwynn; Lumley, Thomas
2016-08-02
Recently, trials have been published that address noisy measurements where diagnosis occurs by exceeding a threshold (as in diabetes and hypertension) and that attempt to measure carryover, i.e., the impact that treatment has on an outcome after cessation. The design of these trials has been criticised, and simulations have been conducted which suggest that the parallel designs used are not adequate to test this hypothesis; two proposed solutions are that either a differing parallel design or a cross-over design could allow for diagnosis of carryover. We undertook a systematic simulation study to determine the ability of a cross-over or a parallel-group trial design to detect carryover effects on incident hypertension in a population with prehypertension. We simulated blood pressure and focused on varying criteria to diagnose systolic hypertension. Using the difference in cumulative incidence of hypertension to analyse parallel-group or cross-over trials resulted in none of the designs having an acceptable Type I error rate. Under the null hypothesis of no carryover, the error rate is well above the nominal 5%. When a treatment is effective during the intervention period, reliable testing for a carryover effect is difficult. Neither parallel-group nor cross-over designs using the difference in cumulative incidence appear to be a feasible approach. Future trials should ensure their design and analysis are validated by simulation.
Barth, Johannes; Boutsiadis, Achilleas; Neyton, Lionel; Lafosse, Laurent; Walch, Gilles
2017-10-01
One of the factors that can affect the success of the Latarjet procedure is accurate coracoid graft (CG) placement. The use of a guide can improve placement of the CG and screw positioning in the sagittal and axial planes as compared with the classic open ("freehand") technique. Cohort study; Level of evidence, 2. A total of 49 patients who underwent a Latarjet procedure for the treatment of recurrent anterior shoulder instability were prospectively included; the procedure was performed with the freehand technique in 22 patients (group 1) and with use of a parallel drill guide during screw placement in 27 patients (group 2). All patients underwent a postoperative computed tomography scan with the same established protocol. The scans were used to evaluate and compare the position of the CG in the sagittal and axial planes, the direction of the screws (α angle), and overall contact of the graft with the anterior surface of the glenoid after the 2 surgical techniques. The CG was placed >60% below the native glenoid equator in 23 patients (85.2%) in group 2, compared with 14 patients (63.6%) in group 1 ( P = .004). In the axial plane, the position of the CG in group 2 patients was more accurate (85.2% and 88.9% flush) at the inferior and middle quartiles of the glenoid surface ( P = .012 and .009), respectively. Moreover, with the freehand technique (group 1), the graft was in a more lateral position in the inferior and middle quartiles ( P = .012 and .009, respectively). No differences were found between groups 1 and 2 regarding the mean α angle of the superior (9° ± 4.14° vs 11° ± 6.3°, P = .232) and inferior (9.5° ± 6° vs 10° ± 7.5°, P = .629) screws. However, the mean contact angle (angle between the posterior coracoid and the anterior glenoid surface) with the freehand technique (3.8° ± 6.8°) was better than that of the guide (8.55° ± 8°) ( P = .05). 
Compared with the classic freehand operative technique, the parallel drill guide can ensure more accurate placement of the CG in the axial and sagittal planes, although with inferior bone contact.
Open-ended and Open-door Treatment Groups for Young People with Mental Illness.
Miller, Rachel; Mason, Susan E
2012-01-01
The concept of open-ended groups is expanded to include an open-ended, open-door (OEOD) model wherein members with severe mental illnesses, including schizophrenia disorders and bi-polar, can join, leave, and re-enter groups as their life circumstances dictate their availability and willingness for treatment. This model is grounded in the work of Schopler and Galinsky (1984/2006) and Galinsky and Schopler (1989) on the value and processes of open-ended groups and includes perspectives on mutual aid and group development. Groupwork with the OEOD format is illustrated with examples taken from a group of 79 participants diagnosed with first-episode schizophrenia/schizoaffective disorders, 40 of whom had co-occurring substance abuse. Of the 79 participants in the OEOD group program, 70 (89%) remained in treatment for the maximum of 3 years. The overall value of group treatment for this population is reviewed along with the small number of available publications on open-ended and open-door-type groups.
Open-ended and Open-door Treatment Groups for Young People with Mental Illness
MILLER, RACHEL; MASON, SUSAN E.
2012-01-01
The concept of open-ended groups is expanded to include an open-ended, open-door (OEOD) model wherein members with severe mental illnesses, including schizophrenia disorders and bi-polar, can join, leave, and re-enter groups as their life circumstances dictate their availability and willingness for treatment. This model is grounded in the work of Schopler and Galinsky (1984/2006) and Galinsky and Schopler (1989) on the value and processes of open-ended groups and includes perspectives on mutual aid and group development. Groupwork with the OEOD format is illustrated with examples taken from a group of 79 participants diagnosed with first-episode schizophrenia/schizoaffective disorders, 40 of whom had co-occurring substance abuse. Of the 79 participants in the OEOD group program, 70 (89%) remained in treatment for the maximum of 3 years. The overall value of group treatment for this population is reviewed along with the small number of available publications on open-ended and open-door-type groups. PMID:22427713
Schopf, J. William; Kudryavtsev, Anatoliy B.; Walter, Malcolm R.; Van Kranendonk, Martin J.; Williford, Kenneth H.; Kozdon, Reinhard; Valley, John W.; Gallardo, Victor A.; Espinoza, Carola; Flannery, David T.
2015-01-01
The recent discovery of a deep-water sulfur-cycling microbial biota in the ∼2.3-Ga Western Australian Turee Creek Group opened a new window to life's early history. We now report a second such subseafloor-inhabiting community from the Western Australian ∼1.8-Ga Duck Creek Formation. Permineralized in cherts formed during and soon after the 2.4- to 2.2-Ga “Great Oxidation Event,” these two biotas may evidence an opportunistic response to the mid-Precambrian increase of environmental oxygen that resulted in increased production of metabolically useable sulfate and nitrate. The marked similarity of microbial morphology, habitat, and organization of these fossil communities to their modern counterparts documents exceptionally slow (hypobradytelic) change that, if paralleled by their molecular biology, would evidence extreme evolutionary stasis. PMID:25646436
[Cultural diversity and pluralism in the Universal Declaration on Bioethics and Human Rights].
Romeo Casabona, Carlos María
2011-01-01
The Universal Declaration on Bioethics and Human Rights represents a significant milestone in the history of Law, particularly in the application of International Law to an important area of human activity, namely the medical sciences, the life sciences and the technologies which, linked to both, can be applied to human relations. In parallel with this, and as will be analysed in this article, the Declaration has involved adopting a clear position regarding cultural diversity and pluralism in relation to Biomedicine. In this paper the author highlights the fact that perspectives have been opened which have hardly been explored concerning Biomedicine, such as the recognition of the value and respect which cultural diversity (multiculturalism), economic and social diversity deserve in relation to the issues covered by the Declaration, and the acceptance that the owners of the rights are not only individuals, but can also be groups.
Korf, Bruce; Ahmadian, Reza; Allanson, Judith; Aoki, Yoko; Bakker, Annette; Wright, Emma Burkitt; Denger, Brian; Elgersma, Ype; Gelb, Bruce D; Gripp, Karen W; Kerr, Bronwyn; Kontaridis, Maria; Lazaro, Conxi; Linardic, Corinne; Lozano, Reymundo; MacRae, Calum A; Messiaen, Ludwine; Mulero-Navarro, Sonia; Neel, Benjamin; Plotkin, Scott; Rauen, Katherine A; Roberts, Amy; Silva, Alcino J; Sittampalam, Sitta G; Zhang, Chao; Schoyer, Lisa
2015-08-01
"The Third International Meeting on Genetic Disorders in the RAS/MAPK Pathway: Towards a Therapeutic Approach" was held at the Renaissance Orlando at SeaWorld Hotel (August 2-4, 2013). Seventy-one physicians and scientists attended the meeting, and parallel meetings were held by patient advocacy groups (CFC International, Costello Syndrome Family Network, NF Network and Noonan Syndrome Foundation). Parent and patient advocates opened the meeting with a panel discussion to set the stage regarding their hopes and expectations for therapeutic advances. In keeping with the theme on therapeutic development, the sessions followed a progression from description of the phenotype and definition of therapeutic endpoints, to definition of genomic changes, to identification of therapeutic targets in the RAS/MAPK pathway, to preclinical drug development and testing, to clinical trials. These proceedings will review the major points of discussion. © 2015 Wiley Periodicals, Inc.
Religion insulates ingroup evaluations: the development of intergroup attitudes in India.
Dunham, Yarrow; Srinivasan, Mahesh; Dotsch, Ron; Barner, David
2014-03-01
Research on the development of implicit intergroup attitudes has placed heavy emphasis on race, leaving open how social categories that are prominent in other cultures might operate. We investigate two of India's primary means of social distinction, caste and religion, and explore the development of implicit and explicit attitudes towards these groups in minority-status Muslim children and majority-status Hindu children, the latter drawn from various positions in the Hindu caste system. Results from two tests of implicit attitudes find that caste attitudes parallel previous findings for race: higher-caste children as well as lower-caste children have robust high-caste preferences. However, results for religion were strikingly different: both lower-status Muslim children and higher-status Hindu children show strong implicit ingroup preferences. We suggest that religion may play a protective role in insulating children from the internalization of stigma. © 2013 John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammons, T.J.
1995-02-01
The twenty-ninth Universities Power Engineering Conference (UPEC '94), held September 14-16, 1994, at University College, Galway, Ireland, surpassed all previous meetings in respect of number and quality of technical content of the papers and number of delegates attending. As in the past, it had a broad theme, covering all aspects of electrical power engineering, and was attended by academics, research workers, and members of the power service and manufacturing organizations. During the sessions, 265 papers from more than 30 countries were debated. There were 27 technical sessions, 3 poster sessions, and an opening and a closing session, 160 papers being presented orally in four groups of parallel sessions, the remainder being presented in poster sessions. The high standard of the papers, presentations, and technical discussions was particularly gratifying. The Universities Power Engineering Conference, held annually, provides a forum for the exchange of ideas among practicing engineers from the universities, consultants, and in the manufacturing and supply industries.