Sample records for analysing multi-threaded applications

  1. Multi-threading: A new dimension to massively parallel scientific computation

    NASA Astrophysics Data System (ADS)

    Nielsen, Ida M. B.; Janssen, Curtis L.

    2000-06-01

    Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.

  2. Implementation of a multi-threaded framework for large-scale scientific applications

    DOE PAGES

    Sexton-Kennedy, E.; Gartung, Patrick; Jones, C. D.; ...

    2015-05-22

    The CMS experiment has recently completed the development of a multi-threaded capable application framework. In this paper, we will discuss the design, implementation and application of this framework to production applications in CMS. For the 2015 LHC run, this functionality is particularly critical for both our online and offline production applications, which depend on faster turn-around times and a reduced memory footprint relative to before. These applications are complex codes, each including a large number of physics-driven algorithms. While the framework is capable of running a mix of thread-safe and 'legacy' modules, algorithms running in our production applications need to be thread-safe for optimal use of this multi-threaded framework at a large scale. Towards this end, we discuss the types of changes that were necessary for our algorithms to achieve good performance of our multi-threaded applications in a full-scale application. Lastly, performance numbers for what has been achieved for the 2015 run are presented.

  3. Self-cleaning threaded rod spinneret for high-efficiency needleless electrospinning

    NASA Astrophysics Data System (ADS)

    Zheng, Gaofeng; Jiang, Jiaxin; Wang, Xiang; Li, Wenwang; Zhong, Weizheng; Guo, Shumin

    2018-07-01

    High-efficiency production of nanofibers is the key to the application of electrospinning technology. This work focuses on multi-jet electrospinning, in which a threaded rod electrode is utilized as the needleless spinneret to achieve high-efficiency production of nanofibers. A slipper block, which fits into and moves through the threaded rod, is designed to transfer polymer solution evenly to the surface of the rod spinneret. The relative motion between the slipper block and the threaded rod electrode promotes the unstable fluctuation of the solution surface, thus the rotation of threaded rod electrode decreases the critical voltage for the initial multi-jet ejection and the diameter of nanofibers. The residual solution on the surface of threaded rod is cleaned up by the moving slipper block, showing a great self-cleaning ability, which ensures the stable multi-jet ejection and increases the productivity of nanofibers. Each thread of the threaded rod electrode serves as an independent spinneret, which enhances the electric field strength and constrains the position of the Taylor cone, resulting in high productivity of uniform nanofibers. The diameter of nanofibers decreases with the increase of threaded rod rotation speed, and the productivity increases with the solution flow rate. The rotation of electrode provides an excess force for the ejection of charged jets, which also contributes to the high-efficiency production of nanofibers. The maximum productivity of nanofibers from the threaded rod spinneret is 5-6 g/h, about 250-300 times as high as that from the single-needle spinneret. The self-cleaning threaded rod spinneret is an effective way to realize continuous multi-jet electrospinning, which promotes industrial applications of uniform nanofibrous membrane.

  4. GPU accelerated dynamic functional connectivity analysis for functional MRI data.

    PubMed

    Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

    2015-07-01

    Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  6. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE PAGES

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...

    2016-01-26

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  7. Shadow-Bitcoin: Scalable Simulation via Direct Execution of Multi-Threaded Applications

    DTIC Science & Technology

    2015-08-10

    Shadow-Bitcoin: Scalable Simulation via Direct Execution of Multi-threaded Applications Andrew Miller University of Maryland amiller@cs.umd.edu Rob...Shadow plug-in that directly executes the Bitcoin reference client software. To demonstrate the usefulness of this tool, we present novel denial-of...service attacks against the Bitcoin software that exploit low-level implementation artifacts in the Bitcoin reference client; our deterministic

  8. A Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal databases support spatiotemporal aggregation queries over massive moving-object datasets. Because of the large data volume and single-threaded processing, query speed cannot meet application requirements. Moreover, query efficiency is more sensitive to spatial variation than to temporal variation. In this paper, we propose a spatiotemporal aggregation query method that uses a multi-thread parallel technique based on regional division, and we implement it on the server. Concretely, we divide the spatiotemporal domain into several spatiotemporal cubes, compute the spatiotemporal aggregation on all cubes using multi-thread parallel processing, and then integrate the query results. Tests on real datasets show that this method improves query speed significantly.
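
    The divide, aggregate, and integrate steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `(x, y, t)` record layout, the cube sizes, the count aggregate, and all function names are assumptions made for illustration. In CPython, pure-Python work in threads is limited by the global interpreter lock, so the sketch shows the structure of the method rather than its speed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record layout: (x, y, t) for one observation of a moving object.
def cube_key(point, cell=10.0, slot=60.0):
    """Map a point to the index of the spatiotemporal cube that contains it."""
    x, y, t = point
    return (int(x // cell), int(y // cell), int(t // slot))

def partition(points, cell=10.0, slot=60.0):
    """Group points by cube so that each cube can be aggregated independently."""
    cubes = {}
    for p in points:
        cubes.setdefault(cube_key(p, cell, slot), []).append(p)
    return cubes

def count_cube(item):
    """Aggregate one cube; the aggregate here is a simple observation count."""
    key, pts = item
    return key, len(pts)

def aggregate(points, workers=4):
    """Aggregate every cube on a thread pool, then merge the partial results."""
    cubes = partition(points)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(count_cube, cubes.items()))

points = [(1.0, 2.0, 5.0), (3.0, 4.0, 30.0), (12.0, 2.0, 5.0), (1.0, 1.0, 70.0)]
result = aggregate(points)
total = sum(Counter(result).values())  # integrated count across all cubes
```

    Because no cube shares data with another, the per-cube workers need no locking; only the final merge touches all partial results.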

  9. Multi-threaded integration of HTC-Vive and MeVisLab

    NASA Astrophysics Data System (ADS)

    Gunacker, Simon; Gall, Markus; Schmalstieg, Dieter; Egger, Jan

    2018-03-01

    This work presents how Virtual Reality (VR) can easily be integrated into medical applications via a plugin for a medical image processing framework called MeVisLab. A multi-threaded plugin has been developed using OpenVR, a VR library that can be used for developing vendor and platform independent VR applications. The plugin is tested using the HTC Vive, a head-mounted display developed by HTC and Valve Corporation.

  10. Platform-Independence and Scheduling In a Multi-Threaded Real-Time Simulation

    NASA Technical Reports Server (NTRS)

    Sugden, Paul P.; Rau, Melissa A.; Kenney, P. Sean

    2001-01-01

    Aviation research often relies on real-time, pilot-in-the-loop flight simulation as a means to develop new flight software, flight hardware, or pilot procedures. Often these simulations become so complex that a single processor is incapable of performing the necessary computations within a fixed time-step. Threads are an elegant means to distribute the computational work-load when running on a symmetric multi-processor machine. However, programming with threads often requires operating system specific calls that reduce code portability and maintainability. While a multi-threaded simulation allows a significant increase in the simulation complexity, it also increases the workload of a simulation operator by requiring that the operator determine which models run on which thread. To address these concerns an object-oriented design was implemented in the NASA Langley Standard Real-Time Simulation in C++ (LaSRS++) application framework. The design provides a portable and maintainable means to use threads and also provides a mechanism to automatically load balance the simulation models.

  11. Topical perspective on massive threading and parallelism.

    PubMed

    Farber, Robert M

    2011-09-01

    Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world--is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future. Published by Elsevier Inc.

  12. Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Saini, Subhash (Technical Monitor)

    1998-01-01

    In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system, which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications on triangular meshes and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.

  13. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  14. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.

  15. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  16. Applying Jlint to Space Exploration Software

    NASA Technical Reports Server (NTRS)

    Artho, Cyrille; Havelund, Klaus

    2004-01-01

    Java is a very successful programming language which is also becoming widespread in embedded systems, where software correctness is critical. Jlint is a simple but highly efficient static analyzer that checks a Java program for several common errors, such as null pointer exceptions, and overflow errors. It also includes checks for multi-threading problems, such as deadlocks and data races. The case study described here shows the effectiveness of Jlint in find-false positives in the multi-threading warnings gives an insight into design patterns commonly used in multi-threaded code. The results show that a few analysis techniques are sufficient to avoid almost all false positives. These techniques include investigating all possible callers and a few code idioms. Verifying the correct application of these patterns is still crucial, because their correct usage is not trivial.

  17. CNT coated thread micro-electro-mechanical system for finger proprioception sensing

    NASA Astrophysics Data System (ADS)

    Shafi, A. A.; Wicaksono, D. H. B.

    2017-04-01

    In this paper, we aim to fabricate cotton thread based sensor for proprioceptive application. Cotton threads are utilized as the structural component of flexible sensors. The thread is coated with multi-walled carbon nanotube (MWCNT) dispersion by using facile conventional dipping-drying method. The electrical characterization of the coated thread found that the resistance per meter of the coated thread decreased with increasing the number of dipping. The CNT coated thread sensor works based on piezoresistive theory in which the resistance of the coated thread changes when force is applied. This thread sensor is sewed on glove at the index finger between middle and proximal phalanx parts and the resistance change is measured upon grasping mechanism. The thread based microelectromechanical system (MEMS) enables the flexible sensor to easily fit perfectly on the finger joint and gives reliable response as proprioceptive sensing.

  18. Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Su, Chun-Yi

    By 2004, microprocessor design focused on multicore scaling—increasing the number of cores per die in each generation—as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems.
I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy-efficient execution in multi-core, NUMA systems.

  19. Using all of your CPU's in HIPE

    NASA Astrophysics Data System (ADS)

    Jacobson, J. D.; Fadda, D.

    2012-09-01

    Modern computer architectures increasingly feature multi-core CPU's. For example, the MacbookPro features the Intel quad-core i7 processors. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of the multiple processor architectures. Up to now, software written for Herschel data reduction (HIPE), written in Jython and JAVA, is single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the un-chopped line scan mode, has been threaded. This computation-intensive task uses either a one-parameter or a three parameter exponential function, to characterize the transient. The task uses a JAVA implementation of Minpack, translated from the C (Moshier) and IDL (Markwardt) by the authors, to optimize the correction parameters. We also explain how to determine if a task can benefit from threading (Amdahl's Law), and if it is safe to thread. The design and implementation, using the JAVA concurrency package completions service is described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
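
    The abstract cites Amdahl's Law as the test for whether a task can benefit from threading. A minimal sketch of that bound follows, assuming a task whose parallelizable share of the runtime is known. The function name and the 0.9 fraction in the usage lines are illustrative assumptions, not figures from the poster; the eight workers match the eight hardware threads mentioned above.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the runtime can be parallelized.

    parallel_fraction: share of the single-threaded runtime that can run in parallel (0..1).
    workers: number of threads or cores applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Assumed example: 90% of the runtime parallelizes, run on eight hardware threads.
bound_8 = amdahl_speedup(0.9, 8)
# Even with an unlimited number of workers the bound approaches 1 / 0.1 = 10.
bound_many = amdahl_speedup(0.9, 10**9)
```

    The serial fraction dominates quickly: under these assumed numbers eight threads yield a bound below 5, which is why measuring the serial share comes before any threading work.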

  20. Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

    DTIC Science & Technology

    2013-09-30

    STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the

  1. Integrating Health Information Systems into a Database Course: A Case Study

    ERIC Educational Resources Information Center

    Anderson, Nicole; Zhang, Mingrui; McMaster, Kirby

    2011-01-01

    Computer Science is a rich field with many growing application areas, such as Health Information Systems. What we suggest here is that multi-disciplinary threads can be introduced to supplement, enhance, and strengthen the primary area of study in a course. We call these supplementary materials "threads," because they are executed…

  2. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    NASA Astrophysics Data System (ADS)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  3. CMS event processing multi-core efficiency status

    NASA Astrophysics Data System (ADS)

    Jones, C. D.; CMS Collaboration

    2017-10-01

    In 2015, CMS was the first LHC experiment to begin using a multi-threaded framework for doing event processing. This new framework utilizes Intel’s Threading Building Blocks library to manage concurrency via a task-based processing model. During the 2015 LHC run period, CMS only ran reconstruction jobs using multiple threads because only those jobs were sufficiently thread efficient. Recent work now allows simulation and digitization to be thread efficient. In addition, during 2015 the multi-threaded framework could run events in parallel but could only use one thread per event. Work done in 2016 now allows multiple threads to be used while processing one event. In this presentation we will show how these recent changes have improved CMS’s overall threading and memory efficiency and we will discuss work to be done to further increase those efficiencies.

  4. Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

    PubMed

    Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

    2016-04-01

    Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. 
The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time was reduced to 2.9 min on the GPU version, compared to 12.8 min on the twelve-threaded CPU version and 112.5 min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  5. HPC Profiling with the Sun Studio™ Performance Tools

    NASA Astrophysics Data System (ADS)

    Itzkowitz, Marty; Maruyama, Yukon

    In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.

  6. Multi-Core Processors: An Enabling Technology for Embedded Distributed Model-Based Control (Postprint)

    DTIC Science & Technology

    2008-07-01

    generation of process partitioning, a thread pipelining becomes possible. In this paper we briefly summarize the requirements and trends for FADEC based... FADEC environment, presenting a hypothetical realization of an example application. Finally we discuss the application of Time-Triggered...based control applications of the future. 15. SUBJECT TERMS Gas turbine, FADEC, Multi-core processing technology, distributed based control

  7. Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

    NASA Astrophysics Data System (ADS)

    Backes, Werner; Wetzel, Susanne

    In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schnorr-Euchner algorithm for today's multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread version and about 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
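
    The speed-up factors quoted above can be put into perspective with two standard derived quantities: parallel efficiency (speed-up divided by thread count) and the Karp-Flatt experimentally determined serial fraction. The sketch below is only a reader's worked calculation on the two reported numbers; the function names are illustrative and nothing here comes from the paper's own analysis.

```python
def efficiency(speedup, threads):
    """Parallel efficiency: fraction of ideal linear speed-up achieved."""
    return speedup / threads


def karp_flatt(speedup, threads):
    """Karp-Flatt metric: estimated serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# (threads, reported speed-up) pairs taken from the abstract above
reported = [(2, 1.8), (4, 3.2)]

for p, s in reported:
    print(f"threads={p} speedup={s} "
          f"efficiency={efficiency(s, p):.3f} "
          f"serial_fraction={karp_flatt(s, p):.3f}")
```

    For these figures the efficiency is 0.9 with two threads and 0.8 with four, and the estimated serial fraction stays near 0.1 in both cases, which would be consistent with a fixed non-parallel portion rather than growing overhead.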

  8. Optimized FPGA Implementation of Multi-Rate FIR Filters Through Thread Decomposition

    NASA Technical Reports Server (NTRS)

    Zheng, Jason Xin; Nguyen, Kayla; He, Yutao

    2010-01-01

    Multirate (decimation/interpolation) filters are among the essential signal processing components in spaceborne instruments where Finite Impulse Response (FIR) filters are often used to minimize nonlinear group delay and finite-precision effects. Cascaded (multi-stage) designs of Multi-Rate FIR (MRFIR) filters are further used for large rate change ratio, in order to lower the required throughput while simultaneously achieving comparable or better performance than single-stage designs. Traditional representation and implementation of MRFIR employ polyphase decomposition of the original filter structure, whose main purpose is to compute only the needed output at the lowest possible sampling rate. In this paper, an alternative representation and implementation technique, called TD-MRFIR (Thread Decomposition MRFIR), is presented. The basic idea is to decompose MRFIR into output computational threads, in contrast to a structural decomposition of the original filter as done in the polyphase decomposition. Each thread represents an instance of the finite convolution required to produce a single output of the MRFIR. The filter is thus viewed as a finite collection of concurrent threads. The technical details of TD-MRFIR will be explained, first showing its applicability to the implementation of downsampling, upsampling, and resampling FIR filters, and then describing a general strategy to optimally allocate the number of filter taps. A particular FPGA design of multi-stage TD-MRFIR for the L-band radar of NASA's SMAP (Soil Moisture Active Passive) instrument is demonstrated; and its implementation results in several targeted FPGA devices are summarized in terms of the functional (bit width, fixed-point error) and performance (time closure, resource usage, and power estimation) parameters.

  9. A knittable fiber-shaped supercapacitor based on natural cotton thread for wearable electronics

    NASA Astrophysics Data System (ADS)

    Zhou, Qianlong; Jia, Chunyang; Ye, Xingke; Tang, Zhonghua; Wan, Zhongquan

    2016-09-01

    At present, the topic of building high-performance, miniaturized and mechanically flexible energy storage modules which can be directly integrated into textile based wearable electronics is a hotspot in the wearable technology field. In this paper, we reported a highly flexible fiber-shaped electrode fabricated through a one-step convenient hydrothermal process. The prepared graphene hydrogels/multi-walled carbon nanotubes-cotton thread derived from natural cotton thread is electrochemically active and mechanically strong. Fiber-shaped supercapacitor based on the prepared fiber electrodes and polyvinyl alcohol-H3PO4 gel electrolyte exhibits good capacitive performance (97.73 μF cm-1 at scan rate of 2 mV s-1), long cycle life (95.51% capacitance retention after 8000 charge-discharge cycles) and considerable stability (90.75% capacitance retention after 500 continuous bending cycles). Due to its good mechanical and electrochemical properties, the graphene hydrogels/multi-walled carbon nanotubes-cotton thread based all-solid fiber-shaped supercapacitor can be directly knitted into fabrics and maintain its original capacitive performance. Such a low-cost textile thread based versatile energy storage device may hold great potential for future wearable electronics applications.

  10. Using Multithreading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Bailey, David H. (Technical Monitor)

    1998-01-01

    In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system which offers sufficient capabilities to tackle this problem. We implement the question phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.

  11. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.

  12. Multi-Threaded DNA Tag/Anti-Tag Library Generator for Multi-Core Platforms

    DTIC Science & Technology

    2009-05-01

    base pair) Watson-Crick strand pairs that bind perfectly within pairs, but poorly across pairs. A variety of DNA strand hybridization metrics...AFRL-RI-RS-TR-2009-131 Final Technical Report May 2009 MULTI-THREADED DNA TAG/ANTI-TAG LIBRARY GENERATOR FOR MULTI-CORE PLATFORMS...TYPE Final 3. DATES COVERED (From - To) Jun 08 – Feb 09 4. TITLE AND SUBTITLE MULTI-THREADED DNA TAG/ANTI-TAG LIBRARY GENERATOR FOR MULTI-CORE

  13. Experimental investigation of effects of stitching orientation on forming behaviors of 2D P-aramid multilayer woven preform

    NASA Astrophysics Data System (ADS)

    Abtew, Mulat Alubel; Boussu, François; Bruniaux, Pascal; Loghin, Carmen; Cristian, Irina; Chen, Yan; Wang, Lichuan

    2018-05-01

    In many textile applications, stitching is one of the most widely used methods for joining multi-layer fabric plies, not only because it is easy to apply and flexible in production but also because it provides structural integrity through the thickness of the material. In this research, the influence of stitching pattern on various molding characteristics of multi-layer 2D para-aramid plain woven fabrics during deformation was investigated. The fabrics were made of high-performance fiber with a yarn linear density of 930 dtex and a fabric areal density of 200 g/m2. First, different stitch patterns (orientations) were applied to join the multi-layered fabrics, keeping other stitching parameters such as stitch gap, stitch thread tension, stitch length, stitch type, and stitch thread type constant throughout the study. Then, a pneumatic molding device with a low-speed forming process, specially designed for preforming textiles with a predefined hemispherical punch, was used. The results show that stitching pattern is one of the parameters that influence molding behavior and should be considered when molding stitched multi-layer fabrics.

  14. A multi-threaded version of MCFM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, John M.; Ellis, R. Keith; Giele, Walter T.

    We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.
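
    One common way to make a multi-threaded Monte Carlo estimate independent of the thread count is to fix the decomposition into chunks in advance, give each chunk its own seeded random stream, and combine the partial sums in a fixed order. The Python sketch below illustrates that general idea only; it is not the OpenMP/VEGAS implementation used in MCFM, and the integrand, chunk sizes, and names are assumptions chosen for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_CHUNKS = 16             # fixed work decomposition, independent of thread count
SAMPLES_PER_CHUNK = 2000  # samples drawn by each chunk


def chunk_sum(chunk_id):
    """Sum f(x) = x*x over samples drawn from a stream seeded by the chunk id."""
    rng = random.Random(chunk_id)  # private, reproducible stream per chunk
    total = 0.0
    for _ in range(SAMPLES_PER_CHUNK):
        x = rng.random()
        total += x * x
    return total


def integrate(n_threads):
    """Estimate the integral of x^2 on [0, 1] using n_threads worker threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() returns results in chunk order, whatever order they finish in
        partial_sums = list(pool.map(chunk_sum, range(N_CHUNKS)))
    # Summing in chunk order keeps floating-point rounding identical across runs
    return sum(partial_sums) / (N_CHUNKS * SAMPLES_PER_CHUNK)


if __name__ == "__main__":
    print(integrate(1), integrate(4))
```

    Because neither the random streams nor the summation order depend on how chunks are assigned to threads, the estimate is bit-for-bit identical for any thread count, which is the property that makes validation against a sequential run straightforward.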

  15. Geant4 Computing Performance Benchmarking and Monitoring

    DOE PAGES

    Dotti, Andrea; Elvira, V. Daniel; Folger, Gunter; ...

    2015-12-23

    Performance evaluation and analysis of large scale computing applications is essential for optimal use of resources. As detector simulation is one of the most compute intensive tasks and Geant4 is the simulation toolkit most widely used in contemporary high energy physics (HEP) experiments, it is important to monitor Geant4 through its development cycle for changes in computing performance and to identify problems and opportunities for code improvements. All Geant4 development and public releases are being profiled with a set of applications that utilize different input event samples, physics parameters, and detector configurations. Results from multiple benchmarking runs are compared to previous public and development reference releases to monitor CPU and memory usage. Observed changes are evaluated and correlated with code modifications. Besides the full summary of call stack and memory footprint, a detailed call graph analysis is available to Geant4 developers for further analysis. The set of software tools used in the performance evaluation procedure, both in sequential and multi-threaded modes, include FAST, IgProf and Open|Speedshop. In conclusion, the scalability of the CPU time and memory performance in multi-threaded application is evaluated by measuring event throughput and memory gain as a function of the number of threads for selected event samples.

  16. Merced

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hedstrom, Gerald; Beck, Bret; Mattoon, Caleb

    2016-10-01

    Merced performs a multi-dimensional integral to generate so-called 'transfer matrices' for use in deterministic radiation transport applications. It produces transfer matrices on the user-defined energy grid. The angular dependence of outgoing products is captured in a Legendre expansion, up to a user-specified maximum Legendre order. Merced calculations can use multi-threading for enhanced performance on a single compute node.

  17. SISSY: An example of a multi-threaded, networked, object-oriented databased application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scipioni, B.; Liu, D.; Song, T.

    1993-05-01

    The Systems Integration Support SYstem (SISSY) is presented and its capabilities and techniques are discussed. It is a fully automated data collection and analysis system supporting the SSCL's systems analysis activities as they relate to the Physics Detector and Simulation Facility (PDSF). SISSY itself is a paradigm of effective computing on the PDSF. It uses home-grown code (C++), network programming (RPC, SNMP), relational (SYBASE) and object-oriented (ObjectStore) DBMSs, UNIX operating system services (IRIX threads, cron, system utilities, shell scripts, etc.), and third party software applications (NetCentral Station, Wingz, DataLink) all of which act together as a single application to monitor and analyze the PDSF.

  18. Multi-threading performance of Geant4, MCNP6, and PHITS Monte Carlo codes for tetrahedral-mesh geometry.

    PubMed

    Han, Min Cheol; Yeom, Yeon Soo; Lee, Hyun Su; Shin, Bangho; Kim, Chan Hyeong; Furuta, Takuya

    2018-05-04

    In this study, the multi-threading performance of the Geant4, MCNP6, and PHITS codes was evaluated as a function of the number of threads (N) and the complexity of the tetrahedral-mesh phantom. For this, three tetrahedral-mesh phantoms of varying complexity (simple, moderately complex, and highly complex) were prepared and implemented in the three different Monte Carlo codes, in photon and neutron transport simulations. Subsequently, for each case, the initialization time, calculation time, and memory usage were measured as a function of the number of threads used in the simulation. It was found that for all codes, the initialization time significantly increased with the complexity of the phantom, but not with the number of threads. Geant4 exhibited much longer initialization time than the other codes, especially for the complex phantom (MRCP). The improvement of computation speed due to the use of a multi-threaded code was calculated as the speed-up factor, the ratio of the computation speed on a multi-threaded code to the computation speed on a single-threaded code. Geant4 showed the best multi-threading performance among the codes considered in this study, with the speed-up factor almost linearly increasing with the number of threads, reaching ~30 when N  =  40. PHITS and MCNP6 showed a much smaller increase of the speed-up factor with the number of threads. For PHITS, the speed-up factors were low when N  =  40. For MCNP6, the increase of the speed-up factors was better, but they were still less than ~10 when N  =  40. As for memory usage, Geant4 was found to use more memory than the other codes. In addition, compared to that of the other codes, the memory usage of Geant4 more rapidly increased with the number of threads, reaching as high as ~74 GB when N  =  40 for the complex phantom (MRCP). 
It is notable that compared to that of the other codes, the memory usage of PHITS was much lower, regardless of both the complexity of the phantom and the number of threads, hardly increasing with the number of threads for the MRCP.

  19. Thread scheduling for GPU-based OPC simulation on multi-thread

    NASA Astrophysics Data System (ADS)

    Lee, Heejun; Kim, Sangwook; Hong, Jisuk; Lee, Sooryong; Han, Hwansoo

    2018-03-01

    As semiconductor product development based on shrinkage continues, the accuracy and difficulty required for model-based optical proximity correction (MBOPC) are increasing. OPC simulation time, which is the most time-consuming part of MBOPC, is rapidly increasing due to high pattern density in a layout and complex OPC models. To reduce OPC simulation time, we attempt to apply a graphics processing unit (GPU) to MBOPC because the OPC process is well suited to parallel programming. We address some issues that may typically happen during GPU-based OPC simulation in a multi-threaded system, such as "out of memory" and "GPU idle time". To overcome these problems, we propose a thread scheduling method, which manages OPC jobs in multiple threads in such a way that simulation jobs from multiple threads are alternately executed on the GPU while correction jobs are executed at the same time on each CPU core. It was observed that the amount of GPU peak memory usage decreases by up to 35%, and MBOPC runtime also decreases by 4%. In cases where out-of-memory issues occur in a multi-threaded environment, the thread scheduler was used to improve MBOPC runtime by up to 23%.
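
    The scheduling idea described above, where threads take turns on a shared accelerator while their CPU-side work overlaps, can be sketched with a counting semaphore. This is a generic illustration under stated assumptions, not the authors' scheduler: the GPU is simulated by a placeholder, and `run_jobs`, `gpu_slots`, and the bookkeeping dictionary are hypothetical names.

```python
import threading


def run_jobs(n_threads, jobs_per_thread, gpu_slots=1):
    """Run simulate-then-correct jobs; at most gpu_slots simulations overlap."""
    gpu = threading.Semaphore(gpu_slots)   # admission control for the "GPU"
    state_lock = threading.Lock()          # protects the bookkeeping below
    state = {"active": 0, "peak": 0, "corrected": 0}

    def simulate_on_gpu():
        with state_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        # Placeholder: a real job would launch GPU kernels here.
        with state_lock:
            state["active"] -= 1

    def correct_on_cpu():
        # Placeholder: CPU-side correction runs outside the GPU semaphore,
        # so it can overlap with another thread's simulation.
        with state_lock:
            state["corrected"] += 1

    def worker():
        for _ in range(jobs_per_thread):
            with gpu:              # wait for a free GPU slot
                simulate_on_gpu()
            correct_on_cpu()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], state["corrected"]


if __name__ == "__main__":
    print(run_jobs(n_threads=4, jobs_per_thread=5))
```

    Bounding the number of simultaneous simulations is what limits peak device memory in such a scheme; the price is that threads may wait for a slot, so the slot count is a tuning parameter.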

  20. Using a source-to-source transformation to introduce multi-threading into the AliRoot framework for a parallel event reconstruction

    NASA Astrophysics Data System (ADS)

    Lohn, Stefan B.; Dong, Xin; Carminati, Federico

    2012-12-01

    Chip-Multiprocessors are going to support massive parallelism by many additional physical and logical cores. Improving performance can no longer be obtained by increasing clock-frequency because the technical limits are almost reached. Instead, parallel execution must be used to gain performance. Resources like main memory, the cache hierarchy, bandwidth of the memory bus or links between cores and sockets are not going to be improved as fast. Hence, parallelism can only result in performance gains if the memory usage is optimized and the communication between threads is minimized. Besides, concurrent programming has become a domain for experts. Implementing multi-threading is error prone and labor-intensive. A full reimplementation of the whole AliRoot source-code is unaffordable. This paper describes the effort to evaluate the adaption of AliRoot to the needs of multi-threading and to provide the capability of parallel processing by using a semi-automatic source-to-source transformation to address the problems as described before and to provide a straight-forward way of parallelization with almost no interference between threads. This makes the approach simple and reduces the required manual changes in the code. In a first step, unconditional thread-safety will be introduced to bring the original sequential and thread unaware source-code into the position of utilizing multi-threading. Afterwards further investigations have to be performed to point out candidates of classes that are useful to share amongst threads. Then in a second step, the transformation has to change the code to share these classes and finally to verify if there are any more invalid interferences between threads.

  1. A Review of Lightweight Thread Approaches for High Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castello, Adrian; Pena, Antonio J.; Seo, Sangmin

    High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.

  2. Multi-threaded Event Processing with DANA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    David Lawrence; Elliott Wolin

    2007-05-14

    The C++ data analysis framework DANA has been written to support the next generation of Nuclear Physics experiments at Jefferson Lab commensurate with the anticipated 12GeV upgrade. The DANA framework was designed to allow multi-threaded event processing with a minimal impact on developers of reconstruction software. This document describes how DANA implements multi-threaded event processing and compares it to simply running multiple instances of a program. Also presented are relative reconstruction rates for Pentium4, Xeon, and Opteron based machines.

  3. AthenaMT: upgrading the ATLAS software framework for the many-core world with multi-threading

    NASA Astrophysics Data System (ADS)

    Leggett, Charles; Baines, John; Bold, Tomasz; Calafiura, Paolo; Farrell, Steven; van Gemmeren, Peter; Malon, David; Ritsch, Elmar; Stewart, Graeme; Snyder, Scott; Tsulaia, Vakhtang; Wynne, Benjamin; ATLAS Collaboration

    2017-10-01

    ATLAS’s current software framework, Gaudi/Athena, has been very successful for the experiment in LHC Runs 1 and 2. However, its single threaded design has been recognized for some time to be increasingly problematic as CPUs have increased core counts and decreased available memory per core. Even the multi-process version of Athena, AthenaMP, will not scale to the range of architectures we expect to use beyond Run2. After concluding a rigorous requirements phase, where many design components were examined in detail, ATLAS has begun the migration to a new data-flow driven, multi-threaded framework, which enables the simultaneous processing of singleton, thread unsafe legacy Algorithms, cloned Algorithms that execute concurrently in their own threads with different Event contexts, and fully re-entrant, thread safe Algorithms. In this paper we report on the process of modifying the framework to safely process multiple concurrent events in different threads, which entails significant changes in the underlying handling of features such as event and time dependent data, asynchronous callbacks, metadata, integration with the online High Level Trigger for partial processing in certain regions of interest, concurrent I/O, as well as ensuring thread safety of core services. We also report on upgrading the framework to handle Algorithms that are fully re-entrant.

  4. Servicing a globally broadcast interrupt signal in a multi-threaded computer

    DOEpatents

    Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.; Satterfield, David L.

    2015-12-29

    Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whether global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.

  5. Software Defined Radio with Parallelized Software Architecture

    NASA Technical Reports Server (NTRS)

    Heckler, Greg

    2013-01-01

    This software implements software-defined radio processing over multi-core, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembly of the threaded blocks into a flow graph, which assembles the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single CPU/single-core computers or multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to .50 Mbps) software defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.
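
    The architecture described above (one thread per processing block, blocks joined by pipes into a flow graph) can be illustrated with a short linear pipeline. The sketch below is written in Python rather than the C/C++ of the original software and is not taken from it; `block`, `run_flow_graph`, and the sample format (8-byte little-endian doubles) are assumptions made for illustration.

```python
import os
import struct
import threading

SAMPLE = struct.Struct("<d")  # one sample = one little-endian double


def block(func, read_fd, write_fd):
    """Threaded block: read samples from one pipe, transform, write to the next."""
    with os.fdopen(read_fd, "rb") as src, os.fdopen(write_fd, "wb") as dst:
        while True:
            raw = src.read(SAMPLE.size)
            if len(raw) < SAMPLE.size:   # upstream closed its end: end of stream
                break
            (x,) = SAMPLE.unpack(raw)
            dst.write(SAMPLE.pack(func(x)))
    # Leaving the with-block closes the write end, signalling EOF downstream.


def feed(samples, write_fd):
    """Source thread: push input samples into the first pipe, then close it."""
    with os.fdopen(write_fd, "wb") as dst:
        for s in samples:
            dst.write(SAMPLE.pack(s))


def run_flow_graph(samples, stages):
    """Chain the stage functions with pipes, giving each stage its own thread."""
    pipes = [os.pipe() for _ in range(len(stages) + 1)]  # (read_fd, write_fd)
    threads = [threading.Thread(target=feed, args=(samples, pipes[0][1]))]
    for i, func in enumerate(stages):
        threads.append(threading.Thread(
            target=block, args=(func, pipes[i][0], pipes[i + 1][1])))
    for t in threads:
        t.start()
    out = []
    with os.fdopen(pipes[-1][0], "rb") as sink:   # main thread acts as the sink
        while True:
            raw = sink.read(SAMPLE.size)
            if len(raw) < SAMPLE.size:
                break
            out.append(SAMPLE.unpack(raw)[0])
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_flow_graph([1.0, 2.0, 3.0], [lambda x: 2.0 * x, lambda x: x + 1.0]))
```

    Because each block only sees its pipe ends, stages can be added, removed, or reordered without touching the others, and the operating system is free to place the threads on however many cores are available.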

  6. OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads

    NASA Astrophysics Data System (ADS)

    Kosakowski, G.; Kulik, D. A.; Shao, H.

    2012-04-01

    OpenGeoSys-GEMS is a generic purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements. • For realistic geochemical systems which might include dozens of mineral phases and several (non-ideal) solid solutions the time needed to solve the chemical system with GEMS3K may increase exceptionally. • The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization the time step size may become very small which increases calculation times drastically even for small 1D problems. • The current version of OGS is not optimized for memory use and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems the number of MPI processes that fit into memory of up-to-date workstations or HPC hardware is limited. One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes. 
In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite element nodes with calls to GEMS3K and consecutive calculations of changed material parameters. In a first step the existing MPI implementation was utilized to parallelize this loop. Calculations were split between the MPI processes and afterwards data was synchronized by using MPI communication routines. Furthermore, multi-threaded calculation of the loop was implemented with help of the boost thread library (http://www.boost.org). This implementation provides a flexible environment to distribute calculations between several threads. For each MPI process at least one and up to several dozens of worker threads are spawned. These threads do not replicate the complete OGS-GEM data structure and use only a limited amount of memory. Calculation of the central geochemical loop is shared between all threads. Synchronization between the threads is done by barrier commands. The overall number of local threads times MPI processes should match the number of available computing nodes. The combination of multi-threading and MPI provides an effective and flexible environment to speed up OGS-GEMS calculations while limiting the required memory use. Test calculations on different hardware show that for certain types of applications tremendous speedups are possible.
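
    The thread-level part of the scheme described above (a node loop shared between worker threads, followed by a barrier before dependent updates) can be sketched as follows. This is a simplified Python illustration, not the boost-thread code of OGS-GEMS: `solve_node` is a hypothetical stand-in for a GEMS3K equilibrium call, the smoothing step stands in for the recalculation of material parameters, and the MPI layer is omitted entirely.

```python
import threading


def solve_node(value):
    """Hypothetical stand-in for a per-node chemical solve: cap the value at 1.0."""
    return min(value, 1.0)


def time_step(values, n_threads):
    """Solve all nodes in parallel, then compute a neighbour-dependent update."""
    n = len(values)
    solved = [0.0] * n
    updated = [0.0] * n
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        # Phase 1: each thread handles nodes tid, tid + n_threads, ...
        for i in range(tid, n, n_threads):
            solved[i] = solve_node(values[i])
        # No thread may read neighbours until every node has been solved.
        barrier.wait()
        # Phase 2: an update that reads neighbouring nodes' solved values.
        for i in range(tid, n, n_threads):
            left = solved[max(i - 1, 0)]
            right = solved[min(i + 1, n - 1)]
            updated[i] = (left + solved[i] + right) / 3.0

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return updated


if __name__ == "__main__":
    print(time_step([0.5, 2.0, 3.0, 0.2], n_threads=3))
```

    The threads write into shared arrays rather than private copies, which mirrors the memory argument made in the abstract: worker threads add parallelism without replicating the full data structure the way additional MPI processes would.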

  7. Designing Next Generation Massively Multithreaded Architectures for Irregular Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tumeo, Antonino; Secchi, Simone; Villa, Oreste

    Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multi-threaded architectures with large node count, like the Cray XMT, have been shown to address their requirements better than commodity clusters. In this paper we present the approaches that we are currently pursuing to design future generations of these architectures. First, we introduce the Cray XMT and compare it to other multithreaded architectures. We then propose an evolution of the architecture, integrating multiple cores per node and next generation network interconnect. We advocate the use of hardware support for remote memory reference aggregation to optimize network utilization. For this evaluation we developed a highly parallel, custom simulation infrastructure for multi-threaded systems. Our simulator executes unmodified XMT binaries with very large datasets, capturing effects due to contention and hot-spotting, while predicting execution times with greater than 90% accuracy. We also discuss the FPGA prototyping approach that we are employing to study efficient support for irregular applications in next generation manycore processors.

  8. Playback system designed for X-Band SAR

    NASA Astrophysics Data System (ADS)

    Yuquan, Liu; Changyong, Dou

    2014-03-01

    SAR (Synthetic Aperture Radar) has extensive applications because it is independent of daylight and weather. In particular, the X-Band strip-map SAR designed by the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, provides images with high ground resolution while offering large spatial coverage and a short acquisition time, so it is promising for many applications. When a sudden disaster occurs, radar signal data and images must be acquired as soon as possible so that action can be taken promptly to reduce losses and save lives. This paper summarizes an X-Band SAR playback processing system designed for disaster response and scientific needs. It describes the SAR data workflow, including the payload data transmission and reception process. The playback processing system performs signal analysis on the original data, providing SAR level 0 products and quick images. A gigabit network ensures efficient transmission of radar signals from the recorder to the calculation unit. Multi-thread parallel computing and ping-pong operation ensure computation speed. Together, the gigabit network, multi-thread parallel computing, and ping-pong operation provide high-speed data transmission and processing that meet the real-time requirement of SAR data playback.

  9. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

    PubMed

    Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

    2018-05-03

    Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4. under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
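
    The wavefront parallelization mentioned above rests on a property of the dynamic programming matrix: every cell on the same minor diagonal (constant i + j) depends only on cells from the two preceding diagonals, so all cells of one diagonal can be computed independently. The sequential Python sketch below makes that traversal order explicit for a global alignment score with a linear gap cost. It only illustrates the dependency structure and is not SeqAn's implementation; the function name and the scoring values are assumptions.

```python
def nw_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score, filling the DP matrix one minor diagonal at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    for d in range(rows + cols - 1):           # d = i + j identifies the diagonal
        i_lo = max(0, d - (cols - 1))
        i_hi = min(d, rows - 1)
        # Cells on this diagonal read only diagonals d-1 and d-2, so this inner
        # loop is the part that threads or SIMD lanes could process concurrently.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if i == 0:
                H[i][j] = j * gap              # first row: gaps only
            elif j == 0:
                H[i][j] = i * gap              # first column: gaps only
            else:
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              H[i - 1][j] + gap,     # gap in b
                              H[i][j - 1] + gap)     # gap in a
    return H[-1][-1]


if __name__ == "__main__":
    print(nw_score_wavefront("ACGT", "AGT"))
```

    Diagonals near the corners of the matrix contain few cells, so the available parallelism ramps up and down over the course of one alignment; dynamic schemes such as the work stealing mentioned in the abstract are one way to keep threads busy through those phases.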

  10. Multi-threaded parallel simulation of non-local non-linear problems in ultrashort laser pulse propagation in the presence of plasma

    NASA Astrophysics Data System (ADS)

    Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger

    2011-06-01

    We describe a parallel multi-threaded approach for high performance modelling of a wide class of phenomena in ultrafast nonlinear optics. A specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.

  11. Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ceriani, Marco; Palermo, Gianluca; Secchi, Simone

    We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft-core cores, local interconnection and memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating a hardware scheduler within the custom load/store buffers to take advantage of the availability of multiple execution threads, increasing the efficiency in a way that is transparent to the application. We evaluated a dual-node system on irregular kernels, showing scalability in the number of cores and threads.

  12. A Tool for Intersecting Context-Free Grammars and Its Applications

    NASA Technical Reports Server (NTRS)

    Gange, Graeme; Navas, Jorge A.; Schachte, Peter; Sondergaard, Harald; Stuckey, Peter J.

    2015-01-01

    This paper describes a tool for intersecting context-free grammars. Since this problem is undecidable the tool follows a refinement-based approach and implements a novel refinement which is complete for regularly separable grammars. We show its effectiveness for safety verification of recursive multi-threaded programs.

  13. Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures

    NASA Astrophysics Data System (ADS)

    Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi

    2017-04-01

    Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.

  14. Reducing False Positives in Runtime Analysis of Deadlocks

    NASA Technical Reports Server (NTRS)

    Bensalem, Saddek; Havelund, Klaus; Clancy, Daniel (Technical Monitor)

    2002-01-01

    This paper presents an improvement of a standard algorithm for detecting deadlock potentials in multi-threaded programs, in that it reduces the number of false positives. The standard algorithm works as follows. The multi-threaded program under observation is executed, while lock and unlock events are observed. A graph of locks is built, with edges between locks symbolizing locking orders. Any cycle in the graph signifies a potential for a deadlock. The typical standard example is the group of dining philosophers sharing forks. The algorithm is interesting because it can catch deadlock potentials even though no deadlocks occur in the examined trace, and at the same time it scales very well in contrast to more formal approaches to deadlock detection. The algorithm, however, can yield false positives (as well as false negatives). The extension of the algorithm described in this paper reduces the number of false positives for three particular cases: when a gate lock protects a cycle, when a single thread introduces a cycle, and when the code segments in different threads that cause the cycle can actually not execute in parallel. The paper formalizes a theory for dynamic deadlock detection and compares it to model checking and static analysis techniques. It furthermore describes an implementation for analyzing Java programs and its application to two case studies: a planetary rover and a space craft altitude control system.
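
    The standard lock-graph algorithm summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trace format `(thread, op, lock)` and the function names `build_lock_graph` and `has_cycle` are assumptions. It covers only the basic algorithm, so it would still report the false positives (gate locks, single-thread cycles, non-parallel segments) that the paper's extension is meant to remove.

```python
from collections import defaultdict

def build_lock_graph(trace):
    """Build locking-order edges from a trace of (thread, op, lock) events.

    op is "lock" or "unlock"; an edge A -> B means some thread acquired B
    while already holding A. The trace format is a hypothetical one.
    """
    held = defaultdict(list)   # locks currently held, per thread
    edges = defaultdict(set)   # lock -> locks acquired while holding it
    for thread, op, lock in trace:
        if op == "lock":
            for outer in held[thread]:
                edges[outer].add(lock)
            held[thread].append(lock)
        elif op == "unlock":
            held[thread].remove(lock)
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle; a cycle signals a deadlock potential."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            if color[nxt] == GREY:
                return True          # back edge: cycle found
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))
```

    For example, a trace in which one thread takes lock A then B and another takes B then A yields the edges A -> B and B -> A, so a deadlock potential is flagged even if the two threads never overlapped in the observed run.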

  15. Software Defined Radio with Parallelized Software Architecture

    NASA Technical Reports Server (NTRS)

    Heckler, Greg

    2013-01-01

    This software implements software-defined radio processing over multicore, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembly of the threaded blocks into a flow graph, which assembles the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single CPU/single-core computers or multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to approx. 50 Mbps) software defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.
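
    The threaded-block architecture described above can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the NASA software (which is written in C/C++): the names `block` and `run_flow_graph` are hypothetical, each block has exactly one input and one output pipe, and the "samples" are text lines rather than signal buffers.

```python
import os
import threading

def block(fn, read_fd, write_fd):
    """Start one processing step in its own thread, connected by pipe ends."""
    def run():
        with os.fdopen(read_fd, "r") as src, os.fdopen(write_fd, "w") as dst:
            for line in src:
                dst.write(fn(line.rstrip("\n")) + "\n")
        # Leaving the with-block closes the write end, signalling EOF downstream.
    t = threading.Thread(target=run)
    t.start()
    return t

def run_flow_graph(samples, steps):
    """Chain `steps` into a linear flow graph and push `samples` through it."""
    first_r, first_w = os.pipe()
    read_fd = first_r
    threads = []
    for fn in steps:
        r, w = os.pipe()
        threads.append(block(fn, read_fd, w))
        read_fd = r

    def feed():
        with os.fdopen(first_w, "w") as f:
            for s in samples:
                f.write(s + "\n")

    # Feed from a separate thread so a full pipe cannot stall the reader below.
    feeder = threading.Thread(target=feed)
    feeder.start()
    with os.fdopen(read_fd, "r") as sink:
        out = [line.rstrip("\n") for line in sink]
    feeder.join()
    for t in threads:
        t.join()
    return out
```

    A call such as `run_flow_graph(["a", "b"], [str.upper, lambda s: s + "!"])` runs two blocks in two threads joined by pipes. A real flow graph would allow several inputs and outputs per block, as the abstract describes.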

  16. High-Level Data Races

    NASA Technical Reports Server (NTRS)

    Artho, Cyrille; Havelund, Klaus; Biere, Armin; Koga, Dennis (Technical Monitor)

    2003-01-01

    Data races are a common problem in concurrent and multi-threaded programming. They are hard to detect without proper tool support. Despite the successful application of these tools, experience shows that the notion of data race is not powerful enough to capture certain types of inconsistencies occurring in practice. In this paper we investigate data races on a higher abstraction layer. This enables us to detect inconsistent uses of shared variables, even if no classical race condition occurs. For example, a data structure representing a coordinate pair may have to be treated atomically. By lifting the meaning of a data race to a higher level, such problems can now be covered. The paper defines the concepts view and view consistency to give a notation for this novel kind of property. It describes what kinds of errors can be detected with this new definition, and where its limitations are. It also gives a formal guideline for using data structures in a multi-threading environment.

  17. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.

  18. Multithreaded Stochastic PDES for Reactions and Diffusions in Neurons.

    PubMed

    Lin, Zhongwei; Tropper, Carl; Mcdougal, Robert A; Patoary, Mohammand Nazrul Ishlam; Lytton, William W; Yao, Yiping; Hines, Michael L

    2017-07-01

    Cells exhibit stochastic behavior when the number of molecules is small. Hence a stochastic reaction-diffusion simulator capable of working at scale can provide a more accurate view of molecular dynamics within the cell. This paper describes a parallel discrete event simulator, Neuron Time Warp-Multi Thread (NTW-MT), developed for the simulation of reaction diffusion models of neurons. To the best of our knowledge, this is the first parallel discrete event simulator oriented towards stochastic simulation of chemical reactions in a neuron. The simulator was developed as part of the NEURON project. NTW-MT is optimistic and thread-based, which attempts to capitalize on multi-core architectures used in high performance machines. It makes use of a multi-level queue for the pending event set and a single roll-back message in place of individual anti-messages to disperse contention and decrease the overhead of processing rollbacks. Global Virtual Time is computed asynchronously both within and among processes to get rid of the overhead for synchronizing threads. Memory usage is managed in order to avoid locking and unlocking when allocating and de-allocating memory and to maximize cache locality. We verified our simulator on a calcium buffer model. We examined its performance on a calcium wave model, comparing it to the performance of a process based optimistic simulator and a threaded simulator which uses a single priority queue for each thread. Our multi-threaded simulator is shown to achieve superior performance to these simulators. Finally, we demonstrated the scalability of our simulator on a larger CICR model and a more detailed CICR model.

  19. Symbolic Analysis of Concurrent Programs with Polymorphism

    NASA Technical Reports Server (NTRS)

    Rungta, Neha Shyam

    2010-01-01

    The current trend of multi-core and multi-processor computing is causing a paradigm shift from inherently sequential to highly concurrent and parallel applications. Certain thread interleavings, data input values, or combinations of both often cause errors in the system. Systematic verification techniques such as explicit state model checking and symbolic execution are extensively used to detect errors in such systems [7, 9]. Explicit state model checking enumerates possible thread schedules and input data values of a program in order to check for errors [3, 9]. To partially mitigate the state space explosion from data input values, symbolic execution techniques substitute data input values with symbolic values [5, 7, 6]. Explicit state model checking and symbolic execution techniques used in conjunction with exhaustive search techniques such as depth-first search are unable to detect errors in medium to large-sized concurrent programs because the number of behaviors caused by data and thread non-determinism is extremely large. We present an overview of abstraction-guided symbolic execution for concurrent programs that detects errors manifested by a combination of thread schedules and data values [8]. The technique generates a set of key program locations relevant in testing the reachability of the target locations. The symbolic execution is then guided along these locations in an attempt to generate a feasible execution path to the error state. This allows the execution to focus in parts of the behavior space more likely to contain an error.

  20. Real-time SHVC software decoding with multi-threaded parallel processing

    NASA Astrophysics Data System (ADS)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.

  1. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Unfortunately, these calculations are processed sequentially, whereas in fact all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected for the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. Furthermore, this parallel algorithm will be implemented with a standard shared-memory application programming interface, i.e. Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.

  2. Transformation Systems at NASA Ames

    NASA Technical Reports Server (NTRS)

    Buntine, Wray; Fischer, Bernd; Havelund, Klaus; Lowry, Michael; Pressburger, Tom; Roach, Steve; Robinson, Peter; VanBaalen, Jeffrey

    1999-01-01

    In this paper, we describe the experiences of the Automated Software Engineering Group at the NASA Ames Research Center in the development and application of three different transformation systems. The systems span the entire technology range, from deductive synthesis, to logic-based transformation, to almost compiler-like source-to-source transformation. These systems also span a range of NASA applications, including solving solar system geometry problems, generating data analysis software, and analyzing multi-threaded Java code.

  3. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  4. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  5. Large Scale Document Inversion using a Multi-threaded Computing System

    PubMed Central

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2018-01-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
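
    For readers unfamiliar with document inversion, the following minimal sketch shows what a hash-based inverted index computes. It is a sequential CPU illustration only, not the authors' GPU/CUDA algorithm; the function name `invert` and the whitespace tokenization are assumptions.

```python
from collections import defaultdict

def invert(docs):
    """Map each term to the sorted list of document ids that contain it.

    docs: dict of doc_id -> text. Tokenization by whitespace is an assumption.
    """
    index = defaultdict(list)          # hash table keyed by term
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

    Each document is processed independently of the others before its postings are merged into the shared table, which is the property a data-parallel (SPMD) version would exploit.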

  6. Large Scale Document Inversion using a Multi-threaded Computing System.

    PubMed

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2017-06-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.

  7. Results of SEI Independent Research and Development Projects

    DTIC Science & Technology

    2008-12-01

    contained there. When laptops with a dual-core processor came out, ITunes fails crashed. ITunes was designed as multi-threaded application, but until...involving product portfolio, in-bound technical marketing, research and development, product engineering, supply chain, and out-bound sales and marketing...of quality and process improvement professionals to the marketing, product engineering, supply chain, product test and sales professionals. 3

  8. Asynchronous Message Service Reference Implementation

    NASA Technical Reports Server (NTRS)

    Burleigh, Scott C.

    2011-01-01

    This software provides a library of middleware functions with a simple application programming interface, enabling implementation of distributed applications in conformance with the CCSDS AMS (Consultative Committee for Space Data Systems Asynchronous Message Service) specification. The AMS service, and its protocols, implement an architectural concept under which the modules of mission systems may be designed as if they were to operate in isolation, each one producing and consuming mission information without explicit awareness of which other modules are currently operating. Communication relationships among such modules are self-configuring; this tends to minimize complexity in the development and operations of modular data systems. A system built on this model is a society of generally autonomous, inter-operating modules that may fluctuate freely over time in response to changing mission objectives, module functional upgrades, and recovery from individual module failure. The purpose of AMS, then, is to reduce mission cost and risk by providing standard, reusable infrastructure for the exchange of information among data system modules in a manner that is simple to use, highly automated, flexible, robust, scalable, and efficient. The implementation is designed to spawn multiple threads of AMS functionality under the control of an AMS application program. These threads enable all members of an AMS-based, distributed application to discover one another in real time, subscribe to messages on specific topics, and publish messages on specific topics. The query/reply (client/server) communication model is also supported. Message exchange is optionally subject to encryption (to support confidentiality) and authorization. Fault tolerance measures in the discovery protocol minimize the likelihood of overall application failure due to any single operational error anywhere in the system. The multi-threaded design simplifies processing while enabling application nodes to operate at high speeds; linked lists protected by mutex semaphores and condition variables are used for efficient, inter-thread communication. Applications may use a variety of transport protocols underlying AMS itself, including TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and message queues.
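
    The inter-thread communication pattern named in the abstract, a list protected by a mutex and a condition variable, can be sketched as below. This is a generic illustration and not the AMS reference implementation: the class name `MessageList` is hypothetical, and a `collections.deque` stands in for the linked list.

```python
import threading
from collections import deque

class MessageList:
    """A list of pending messages guarded by a mutex and a condition variable."""

    def __init__(self):
        self._items = deque()               # stands in for the linked list
        self._cond = threading.Condition()  # condition variable with its own lock

    def put(self, msg):
        with self._cond:                    # take the mutex
            self._items.append(msg)
            self._cond.notify()             # wake one waiting consumer

    def get(self):
        with self._cond:
            while not self._items:          # loop guards against spurious wakeups
                self._cond.wait()           # releases the mutex while sleeping
            return self._items.popleft()
```

    A consumer thread blocked in `get()` uses no CPU until a producer calls `put()`, which is why this pattern suits message hand-off between threads.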

  9. Validation of the Transient Structural Response of a Threaded Assembly: Phase I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doebling, Scott W.; Hemez, Francois M.; Robertson, Amy N.

    2004-04-01

    This report explores the application of model validation techniques in structural dynamics. The problem of interest is the propagation of an explosive-driven mechanical shock through a complex threaded joint. The study serves the purpose of assessing whether validating a large-size computational model is feasible, which unit experiments are required, and where the main sources of uncertainty reside. The results documented here are preliminary, and the analyses are exploratory in nature. The results obtained to date reveal several deficiencies of the analysis, to be rectified in future work.

  10. Vectorization, threading, and cache-blocking considerations for hydrocodes on emerging architectures

    DOE PAGES

    Fung, J.; Aulwes, R. T.; Bement, M. T.; ...

    2015-07-14

    This work reports on considerations for improving computational performance in preparation for current and expected changes to computer architecture. The algorithms studied will include increasingly complex prototypes for radiation hydrodynamics codes, such as gradient routines and diffusion matrix assembly (e.g., in [1-6]). The meshes considered for the algorithms are structured or unstructured meshes. The considerations applied for performance improvements are meant to be general in terms of architecture (not specifically graphics processing units (GPUs) or multi-core machines, for example) and include techniques for vectorization, threading, tiling, and cache blocking. Out of a survey of optimization techniques on applications such as diffusion and hydrodynamics, we make general recommendations with a view toward making these techniques conceptually accessible to the applications code developer. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.

  11. Multicore Challenges and Benefits for High Performance Scientific Computing

    DOE PAGES

    Nielsen, Ida M. B.; Janssen, Curtis L.

    2008-01-01

    Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.

  12. Thread selection according to power characteristics during context switching on compute nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J.; Blocksome, Michael A.; Randles, Amanda E.

    Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.

  13. Thread selection according to predefined power characteristics during context switching on compute nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.

  14. Electronic Structure Calculations and Adaptation Scheme in Multi-core Computing Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seshagiri, Lakshminarasimhan; Sosonkina, Masha; Zhang, Zhao

    2009-05-20

    Multi-core processing environments have become the norm in general-purpose computing and are being considered for adding an extra dimension to the execution of any application. The T2 Niagara processor is a distinctive environment: it consists of eight cores, each capable of running eight threads simultaneously. Applications like the General Atomic and Molecular Electronic Structure System (GAMESS), used for ab initio molecular quantum chemistry calculations, can be good indicators of the performance of such machines and can serve as a guideline for both hardware designers and application programmers. In this paper we benchmark GAMESS performance on a T2 Niagara processor for a couple of molecules. We also show the suitability of using a middleware-based adaptation algorithm with GAMESS in such a multi-core environment.

  15. WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

    NASA Astrophysics Data System (ADS)

    Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.

    2017-02-01

    We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.

  16. Final report for the Tera Computer TTI CRADA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davidson, G.S.; Pavlakos, C.; Silva, C.

    1997-01-01

    Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTA parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.

  17. Comprehensive Software Simulation on Ground Power Supply for Launch Pads and Processing Facilities at NASA Kennedy Space Center

    NASA Technical Reports Server (NTRS)

    Dominguez, Jesus A.; Victor, Elias; Vasquez, Angel L.; Urbina, Alfredo R.

    2017-01-01

    A multi-threaded software application has been developed in-house by the Ground Special Power (GSP) team at NASA Kennedy Space Center (KSC) to separately simulate and fully emulate all units that supply VDC power and battery-based power backup to multiple KSC launch ground support systems for NASA's Space Launch System (SLS) rocket.

  18. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with a large memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. As with any computing engine today, reliability is also a concern for GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed.
The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  19. Design of internal screw thread measuring device based on the Three-Line method principle

    NASA Astrophysics Data System (ADS)

    Hu, Dachao; Chen, Jianguo

    2010-08-01

    In accordance with the Three-Line principle, this paper analyzes the correlation between the main parameters of an internal screw thread and then designs a device to measure them. Internal thread parameters, such as the pitch diameter, thread angle and screw-pitch of common screw thread, terraced screw thread and zigzag screw thread, were obtained through calculation and measurement. Practical applications have shown that this device is convenient to use and that the measurements have high accuracy. Meanwhile, the application for the patent of invention has been accepted by the Patent Office (Filing number: 200710044081.5).

  20. Development of an Autonomous Navigation Technology Test Vehicle

    DTIC Science & Technology

    2004-08-01

    as an independent thread on processors using the Linux operating system. The computer hardware selected for the nodes that host the MRS threads...communications system design. Linux was chosen as the operating system for all of the single board computers used on the Mule. Linux was specifically...used for system analysis and development. The simple realization of multi-thread processing and inter-process communications in Linux made it a

  1. AN MHD AVALANCHE IN A MULTI-THREADED CORONAL LOOP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hood, A. W.; Cargill, P. J.; Tam, K. V.

    For the first time, we demonstrate how an MHD avalanche might occur in a multithreaded coronal loop. Considering 23 non-potential magnetic threads within a loop, we use 3D MHD simulations to show that only one thread needs to be unstable in order to start an avalanche even when the others are below marginal stability. This has significant implications for coronal heating in that it provides for energy dissipation with a trigger mechanism. The instability of the unstable thread follows the evolution determined in many earlier investigations. However, once one stable thread is disrupted, it coalesces with a neighboring thread and this process disrupts other nearby threads. Coalescence with these disrupted threads then occurs leading to the disruption of yet more threads as the avalanche develops. Magnetic energy is released in discrete bursts as the surrounding stable threads are disrupted. The volume integrated heating, as a function of time, shows short spikes suggesting that the temporal form of the heating is more like that of nanoflares than of constant heating.

  2. GPU COMPUTING FOR PARTICLE TRACKING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

    2011-03-25

    This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performance, issues, and challenges from introducing the GPU are also discussed. General-Purpose computation on Graphics Processing Units (GPGPU) brings massively parallel computing capabilities to numerical calculation. However, the unique architecture of the GPU requires a comprehensive understanding of the hardware and programming model in order to optimize existing applications well. In the field of accelerator physics, the dynamic aperture calculation of a storage ring, which is often the most time-consuming part of accelerator modeling and simulation, can benefit from the GPU due to its embarrassingly parallel nature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU, which consists of 14 multiprocessors (MP) with 32 cores on each MP, for a total of 448 cores, to host thousands of threads dynamically. A thread is a logical execution unit of the program on the GPU. In the GPU programming model, threads are grouped into a collection of blocks. Within each block, multiple threads share the same code and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using the GPU to speed up the dynamic aperture calculation by having each thread track a particle.
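
    The block/grid organization described in this abstract can be illustrated with a minimal serial sketch in Python. It is not the TracyGPU code: `track_particle` is a hypothetical stand-in (a plain drift map), and `launch`, `block_dim` and `global_thread_index` are names assumed here for illustration only. The sketch shows how a one-particle-per-thread mapping derives a particle index from the block and thread indices.

```python
# Core count quoted in the abstract: 14 multiprocessors x 32 cores each.
TOTAL_CORES = 14 * 32  # 448


def global_thread_index(block_idx, block_dim, thread_idx):
    """Flat index of a thread inside a one-dimensional grid of blocks."""
    return block_idx * block_dim + thread_idx


def track_particle(x, p, turns):
    """Hypothetical one-particle tracker: a simple drift, x <- x + p per turn."""
    for _ in range(turns):
        x = x + p
    return (x, p)


def launch(particles, block_dim, turns):
    """Serially emulate a kernel launch in which each thread tracks one particle."""
    n = len(particles)
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n particles
    results = [None] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = global_thread_index(block_idx, block_dim, thread_idx)
            if i < n:  # threads past the end of the data do nothing
                x, p = particles[i]
                results[i] = track_particle(x, p, turns)
    return results
```

    On a real GPU the two loops disappear: every (block, thread) pair runs concurrently, and only the index computation and the bounds check remain in the kernel body.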

  3. The measure method of internal screw thread and the measure device design

    NASA Astrophysics Data System (ADS)

    Hu, Dachao; Chen, Jianguo

    2008-12-01

    In accordance with the Three-Line principle, this paper analyzes the correlation between the main parameters of an internal screw thread and then designs a device to measure them. Based on the measured values and the corresponding formulas, we can obtain the internal thread parameters, such as the pitch diameter, thread angle and screw-pitch of common screw thread, terraced screw thread, zigzag screw thread and others. Practical application has shown that the device is convenient to operate and that the measured data have high accuracy. Meanwhile, the invention patent application for this device has been accepted by the Patent Office. (The filing number: 200710044081.5)

  4. Cable-type supercapacitors of three-dimensional cotton thread based multi-grade nanostructures for wearable energy storage.

    PubMed

    Liu, Nishuang; Ma, Wenzhen; Tao, Jiayou; Zhang, Xianghui; Su, Jun; Li, Luying; Yang, Congxing; Gao, Yihua; Golberg, Dmitri; Bando, Yoshio

    2013-09-20

    A novel cable-type flexible supercapacitor with excellent performance is fabricated using 3D polypyrrole(PPy)-MnO2-CNT-cotton thread multi-grade nanostructure-based electrodes. The multiple supercapacitors, with a high areal capacitance of 1.49 F cm(-2) at a scan rate of 1 mV s(-1), connected in series and in parallel, can successfully drive an LED segment display. Such an excellent performance is attributed to the cumulative effect of conducting single-walled carbon nanotubes on cotton thread, active mesoporous flower-like MnO2 nanoplates, and PPy conductive wrapping layer improving the conductivity, and acting as pseudocapacitance material simultaneously. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke; Wu, Kesheng; Bethel, E. Wes

    2009-06-02

    The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture--for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU and GPU-based projection index.
Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
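
    The two phases described in this abstract (bin the base data, then answer a query from the bin numbers plus the per-bin clusters) can be sketched serially in Python. This is a simplified illustration of bin-based indexing in general, not the authors' DP-BIS implementation: the functions `build_index` and `range_query`, the half-open bin convention, and the assumption that every value lies in `[edges[0], edges[-1])` are all choices made here for the example.

```python
from bisect import bisect_right


def build_index(values, edges):
    """Group (row, value) pairs into one cluster per bin.

    Bin b covers the half-open interval [edges[b], edges[b + 1]).
    Assumes every value satisfies edges[0] <= value < edges[-1].
    """
    clusters = [[] for _ in range(len(edges) - 1)]
    for row, value in enumerate(values):
        b = bisect_right(edges, value) - 1
        clusters[b].append((row, value))
    return clusters


def range_query(clusters, edges, lo, hi):
    """Return the sorted row ids whose value v satisfies lo <= v < hi."""
    hits = []
    for b, cluster in enumerate(clusters):
        b_lo, b_hi = edges[b], edges[b + 1]
        if b_hi <= lo or b_lo >= hi:
            continue  # bin lies entirely outside the query range
        if lo <= b_lo and b_hi <= hi:
            # Bin lies entirely inside the range: accept without reading values.
            hits.extend(row for row, _ in cluster)
        else:
            # Boundary bin: each record needs an individual check.
            hits.extend(row for row, v in cluster if lo <= v < hi)
    return sorted(hits)
```

    The per-record checks in a boundary bin are independent of one another, which is what makes a one-record-per-thread evaluation of the kind the abstract describes possible.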

  6. Optimal Configuration and Deployment of Software on Multi-Core Processing Architectures

    DTIC Science & Technology

    2008-07-01

    between the event generating threads and the collector thread is implemented through semaphores. The Perseus data logger is designed to minimize the...performance counters (through the PAPI API) and opens up access to the shared memory logger through a semaphore and Remote Procedure Call (RPC) buffer...synchronization events. Using this rich data, the TMAM is able to output all of the information necessary to identify precisely which pairs of thread

  7. Benchmarking high performance computing architectures with CMS’ skeleton framework

    NASA Astrophysics Data System (ADS)

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-10-01

    In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  8. WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendygral, P. J.; Radcliffe, N.; Kandalla, K.

    2017-02-01

    We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.

  9. A Multi-Threaded Cryptographic Pseudorandom Number Generator Test Suite

    DTIC Science & Technology

    2016-09-01

    bitcoin thieves, Google releases patch. (2013, Aug. 16). SiliconANGLE. [Online]. Available: http://siliconangle.com/blog/2013/08/16/android-crypto-prng...flaw-aided-bitcoin-thieves-google-releases-patch/ [5] M. Gondree. (2014, Sep. 28). NPS POSIX thread pool library. [Online]. Available: https

  10. Shared prefetching to reduce execution skew in multi-threaded systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eichenberger, Alexandre E; Gunnels, John A

    Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
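
    The distribution step in this abstract, where each thread receives a separate sub-portion of one shared set of prefetches, can be sketched in a few lines of Python. The abstract does not say how the set is divided; the round-robin split below, the function name `distribute_prefetches`, and the representation of prefetches as a plain list of addresses are assumptions made only for illustration.

```python
def distribute_prefetches(prefetch_addresses, num_threads):
    """Split one shared prefetch stream into disjoint per-thread sub-portions.

    Round-robin assignment (an assumption of this sketch): thread t receives
    entries t, t + num_threads, t + 2 * num_threads, and so on.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return [prefetch_addresses[t::num_threads] for t in range(num_threads)]
```

    With eight prefetches and three threads, for example, the threads receive three, three and two prefetches respectively, and together they cover the whole stream exactly once.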

  11. A C++ Thread Package for Concurrent and Parallel Programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jie Chen; William Watson

    1999-11-01

    Recently thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduce challenges for those who wish to develop portable applications.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.

    Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whether global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.

  13. Development of an extensible dual-core wireless sensing node for cyber-physical systems

    NASA Astrophysics Data System (ADS)

    Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Mortiz; Lynch, Jerome P.; Wang, Yang; Swartz, A.

    2014-04-01

    The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyberphysical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback controls demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independent of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing the Martlet's ability to monitor and control physical systems such as wind turbines and buildings.

  14. Scaling Irregular Applications through Data Aggregation and Software Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel

    Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running using GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.

  15. FODEM: A Multi-Threaded Research and Development Method for Educational Technology

    ERIC Educational Resources Information Center

    Suhonen, Jarkko; de Villiers, M. Ruth; Sutinen, Erkki

    2012-01-01

    Formative development method (FODEM) is a multithreaded design approach that was originated to support the design and development of various types of educational technology innovations, such as learning tools, and online study programmes. The threaded and agile structure of the approach provides flexibility to the design process. Intensive…

  16. Composition and substrate-dependent strength of the silken attachment discs in spiders

    PubMed Central

    Grawe, Ingo; Wolff, Jonas O.; Gorb, Stanislav N.

    2014-01-01

    Araneomorph spiders have evolved different silks with dissimilar material properties, serving different purposes. The two-compound pyriform secretion is used to glue silk threads to substrates or to other threads. It is applied in distinct patterns, called attachment discs. Although ubiquitously found in spider silk applications and hypothesized to be strong and versatile at low material consumption, the performance of attachment discs on different substrates remains unknown. Here, we analyse the detachment forces and fracture mechanics of the attachment discs spun by five different species on three different substrates, by pulling on the upstream part of the attached thread. Results show that although the adhesion of the pyriform glue is heavily affected by the substrate, even on Teflon it is frequently strong enough to hold the spider's weight. As plant surfaces are often difficult to wet, they are hypothesized to be the major driving force for evolution of the pyriform secretion. PMID:25030386

  17. Ropes: Support for collective operations among distributed threads

    NASA Technical Reports Server (NTRS)

    Haines, Matthew; Mehrotra, Piyush; Cronk, David

    1995-01-01

    Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. Recently, systems have been designed and implemented to support interprocessor communication between lightweight threads so that threads can be exploited in a distributed memory system. Their use, in this setting, has been largely restricted to supporting latency hiding techniques and functional parallelism within a single application. However, to execute data parallel codes independent of other threads in the system, collective operations and relative indexing among threads are required. This paper describes the design of ropes: a scoping mechanism for collective operations and relative indexing among threads. We present the design of ropes in the context of the Chant system, and provide performance results evaluating our initial design decisions.

  18. Flare particle acceleration in the interaction of twisted coronal flux ropes

    NASA Astrophysics Data System (ADS)

    Threlfall, J.; Hood, A. W.; Browning, P. K.

    2018-03-01

    Aim. The aim of this work is to investigate and characterise non-thermal particle behaviour in a three-dimensional (3D) magnetohydrodynamical (MHD) model of unstable multi-threaded flaring coronal loops. Methods: We have used a numerical scheme which solves the relativistic guiding centre approximation to study the motion of electrons and protons. The scheme uses snapshots from high resolution numerical MHD simulations of coronal loops containing two threads, where a single thread becomes unstable and (in one case) destabilises and merges with an additional thread. Results: The particle responses to the reconnection and fragmentation in MHD simulations of two loop threads are examined in detail. We illustrate the role played by uniform background resistivity and distinguish this from the role of anomalous resistivity using orbits in an MHD simulation where only one thread becomes unstable without destabilising further loop threads. We examine the (scalable) orbit energy gains and final positions recovered at different stages of a second MHD simulation wherein a secondary loop thread is destabilised by (and merges with) the first thread. We compare these results with other theoretical particle acceleration models in the context of observed energetic particle populations during solar flares.

  19. Benchmarking high performance computing architectures with CMS’ skeleton framework

    DOE PAGES

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-11-23

    Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  20. Benchmarking high performance computing architectures with CMS’ skeleton framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  1. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    NASA Astrophysics Data System (ADS)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.

  2. Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

    NASA Astrophysics Data System (ADS)

    Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

    2017-05-01

    In a heterogeneous multi-core architecture, CPU and GPU processors are integrated on the same chip, which poses a new challenge for last-level cache management. In this architecture, the CPU application and the GPU application execute concurrently and both access the last-level cache (LLC). CPUs and GPUs have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. By contrast, GPU applications can tolerate an increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the memory latency tolerance of GPU programs, this paper presents a method that lets GPU applications access memory directly, leaving more LLC space for CPU applications. This improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.

  3. Electronic and optical properties of GaN/AlN quantum dots with adjacent threading dislocations

    NASA Astrophysics Data System (ADS)

    Ye, Han; Lu, Peng-Fei; Yu, Zhong-Yuan; Yao, Wen-Jie; Chen, Zhi-Hui; Jia, Bo-Yong; Liu, Yu-Min

    2010-04-01

    We present a theory to simulate a coherent GaN QD with an adjacent pure edge threading dislocation by using a finite element method. The piezoelectric effects and the strain modified band edges are investigated in the framework of multi-band k · p theory to calculate the electron and the heavy hole energy levels. The linear optical absorption coefficients corresponding to the interband ground state transition are obtained via the density matrix approach and perturbation expansion method. The results indicate that the strain distribution of the threading dislocation affects the electronic structure. Moreover, the ground state transition behaviour is also influenced by the position of the adjacent threading dislocation.

  4. Online discussion groups for bulimia nervosa: an inductive approach to Internet-based communication between patients.

    PubMed

    Wesemann, Dorette; Grunwald, Martin

    2008-09-01

    Online discussion forums are often used by people with eating disorders. This study analyses 2,072 threads containing a total of 14,903 postings from an unmoderated German "prorecovery" forum for persons suffering from bulimia nervosa (www.ab-server.de) during the period from October 2004 to May 2006. The threads were inductively analyzed for underlying structural types, and the various types found were then analyzed for differences in temporal and quantitative parameters. Communication in the online discussion forum occurred in three types of thread: (1) problem-oriented threads (78.8% of threads), (2) communication-oriented threads (15.3% of threads), and (3) metacommunication threads (2.6% of threads). Metacommunication threads contained significantly more postings than problem-oriented and communication-oriented threads, and they were viewed significantly more often. Moreover, there are temporal differences between the structural types. Topics relating to active management of the disorder receive great attention in prorecovery forums. (c) 2008 by Wiley Periodicals, Inc.

  5. Kernel optimization for short-range molecular dynamics

    NASA Astrophysics Data System (ADS)

    Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He

    2017-02-01

    To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. With respect to multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented by using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload models. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. In the SIMD execution, we explain the performance influence in the PSC method, considering the "if-clause" of the cutoff radius check. The experiment results show that our PSC method is relatively more efficient compared to some traditional methods. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.

  6. Deviation of the typical AAA substrate-threading pore prevents fatal protein degradation in yeast Cdc48.

    PubMed

    Esaki, Masatoshi; Islam, Md Tanvir; Tani, Naoki; Ogura, Teru

    2017-07-14

    Yeast Cdc48 is a well-conserved, essential chaperone of ATPases associated with diverse cellular activity (AAA) proteins, which recognizes substrate proteins and modulates their conformations to carry out many cellular processes. However, the fundamental mechanisms underlying the diverse pivotal roles of Cdc48 remain unknown. Almost all AAA proteins form a ring-shaped structure with a conserved aromatic amino acid residue that is essential for proper function. The threading mechanism hypothesis suggests that this residue guides the intrusion of substrate proteins into a narrow pore of the AAA ring, thereby becoming unfolded. By contrast, the aromatic residue in one of the two AAA rings of Cdc48 has been eliminated through evolution. Here, we show that artificial retrieval of this aromatic residue in Cdc48 is lethal, and essential features to support the threading mechanism are required to exhibit the lethal phenotype. In particular, genetic and biochemical analyses of the Cdc48 lethal mutant strongly suggested that when in complex with the 20S proteasome, essential proteins are abnormally forced to thread through the Cdc48 pore to become degraded, which was not detected in wild-type Cdc48. Thus, the widely applicable threading model is less effective for wild-type Cdc48; rather, Cdc48 might function predominantly through an as-yet-undetermined mechanism.

  7. Exploration of microfluidic devices based on multi-filament threads and textiles: A review

    PubMed Central

    Nilghaz, A.; Ballerini, D. R.; Shen, W.

    2013-01-01

    In this paper, we review the recent progress in the development of low-cost microfluidic devices based on multifilament threads and textiles for semi-quantitative diagnostic and environmental assays. Hydrophilic multifilament threads are capable of transporting aqueous and non-aqueous fluids via capillary action and possess desirable properties for building fluid transport pathways in microfluidic devices. Thread can be sewn onto various support materials to form fluid transport channels without the need for the patterned hydrophobic barriers essential for paper-based microfluidic devices. Thread can also be used to manufacture fabrics which can be patterned to achieve suitable hydrophilic-hydrophobic contrast, creating hydrophilic channels which allow the control of fluids flow. Furthermore, well established textile patterning methods and combination of hydrophilic and hydrophobic threads can be applied to fabricate low-cost microfluidic devices that meet the low-cost and low-volume requirements. In this paper, we review the current limitations and shortcomings of multifilament thread and textile-based microfluidics, and the research efforts to date on the development of fluid flow control concepts and fabrication methods. We also present a summary of different methods for modelling the fluid capillary flow in microfluidic thread and textile-based systems. Finally, we summarized the published works of thread surface treatment methods and the potential of combining multifilament thread with other materials to construct devices with greater functionality. We believe these will be important research focuses of thread- and textile-based microfluidics in future. PMID:24086179

  8. FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks

    PubMed Central

    Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun

    2015-01-01

    Gene co-expression networks comprise one type of valuable biological networks. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control the multiple tests. At last, modules identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, single-thread C/C++ implementation and single-thread R implementation. Our results show that GPU implementation largely outperforms single-thread C/C++ implementation and single-thread R implementation, and GPU implementation outperforms multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out. PMID:25602758
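    The correlation and multiple-testing steps described in this record can be sketched in a few lines of plain Python. This is only an illustrative CPU sketch, not the FastGCN GPU implementation: the function names `pearson` and `benjamini_hochberg` are hypothetical, and the Benjamini-Hochberg procedure is an assumption, since the abstract says only that the False Discovery Rate was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles.

    Assumes neither profile is constant; the abstract describes filtering out
    genes with no or small expression changes before this step.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the test is rejected.

    Assumed FDR procedure (Benjamini-Hochberg): find the largest rank k with
    p_(k) <= k / m * alpha and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

    In a co-expression setting, `pearson` would be evaluated for every gene pair, which is the part that parallelizes naturally because each pair is independent of the others.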

  9. FastGCN: a GPU accelerated tool for fast gene co-expression networks.

    PubMed

    Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun

    2015-01-01

    Gene co-expression networks comprise one type of valuable biological networks. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control the multiple tests. At last, modules identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, single-thread C/C++ implementation and single-thread R implementation. Our results show that GPU implementation largely outperforms single-thread C/C++ implementation and single-thread R implementation, and GPU implementation outperforms multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out.

  10. Historical perspectives on channel pattern in the Clark Fork River, Montana and implications for post-dam removal restoration

    NASA Astrophysics Data System (ADS)

    Woelfle-Erskine, C. A.; Wilcox, A. C.

    2009-12-01

    Active restoration approaches such as channel reconstruction have moved beyond the realm of small streams and are being applied to larger rivers. Uncertainties arising from limited knowledge, fluvial and ecosystem variability, and contaminants are especially significant in restoration of large rivers, where project costs and the social, infrastructural, and ecological costs of failure are high. We use the case of Milltown Dam removal on the Clark Fork River, Montana and subsequent channel reconstruction in the former reservoir to examine the use of historical research and uncertainty analysis in river restoration. At a cost of approximately $120 million, the Milltown Dam removal involves the mechanical removal of approximately 2 million cubic meters of sediments contaminated by upstream mining, followed by restoration of the former reservoir reach in which a single-thread meandering channel is being constructed. Historical maps, surveys, photographs, and accounts suggest a conceptual model of a multi-thread, anastomosing river in the reach targeted for channel reconstruction, upstream of the confluence of the Clark Fork and Blackfoot Rivers. We supplemented historical research with analysis of aerial photographs, topographic data, and USGS stage-discharge measurements in a lotic but reservoir-influenced reach of the Clark Fork River within our study area to estimate avulsion frequency (0.8 avulsions/year over a 70-year period) and average rates of lateral migration and aggradation. These were used to calculate the mobility number, a dimensionless relationship between channel filling and lateral migration timescales that can be used to predict whether a river’s planform is single or multi-threaded. The mobility number within our study reach ranged from 0.6 (multi-thread channel) to 1.7 (transitional channel). We predict that, in the absence of active channel reconstruction, the post-dam channel pattern would evolve to one that alternates between single and multi-threaded. We propose that multiple working hypotheses should be applied to managing uncertainty as part of an adaptive management plan for restoration in our study area and elsewhere. In this approach, restoration planning and implementation would be underpinned by an explicitly identified set of uncertainties and hypotheses about channel processes and post-restoration responses. This framework would allow for and embrace channel processes such as bifurcations and avulsions that are excluded from dominant approaches to channel reconstruction, which emphasize single-thread meandering planforms.

  11. WaveJava: Wavelet-based network computing

    NASA Astrophysics Data System (ADS)

    Ma, Kun; Jiao, Licheng; Shi, Zhuoer

    1997-04-01

    Wavelet theory is powerful, but its successful application still needs suitable programming tools. Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded, dynamic language. This paper addresses the design and development of a cross-platform software environment for experimenting with and applying wavelet theory. WaveJava, a wavelet class library designed with object-oriented programming, is developed to take advantage of wavelet features such as multi-resolution analysis and parallel processing in network computing. A new application architecture is designed for the net-wide distributed client-server environment. The data are transmitted in multi-resolution packets. At the distributed sites around the net, these data packets undergo matching or recognition processing in parallel. The results are fed back to determine the next operation, so that more robust results can be reached quickly. WaveJava is easy to use and to extend for special applications. This paper gives a solution for a distributed fingerprint information processing system. The approach also suits other net-based multimedia information processing, such as network libraries, remote teaching, and filmless picture archiving and communications.

  12. Multi-petascale highly efficient parallel supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design includes a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.

  13. Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Villa, Oreste; Tumeo, Antonino; Secchi, Simone

    Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% of accuracy. Emulation is only from 25 to 200 times slower than real time.

  14. Parallel Task Management Library for MARTe

    NASA Astrophysics Data System (ADS)

    Valcarcel, Daniel F.; Alves, Diogo; Neto, Andre; Reux, Cedric; Carvalho, Bernardo B.; Felton, Robert; Lomas, Peter J.; Sousa, Jorge; Zabeo, Luca

    2014-06-01

    The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment leveraging on the current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the Joint European Torus (JET) tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature on metal tiles and plasma facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the main time consuming tasks in WALLS is the calculation of thermal diffusion models in real-time. These models tend to be described by very large state-space models thus making them perfect candidates for parallelisation. MARTe's traditional approach for task parallelisation is to split the problem into several Real-Time Threads, each responsible for a self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. Also, it might not be easily scalable with an increase in the number of available CPU cores. The WorkLibrary introduces a “GPU-like approach” of splitting work among the available cores of modern CPUs that is (i) straightforward to use in an application, (ii) scalable with the availability of cores and all of this (iii) without rewriting or recompiling the source code. The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
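    The idea of splitting one large state-space update across the available cores can be illustrated with a row-block decomposition. The sketch below is an assumption-laden illustration and not the WorkLibrary API: the names `split_rows`, `step_block` and `parallel_step` are hypothetical, and the update is taken to be the generic form x_next = A x + B u.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Return contiguous (start, stop) row ranges, at most one per worker."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip workers that would receive no rows
            ranges.append((start, stop))
        start = stop
    return ranges


def step_block(A, B, x, u, start, stop):
    """Compute rows start..stop-1 of x_next = A x + B u."""
    return [
        sum(a * xi for a, xi in zip(A[r], x))
        + sum(b * ui for b, ui in zip(B[r], u))
        for r in range(start, stop)
    ]


def parallel_step(A, B, x, u, n_workers=4):
    """One state-space step, with each worker computing its own block of rows."""
    ranges = split_rows(len(A), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        blocks = pool.map(lambda rng: step_block(A, B, x, u, *rng), ranges)
        return [value for block in blocks for value in block]
```

    Each row of the output depends only on the previous state, so the blocks need no synchronization within a step. In CPython the threads mainly show the decomposition rather than a speedup; a real-time framework would use native threads pinned to cores.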

  15. Too Much Control Can Hurt: A Threaded Cognition Model of the Attentional Blink

    ERIC Educational Resources Information Center

    Taatgen, Niels A.; Juvina, Ion; Schipper, Marc; Borst, Jelmer P.; Martens, Sander

    2009-01-01

    Explanations for the attentional blink (AB; a deficit in identifying the second of two targets when presented 200-500ms after the first) have recently shifted from limitations in memory consolidation to disruptions in cognitive control. With a new model based on the threaded cognition theory of multi-tasking we propose a different explanation: the…

  16. Using the CMS threaded framework in a production environment

    DOE PAGES

    Jones, C. D.; Contreras, L.; Gartung, P.; ...

    2015-12-23

    During 2014, the CMS Offline and Computing Organization completed the necessary changes to use the CMS threaded framework in the full production environment. We will briefly discuss the design of the CMS Threaded Framework, in particular how the design affects scaling performance. We will then cover the effort involved in getting both the CMSSW application software and the workflow management system ready for using multiple threads for production. Finally, we will present metrics on the performance of the application and workflow system as well as the difficulties which were uncovered. As a result, we will end with CMS' plans for using the threaded framework to do production for LHC Run 2.

  17. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarje, Abhinav; Jacobsen, Douglas W.; Williams, Samuel W.

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  18. Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

    NASA Astrophysics Data System (ADS)

    Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

    2017-07-01

    Multi-thread programming using OpenMP on a shared-memory architecture with hyperthreading technology allows resources to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, the speedup depends on the ability of the processor to execute threads in limited quantities, especially for a sequential algorithm that contains a nested loop in which the number of outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique found previously can only be applied by high-level programmers. This paper develops a parallelization procedure for low-level programmers dealing with 2-level nested loop problems in which the maximum number of threads that can be executed by a processor is smaller than the number of outer loop iterations. Data preprocessing, which is related to the number of outer loop and inner loop iterations, the computational time required to execute each iteration, and the maximum number of threads that can be executed by a processor, is used as a strategy to determine which parallel region will produce the optimal speedup.
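    The choice between outer and inner parallelization can be illustrated with a simple cost model built from the same quantities the abstract lists: iteration counts, time per iteration, and the maximum number of threads. The model below is a hypothetical sketch, not the procedure from the paper: it assumes uniform iteration times, ignores fork/join and scheduling overhead, and the names `estimated_time` and `choose_level` are invented for illustration.

```python
import math


def estimated_time(n_outer, n_inner, t_iter, max_threads, level):
    """Idealized run time when one loop level is split across max_threads.

    Assumed model: iterations of the parallelized level run in
    ceil(n / max_threads) rounds, and the other level runs sequentially.
    """
    if level == "outer":
        return math.ceil(n_outer / max_threads) * n_inner * t_iter
    if level == "inner":
        return n_outer * math.ceil(n_inner / max_threads) * t_iter
    raise ValueError("level must be 'outer' or 'inner'")


def choose_level(n_outer, n_inner, t_iter, max_threads):
    """Pick the loop level with the smaller estimated time (ties go to outer)."""
    outer = estimated_time(n_outer, n_inner, t_iter, max_threads, "outer")
    inner = estimated_time(n_outer, n_inner, t_iter, max_threads, "inner")
    return "outer" if outer <= inner else "inner"
```

    With 5 outer iterations, 8 inner iterations and 4 threads, the outer split needs two rounds of 8 iterations while the inner split needs 5 passes of two rounds, so this model favours the inner loop. Ties go to the outer loop because, in practice, an inner parallel region is entered once per outer iteration and pays its startup cost each time.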

  19. Collagen insulated from tensile damage by domains that unfold reversibly: in situ X-ray investigation of mechanical yield and damage repair in the mussel byssus

    PubMed Central

    Harrington, Matthew J.; Gupta, Himadri S.; Fratzl, Peter; Waite, J. Herbert

    2009-01-01

    The byssal threads of the California mussel, Mytilus californianus, are highly hysteretic, elastomeric fibers that collectively perform a holdfast function in wave-swept rocky seashore habitats. Following cyclic loading past the mechanical yield point, threads exhibit a damage-dependent reduction in mechanical performance. However, the distal portion of the byssal thread is capable of recovering initial material properties through a time-dependent healing process in the absence of active cellular metabolism. Byssal threads are composed almost exclusively of multi-domain hybrid collagens known as preCols, which largely determine the mechanical properties of the thread. Here, the structure-property relationships that govern thread mechanical performance are further probed. The molecular rearrangements that occur during yield and damage repair were investigated using time-resolved in situ wide angle X-ray diffraction (WAXD) coupled with cyclic tensile loading of threads and through thermally enhanced damage-repair studies. Results indicate that the collagen domains in byssal preCols are mechanically protected by the unfolding of sacrificial non-collagenous domains that refold on a slower time-scale. Time-dependent healing is primarily attributed to stochastic recoupling of broken histidine-metal coordination complexes. PMID:19275941

  20. Influence of thread shape and inclination on the biomechanical behaviour of plateau implant systems.

    PubMed

    Calì, Michele; Zanetti, Elisabetta Maria; Oliveri, Salvatore Massimo; Asero, Riccardo; Ciaramella, Stefano; Martorelli, Massimo; Bignardi, Cristina

    2018-03-01

    To assess the influence of implant thread shape and inclination on the mechanical behaviour of bone-implant systems. The study assesses which factors influence the initial and full osseointegration stages. Point clouds of the original implant were created using a non-contact reverse engineering technique. A 3D tessellated surface was created using Geomagic Studio ® software. From cross-section curves, generated by intersecting the tessellated model and cutting-planes, a 3D parametric CAD model was created using SolidWorks ® 2017. By the permutation of three thread shapes (rectangular, 30° trapezoidal, 45° trapezoidal) and three thread inclinations (0°, 3° or 6°), nine geometric configurations were obtained. Two different osseointegration stages were analysed: the initial osseointegration and a full osseointegration. In total, 18 different FE models were analysed and two load conditions were applied to each model. The mechanical behaviour of the models was analysed by Finite Element (FE) Analysis using ANSYS ® v. 17.0. Static linear analyses were also carried out. ANOVA was used to assess the influence of each factor. Models with a rectangular thread and 6° inclination provided the best results and reduced displacement in the initial osseointegration stages up to 4.58%. This configuration also reduced equivalent VM stress peaks up to 54%. The same effect was confirmed for the full osseointegration stage, where 6° inclination reduced stress peaks by up to 62%. The FE analysis confirmed the beneficial effect of thread inclination, reducing the displacement in immediate post-operative conditions and equivalent VM stress peaks. Thread shape does not significantly influence the mechanical behaviour of bone-implant systems but contributes to reducing stress peaks in the trabecular bone in both the initial and full osseointegration stages. Copyright © 2018 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.
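    The model counts in this record follow from a full factorial design, which can be checked by enumerating the combinations. The snippet is a small sketch; the label strings and variable names are illustrative and not taken from the study's files.

```python
from itertools import product

# Factor levels as listed in the abstract
shapes = ["rectangular", "30-degree trapezoidal", "45-degree trapezoidal"]
inclinations_deg = [0, 3, 6]
stages = ["initial osseointegration", "full osseointegration"]

# 3 shapes x 3 inclinations -> geometric configurations
geometries = list(product(shapes, inclinations_deg))

# each geometry analysed at 2 osseointegration stages -> FE models
models = list(product(shapes, inclinations_deg, stages))

print(len(geometries), len(models))
```

    Because two load conditions were applied to each of the 18 models, the design implies 36 model-load combinations in total.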

  1. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory

  2. Thread-Mounted Thermocouple

    NASA Technical Reports Server (NTRS)

    Ward, Stanley W.

    1988-01-01

    Thread-mounted thermocouple developed to accurately measure temperature of surrounding material. Comprised of threaded rod or bolt drilled along length, dual-hole ceramic insulator rod, thermocouple wire, optional ceramic filler, and epoxy resin. In contact with, and takes average temperature of, surrounding material. Fabricated easily in size and metal to suit particular application. Because of simplicity and ability to measure average temperature, widespread use of design foreseen in variety of applications.

  3. SMT-Aware Instantaneous Footprint Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roy, Probir; Liu, Xu; Song, Shuaiwen

    Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.

  4. Optimized FPGA Implementation of Multi-Rate FIR Filters Through Thread Decomposition

    NASA Technical Reports Server (NTRS)

    Kobayashi, Kayla N.; He, Yutao; Zheng, Jason X.

    2011-01-01

    Multi-rate finite impulse response (MRFIR) filters are among the essential signal-processing components in spaceborne instruments where finite impulse response filters are often used to minimize nonlinear group delay and finite precision effects. Cascaded (multistage) designs of MRFIR filters are further used for large rate change ratio in order to lower the required throughput, while simultaneously achieving comparable or better performance than single-stage designs. Traditional representation and implementation of MRFIR employ polyphase decomposition of the original filter structure, whose main purpose is to compute only the needed output at the lowest possible sampling rate. In this innovation, an alternative representation and implementation technique called TD-MRFIR (Thread Decomposition MRFIR) is presented. The basic idea is to decompose MRFIR into output computational threads, in contrast to a structural decomposition of the original filter as done in the polyphase decomposition. A naive implementation of a decimation filter consisting of a full FIR followed by a downsampling stage is very inefficient, as most of the computations performed by the FIR state are discarded through downsampling. In fact, only 1/M of the total computations are useful (M being the decimation factor). Polyphase decomposition provides an alternative view of decimation filters, where the downsampling occurs before the FIR stage, and the outputs are viewed as the sum of M sub-filters with length of N/M taps. Although this approach leads to more efficient filter designs, in general the implementation is not straightforward if the numbers of multipliers need to be minimized. In TD-MRFIR, each thread represents an instance of the finite convolution required to produce a single output of the MRFIR. The filter is thus viewed as a finite collection of concurrent threads. 
Each of the threads completes when a convolution result (filter output value) is computed, and activated when the first input of the convolution becomes available. Thus, the new threads get spawned at exactly the rate of N/M, where N is the total number of taps, and M is the decimation factor. Existing threads retire at the same rate of N/M. The implementation of an MRFIR is thus transformed into a problem to statically schedule the minimum number of multipliers such that all threads can be completed on time. Solving the static scheduling problem is rather straightforward if one examines the Thread Decomposition Diagram, which is a table-like diagram that has rows representing computation threads and columns representing time. The control logic of the MRFIR can be implemented using simple counters. Instead of decomposing MRFIRs into subfilters as suggested by polyphase decomposition, the thread decomposition diagrams transform the problem into a familiar one of static scheduling, which can be easily solved as the input rate is constant.
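
    The contrast between the naive decimator and the per-output "thread" view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TD-MRFIR implementation from the record: the function names `fir_then_downsample` and `output_threads` are hypothetical, the input is zero-padded at the start, and nothing here models the multiplier scheduling or FPGA control logic.

```python
def fir_then_downsample(x, h, m):
    """Naive decimator: compute every FIR output, then keep every m-th one."""
    full = [
        sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
        for n in range(len(x))
    ]
    return full[::m]  # (m - 1) out of every m computed outputs are discarded


def output_threads(x, h, m):
    """Compute only the retained outputs (hypothetical helper).

    Each retained output is treated as one independent finite convolution,
    which is the role a "thread" plays in the description above.
    """
    out = []
    for n in range(0, len(x), m):      # one thread per retained output
        acc = 0
        for k in range(len(h)):        # at most N multiply-accumulates
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = list(range(1, 13))   # 12 input samples
    h = [1, 2, 3, 4]         # N = 4 taps
    print(fir_then_downsample(x, h, 3))
    print(output_threads(x, h, 3))
```

    Both functions return the same sequence; the second simply never performs the multiplications whose results the downsampler would throw away.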

  5. Effects of thread interruptions on tool pins in friction stir welding of AA6061

    DOE PAGES

    Reza-E-Rabby, Md.; Tang, Wei; Reynolds, Anthony P.

    2017-06-21

    In this paper, effects of pin thread and thread interruptions (flats) on weld quality and process response parameters during friction stir welding (FSW) of 6061 aluminium alloy were quantified. Otherwise identical smooth and threaded pins with zero to four flats were adopted for FSW. Weldability and process response variables were examined. Results showed that threads with flats significantly improved weld quality and reduced in-plane forces. A three-flat threaded pin led to production of defect-free welds under all examined welding conditions. Spectral analyses of in-plane forces and weld cross-sectional analysis were performed to establish correlation among pin flats, force dynamics and defect formation. Finally, the lowest in-plane force spectra amplitudes were consistently observed for defect-free welds.

  6. Effects of thread interruptions on tool pins in friction stir welding of AA6061

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reza-E-Rabby, Md.; Tang, Wei; Reynolds, Anthony P.

    In this paper, effects of pin thread and thread interruptions (flats) on weld quality and process response parameters during friction stir welding (FSW) of 6061 aluminium alloy were quantified. Otherwise identical smooth and threaded pins with zero to four flats were adopted for FSW. Weldability and process response variables were examined. Results showed that threads with flats significantly improved weld quality and reduced in-plane forces. A three-flat threaded pin led to production of defect-free welds under all examined welding conditions. Spectral analyses of in-plane forces and weld cross-sectional analysis were performed to establish correlation among pin flats, force dynamics and defect formation. Finally, the lowest in-plane force spectra amplitudes were consistently observed for defect-free welds.

  7. Memory Benchmarks for SMP-Based High Performance Parallel Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, A B; de Supinski, B; Mueller, F

    2001-11-20

    As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.

  8. Self-locking threaded fasteners

    DOEpatents

    Glovan, Ronald J.; Tierney, John C.; McLean, Leroy L.; Johnson, Lawrence L.

    1996-01-01

    A threaded fastener with a shape memory alloy (SMA) coating on its threads is disclosed. The fastener has special usefulness in high temperature applications where high reliability is important. The SMA coated fastener is threaded into or onto a mating threaded part at room temperature to produce a fastened object. The SMA coating is distorted during the assembly. At elevated temperatures the coating tries to recover its original shape and thereby exerts locking forces on the threads. When the fastened object is returned to room temperature the locking forces dissipate. Consequently the threaded fasteners can be readily disassembled at room temperature but remain securely fastened at high temperatures. A spray technique is disclosed as a particularly useful method of coating the threads of a fastener with a shape memory alloy.

  9. Application of Intel Many Integrated Core (MIC) architecture to the Yonsei University planetary boundary layer scheme in Weather Research and Forecasting model

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecasting (WRF) model provides operational services worldwide in many areas and is linked to our daily activity, in particular during severe weather events. The scheme of Yonsei University (YSU) is one of the planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column, determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provides atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation process of the YSU scheme, we employ Intel Many Integrated Core (MIC) Architecture as it is a multiprocessor computer structure with merits of efficient parallelization and vectorization essentials. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.

  10. Image-based 3D reconstruction and virtual environmental walk-through

    NASA Astrophysics Data System (ADS)

    Sun, Jifeng; Fang, Lixiong; Luo, Ying

    2001-09-01

    We present a 3D reconstruction method which combines geometry-based modeling, image-based modeling and rendering techniques. The first component is an interactive geometry modeling method which recovers the basic geometry of the photographed scene. The second component is a model-based stereo algorithm. We discuss the image processing problems and algorithms of walking through virtual space, then design and implement a high performance multi-thread wandering algorithm. The applications range from architectural planning and archaeological reconstruction to virtual environments and cinematic special effects.

  11. Real time display Fourier-domain OCT using multi-thread parallel computing with data vectorization

    NASA Astrophysics Data System (ADS)

    Eom, Tae Joong; Kim, Hoon Seop; Kim, Chul Min; Lee, Yeung Lak; Choi, Eun-Seo

    2011-03-01

    We demonstrate a real-time display of processed OCT images using multi-thread parallel computing with a quad-core CPU of a personal computer. The data of each A-line are treated as one vector to maximize the data translation rate between the cores of the CPU and RAM stored image data. A display rate of 29.9 frames/sec for processed OCT data (4096 FFT-size x 500 A-scans) is achieved in our system using a wavelength swept source with 52-kHz swept frequency. The data processing times of the OCT image and a Doppler OCT image with a 4-time average are 23.8 msec and 91.4 msec.

  12. Self-locking threaded fasteners

    DOEpatents

    Glovan, R.J.; Tierney, J.C.; McLean, L.L.; Johnson, L.L.

    1996-01-16

    A threaded fastener with a shape memory alloy (SMA) coating on its threads is disclosed. The fastener has special usefulness in high temperature applications where high reliability is important. The SMA coated fastener is threaded into or onto a mating threaded part at room temperature to produce a fastened object. The SMA coating is distorted during the assembly. At elevated temperatures the coating tries to recover its original shape and thereby exerts locking forces on the threads. When the fastened object is returned to room temperature the locking forces dissipate. Consequently the threaded fasteners can be readily disassembled at room temperature but remain securely fastened at high temperatures. A spray technique is disclosed as a particularly useful method of coating the threads of a fastener with a shape memory alloy. 13 figs.

  13. A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.

    PubMed

    Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong

    2015-12-01

    SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and reads input files inefficiently. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data starting from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of the algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
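
    The idea of an index-based parallel read, where each thread starts directly at its own position, can be sketched as follows. This is a minimal illustration and not BamPileup itself: `read_chunk`, `parallel_read` and the `(offset, length)` index layout are assumptions made for the example, and a real BAM index and its compressed blocks are considerably more involved.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, length):
    """Open a private file handle, seek to the indexed offset, read one chunk."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


def parallel_read(path, index, workers=4):
    """Read all chunks concurrently (hypothetical helper).

    `index` is a list of (offset, length) pairs; results keep index order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_chunk, path, off, ln) for off, ln in index]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Build a small throwaway file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"AAAABBBBCCCC")
    try:
        print(parallel_read(tmp.name, [(0, 4), (4, 4), (8, 4)]))
    finally:
        os.remove(tmp.name)
```

    Giving every worker its own handle avoids sharing a file position between threads, which is what lets each one seek independently.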

  14. FR/HR Sewing Thread

    DTIC Science & Technology

    2015-09-01

    position unless so designated by other authorized documents. Citation of trade names in this report does not constitute an official endorsement or...project to design and develop a Fire Resistant (FR) and Heat Resistant (HR) sewing thread. The main goal of the project is to produce sewing threads made...addresses the design, development and testing of various Fire Resistant (FR)/Heat Resistant (HR) sewing threads for US Army applications. Such a sewing

  15. Hydraulic conditions of flood flows in a Polish Carpathian river subjected to variable human impacts

    NASA Astrophysics Data System (ADS)

    Radecki-Pawlik, Artur; Czech, Wiktoria; Wyżga, Bartłomiej; Mikuś, Paweł; Zawiejska, Joanna; Ruiz-Villanueva, Virginia

    2016-04-01

    Channel morphology of the Czarny Dunajec River, Polish Carpathians, has been considerably modified as a result of channelization and gravel-mining induced channel incision, and now it varies from a single-thread, incised or regulated channel to an unmanaged, multi-thread channel. We investigated effects of these distinct channel morphologies on the conditions for flood flows in a study of 25 cross-sections from the middle river course where the Czarny Dunajec receives no significant tributaries and flood discharges increase little in the downstream direction. Cross-sectional morphology, channel slope and roughness of particular cross-section parts were used as input data for the hydraulic modelling performed with the 1D steady-flow HEC-RAS model for discharges with recurrence interval from 1.5 to 50 years. The model for each cross-section was calibrated with the water level of a 20-year flood from May 2014, determined shortly after the flood on the basis of high-water marks. Results indicated that incised and channelized river reaches are typified by similar flow widths and cross-sectional flow areas, which are substantially smaller than those in the multi-thread reach. However, because of steeper channel slope in the incised reach than in the channelized reach, the three river reaches differ in unit stream power and bed shear stress, which attain the highest values in the incised reach, intermediate values in the channelized reach, and the lowest ones in the multi-thread reach. These patterns of flow power and hydraulic forces are reflected in significant differences in river competence between the three river reaches. Since the introduction of the channelization scheme 30 years ago, sedimentation has reduced its initial flow conveyance by more than half and elevated water stages at given flood discharges by about 0.5-0.7 m. This partly reflects a progressive growth of natural levees along artificially stabilized channel banks. 
By contrast, sediments of natural levees deposited along the multi-thread channel and subsequently eroded in the course of lateral channel migration and floodplain reworking; as a result, they do not reduce the conveyance of floodplain flows in this reach. This study was performed within the scope of the Research Project DEC-2013/09/B/ST10/00056 financed by the National Science Centre of Poland.

  16. Interactions among forest age, valley and channel morphology, and log jams regulate animal production in mountain streams

    NASA Astrophysics Data System (ADS)

    Walters, D. M.; Venarsky, M. P.; Hall, R. O., Jr.; Herdrich, A.; Livers, B.; Winkelman, D.; Wohl, E.

    2014-12-01

    Forest age and local valley morphometry strongly influence the form and function of mountain streams in Colorado. Streams in valleys with old growth forest (>350 years) have extensive log jam complexes that create multi-thread channel reaches with extensive pool habitat and large depositional areas. Streams in younger unmanaged forests (e.g., 120 years old) and intensively managed forests have far fewer log jams and lower wood loads. These are single-thread streams dominated by riffles and with little depositional habitat. We hypothesized that log jam streams would retain more organic matter and have higher metabolism, leading to greater production of stream macroinvertebrates and trout. Log jam reaches should also have greater emergence of adult aquatic insects, and consequently have higher densities of riparian spiders taking advantage of these prey. Surficial organic matter was 3-fold higher in old-growth streams, and these streams had much higher ecosystem respiration. Insect production (g m-2 y-1) was similar among forest types, but fish density was four times higher in old-growth streams with copious log jams. However, at the valley scale, insect production (g m-1 valley-1) and trout density (number m-1 valley-1) were 2-fold and 10-fold higher, respectively, in old growth streams. This finding is because multi-thread reaches created by log jams have much greater stream area and stream length per meter of valley than single-thread channels. The more limited response of macroinvertebrates may be related to fish predation. Trout in old growth streams had similar growth rates and higher fat content than fish in other streams in spite of occurring at higher densities and higher elevation/colder temperatures. This suggests that the positive fish effect observed in old growth streams is related to greater availability of invertebrate prey, which is consistent with our original hypothesis. 
Preliminary analyses suggest that spider densities do not respond strongly to differences in stream morphology, but rather to changes in elevation and associated air temperatures. These results demonstrate strong indirect effects of forest age and valley morphometry on organic matter storage and animal secondary production in streams that is mediated by direct effects associated with the presence or absence of logjams.

  17. Model Checking with Multi-Threaded IC3 Portfolios

    DTIC Science & Technology

    2015-01-15

    different runs varies randomly depending on the thread interleaving. The use of a portfolio of solvers to maximize the likelihood of a quick solution is...empirically show (cf. Sec. 5.2) that the predictions based on this formula have high accuracy. Note that each solver in the portfolio potentially searches...speedup of over 300. We also show that widening the proof search of ic3 by randomizing its SAT solver is not as effective as parallelization

  18. River channel adjustments in Southern Italy over the past 150 years and implications for channel recovery

    NASA Astrophysics Data System (ADS)

    Scorpio, Vittoria; Aucelli, Pietro P. C.; Giano, Salvatore I.; Pisano, Luca; Robustelli, Gaetano; Rosskopf, Carmen M.; Schiattarella, Marcello

    2015-12-01

    Multi-temporal GIS analysis of topographic maps and aerial photographs along with topographic and geomorphological surveys are used to assess evolutionary trends and key control factors of channel adjustments for five major rivers in southern Italy (the Trigno, Biferno, Volturno, Sinni and Crati rivers) to support assessment of channel recovery and river restoration. Three distinct phases of channel adjustment are identified over the past 150 years primarily driven by human disturbances. Firstly, slight channel widening dominated from the last decades of the nineteenth century to the 1950s. Secondly, from the 1950s to the end of the 1990s, altered sediment fluxes induced by in-channel mining and channel works brought about moderate to very intense incision (up to 6-7 m) accompanied by strong channel narrowing (up to 96%) and changes in channel configuration from multi-threaded to single-threaded patterns. Thirdly, the period from around 2000 to 2015 has been characterized by channel stabilization and local widening. Evolutionary trajectories of the rivers studied are quite similar to those reconstructed for other Italian rivers, particularly regarding the second phase of channel adjustments and ongoing transitions towards channel recovery in some reaches. Analyses of river dynamics, recovery potential and connectivity with sediment sources of the study reaches, framed in their catchment context, can be used as part of a wider interdisciplinary approach that views effective river restoration alongside sustainable and risk-reduced river management.

  19. Development and Evaluation of Vectorised and Multi-Core Event Reconstruction Algorithms within the CMS Software Framework

    NASA Astrophysics Data System (ADS)

    Hauth, T.; Innocente, V.; Piparo, D.

    2012-12-01

    The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinarily large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the program sources or by automatically adapting the generated machine instructions to the available hardware, without the need to modify the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented that aim at scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.

  20. Study of Measurement Strategies of Geometric Deviation of the Position of the Threaded Holes

    NASA Astrophysics Data System (ADS)

    Drbul, Mário; Martikan, Pavol; Sajgalik, Michal; Czan, Andrej; Broncek, Jozef; Babik, Ondrej

    2017-12-01

    Verification of product and quality control is an integral part of the current production process. In terms of functional requirements and product interoperability, it is necessary to analyze their dimensional and also geometric specifications. Threaded holes, which are a substantial part of detachable screw connections and have a broad presence in engineering products, are verified elements too. This paper deals with the analysis of measurement strategies for verifying the geometric deviation of the position of threaded holes by the indirect method of measuring threaded pins, where applying different measurement strategies can affect the result of the verification of the product.

  1. On Designing Lightweight Threads for Substrate Software

    NASA Technical Reports Server (NTRS)

    Haines, Matthew

    1997-01-01

    Existing user-level thread packages employ a 'black box' design approach, where the implementation of the threads is hidden from the user. While this approach is often sufficient for application-level programmers, it hides critical design decisions that system-level programmers must be able to change in order to provide efficient service for high-level systems. By applying the principles of Open Implementation Analysis and Design, we construct a new user-level threads package that supports common thread abstractions and a well-defined meta-interface for altering the behavior of these abstractions. As a result, system-level programmers will have the advantages of using high-level thread abstractions without having to sacrifice performance, flexibility or portability.

  2. Understanding thread properties for red blood cell antigen assays: weak ABO blood typing.

    PubMed

    Nilghaz, Azadeh; Zhang, Liyuan; Li, Miaosi; Ballerini, David R; Shen, Wei

    2014-12-24

    "Thread-based microfluidics" research has so far focused on utilizing and manipulating the wicking properties of threads to form controllable microfluidic channels. In this study we aim to understand the separation properties of threads, which are important to their microfluidic detection applications for blood analysis. Confocal microscopy was utilized to investigate the effect of the microscale surface morphologies of fibers on the thread's separation efficiency of red blood cells. We demonstrated the remarkably different separation properties of threads made using silk and cotton fibers. Thread separation properties dominate the clarity of blood typing assays of the ABO groups and some of their weak subgroups (Ax and A3). The microfluidic thread-based analytical devices (μTADs) designed in this work were used to accurately type different blood samples, including 89 normal ABO and 6 weak A subgroups. By selecting thread with the right surface morphology, we were able to build μTADs capable of providing rapid and accurate typing of the weak blood groups with high clarity.

  3. Parallel satellite orbital situational problems solver for space missions design and control

    NASA Astrophysics Data System (ADS)

    Atanassov, Atanas Marinov

    2016-11-01

    Solving different scientific problems for space applications demands implementation of observations, measurements or realization of active experiments during time intervals in which specific geometric and physical conditions are fulfilled. The solving of situational problems for determination of these time intervals when the satellite instruments work optimally is a very important part of all activities on every stage of preparation and realization of space missions. The elaboration of universal, flexible and robust approach for situation analysis, which is easily portable toward new satellite missions, is significant for reduction of missions' preparation times and costs. Every situation problem could be based on one or more situation conditions. Simultaneously solving different kinds of situation problems based on different number and types of situational conditions, each one of them satisfied on different segments of satellite orbit requires irregular calculations. Three formal approaches are presented. First one is related to situation problems description that allows achieving flexibility in situation problem assembling and presentation in computer memory. The second formal approach is connected with developing of situation problem solver organized as processor that executes specific code for every particular situational condition. The third formal approach is related to solver parallelization utilizing threads and dynamic scheduling based on "pool of threads" abstraction and ensures a good load balance. The developed situation problems solver is intended for incorporation in the frames of multi-physics multi-satellite space mission's design and simulation tools.
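
    The "pool of threads" abstraction with dynamic scheduling mentioned above can be sketched as a shared work queue from which idle workers pull the next task. This is a generic illustration, not the solver described in the record: `run_pool` and the toy tasks are hypothetical, and in CPython the global interpreter lock means the sketch shows the scheduling pattern rather than a speedup for CPU-bound work.

```python
import queue
import threading


def run_pool(tasks, worker_count=4):
    """Run zero-argument callables on a fixed pool of threads.

    Tasks sit in one shared queue, so a worker that finishes a cheap task
    immediately picks up the next one; this balances irregular workloads.
    """
    pending = queue.Queue()
    for item in enumerate(tasks):
        pending.put(item)
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                idx, task = pending.get_nowait()
            except queue.Empty:
                return                  # no work left, the worker retires
            results[idx] = task()       # each slot is written exactly once

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Tasks of deliberately uneven cost stand in for irregular calculations.
    tasks = [lambda n=n: sum(range(n)) for n in (10, 1000, 5, 200)]
    print(run_pool(tasks, worker_count=2))
```

    A static split would assign a fixed set of tasks to each thread up front; pulling from a queue instead lets the assignment adapt to how long each task actually takes.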

  4. Final report on EURAMET.L-S21: `Supplementary comparison of parallel thread gauges'

    NASA Astrophysics Data System (ADS)

    Mudronja, Vedran; Šimunovic, Vedran; Acko, Bojan; Matus, Michael; Bánréti, Edit; István, Dicso; Thalmann, Rudolf; Lassila, Antti; Lillepea, Lauri; Bartolo Picotto, Gian; Bellotti, Roberto; Pometto, Marco; Ganioglu, Okhan; Meral, Ilker; Salgado, José Antonio; Georges, Vailleau

    2015-01-01

    The results of the comparison of parallel thread gauges between ten European countries are presented. Three thread plugs and three thread rings were calibrated in one loop. Croatian National Laboratory for Length (HMI/FSB-LPMD) acted as the coordinator and pilot laboratory of the comparison. Thread angle, thread pitch, simple pitch diameter and pitch diameter were measured. Pitch diameters were calibrated within 1a, 2a, 1b and 2b calibration categories in accordance with the EURAMET cg-10 calibration guide. A good agreement between the measurement results and differences due to different calibration categories are analysed in this paper. This comparison was the first EURAMET comparison of parallel thread gauges based on the EURAMET cg-10 calibration guide, and has made a step towards the harmonization of future comparisons with the registration of CMC values for thread gauges. Main text. To reach the main text of this paper, click on Final Report. Note that this text is that which appears in Appendix B of the BIPM key comparison database kcdb.bipm.org/. The final report has been peer-reviewed and approved for publication by the CCL, according to the provisions of the CIPM Mutual Recognition Arrangement (CIPM MRA).

  5. 49 CFR 179.300-13 - Venting, loading and unloading valves.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED) SPECIFICATIONS FOR TANK CARS Specifications for Multi-Unit Tank Car Tanks (Classes DOT-106A and 110AW) § 179.300-13 Venting... (h)(3)(ii). Threads for the clean-out/inspection ports of DOT Specification 110A multi-unit tank car...

  6. Bioglass incorporation improves mechanical properties and enhances cell-mediated mineralization on electrochemically aligned collagen threads.

    PubMed

    Nijsure, Madhura P; Pastakia, Meet; Spano, Joseph; Fenn, Michael B; Kishore, Vipuil

    2017-09-01

    Bone tissue engineering mandates the development of a functional scaffold that mimics the physicochemical properties of native bone. Bioglass 45S5 (BG) is a highly bioactive material known to augment bone formation and restoration. Hybrid scaffolds fabricated using collagen type I and BG resemble the organic and inorganic composition of the bone extracellular matrix and hence have been extensively investigated for bone tissue engineering applications. However, collagen-BG scaffolds developed thus far do not recapitulate the aligned structure of collagen found in native bone. In this study, an electrochemical fabrication method was employed to synthesize BG-incorporated electrochemically aligned collagen (BG-ELAC) threads that are compositionally similar to native bone. Further, aligned collagen fibrils within BG-ELAC threads mimic the anisotropic arrangement of collagen fibrils in native bone. The effect of BG incorporation on the mechanical properties and cell-mediated mineralization on ELAC threads was investigated. The results indicated that BG can be successfully incorporated within ELAC threads, without disturbing collagen fibril alignment. Further, BG incorporation significantly increased the ultimate tensile stress (UTS) and modulus of ELAC threads (p < 0.05). SBF conditioning showed extensive mineralization on BG-ELAC threads that increased over time demonstrating the bone bioactivity of BG-ELAC threads. Additionally, BG incorporation into ELAC threads resulted in increased cell proliferation (p < 0.05) and deposition of a highly dense and continuous mineralized matrix. In conclusion, incorporation of BG into ELAC threads is a viable strategy for the development of an osteoconductive material for bone tissue engineering applications. © 2017 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 105A: 2429-2440, 2017. © 2017 Wiley Periodicals, Inc.

  7. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  8. Distributed run of a one-dimensional model in a regional application using SOAP-based web services

    NASA Astrophysics Data System (ADS)

    Smiatek, Gerhard

    This article describes the setup of a distributed computing system in Perl. It facilitates the parallel run of a one-dimensional environmental model on a number of simple network PC hosts. The system uses Simple Object Access Protocol (SOAP) driven web services offering the model run on remote hosts and a multi-thread environment distributing the work and accessing the web services. Its application is demonstrated in a regional run of a process-oriented biogenic emission model for the area of Germany. Within a network consisting of up to seven web services implemented on Linux and MS-Windows hosts, a performance increase of approximately 400% has been reached compared to a model run on the fastest single host.

  9. Web-based access to near real-time and archived high-density time-series data: cyber infrastructure challenges & developments in the open-source Waveform Server

    NASA Astrophysics Data System (ADS)

    Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.

    2010-12-01

    The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and client-side interface have been extensively rewritten. The Python Twisted server-side code-base has been fundamentally modified to now present waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single database model. This allows interactive web-based access to high-density (broadband @ 40Hz to strong motion @ 200Hz) waveform data that can span multiple years; the common lifetime of broadband seismic networks. The client-side interface expands on its use of simple JSON-based AJAX queries to now incorporate a variety of User Interface (UI) improvements including standardized calendars for defining time ranges, applying on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the various cyber infrastructure challenges we have faced while developing this application, the use-cases currently in existence, and the limitations of web-based application development.

  10. Development of Thread-compatible Open Source Stack

    NASA Astrophysics Data System (ADS)

    Zimmermann, Lukas; Mars, Nidhal; Schappacher, Manuel; Sikora, Axel

    2017-07-01

    The Thread protocol is a recent development based on 6LoWPAN (IPv6 over IEEE 802.15.4), but with extensions regarding a more media independent approach, which - additionally - also promises true interoperability. To evaluate and analyse the operation of a Thread network a given open source 6LoWPAN stack for embedded devices (emb::6) has been extended in order to comply with the Thread specification. The implementation covers Mesh Link Establishment (MLE) and network layer functionality as well as 6LoWPAN mesh under routing mechanism based on MAC short addresses. The development has been verified on a virtualization platform and allows dynamical establishment of network topologies based on Thread’s partitioning algorithm.

  11. Implementation and Optimization of miniGMG - a Compact Geometric Multigrid Benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel; Kalamkar, Dhiraj; Singh, Amik

    2012-12-01

    Multigrid methods are widely used to accelerate the convergence of iterative solvers for linear systems used in a number of different application areas. In this report, we describe miniGMG, our compact geometric multigrid benchmark designed to proxy the multigrid solves found in AMR applications. We explore optimization techniques for geometric multigrid on existing and emerging multicore systems including the Opteron-based Cray XE6, Intel Sandy Bridge and Nehalem-based Infiniband clusters, as well as manycore-based architectures including NVIDIA's Fermi and Kepler GPUs and Intel's Knights Corner (KNC) co-processor. This report examines a variety of novel techniques including communication-aggregation, threaded wavefront-based DRAM communication-avoiding, dynamic threading decisions, SIMDization, and fusion of operators. We quantify performance through each phase of the V-cycle for both single-node and distributed-memory experiments and provide detailed analysis for each class of optimization. Results show our optimizations yield significant speedups across a variety of subdomain sizes while simultaneously demonstrating the potential of multi- and manycore processors to dramatically accelerate single-node performance. However, our analysis also indicates that improvements in networks and communication will be essential to reap the potential of manycore processors in large-scale multigrid calculations.

  12. On-line monitoring of multi-component strain development in a tufting needle using optical fibre Bragg grating sensors

    NASA Astrophysics Data System (ADS)

    Chehura, Edmon; Dell'Anno, Giuseppe; Huet, Tristan; Staines, Stephen; James, Stephen W.; Partridge, Ivana K.; Tatam, Ralph P.

    2014-07-01

    Dynamic loadings induced on a tufting needle during the tufting of dry carbon fibre preform via a commercial robot-controlled tufting head were investigated in situ and in real-time using optical fibre Bragg grating (FBG) sensors bonded to the needle shaft. The sensors were configured such that the axial strain and bending moments experienced by the needle could be measured. A study of the influence of thread and thread type on the strain imparted to the needle revealed axial strain profiles which had equivalent trends but different magnitudes. The mean of the maximum axial compression strains measured during the tufting of a 4-ply quasi-isotropic carbon fibre dry preform were - 499 ± 79 μɛ, - 463 ± 51 μɛ and - 431 ± 59 μɛ for a needle without thread, with metal wire and with Kevlar® thread, respectively. The needle similarly exhibited bending moments of different magnitude when the different needle feeding configurations were used.

  13. When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences.

    PubMed

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to be obtained. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed while the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving the PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.

  14. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background: Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to be obtained. Results: A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed while the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving the PSP problem. Conclusions: This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708

  15. Message passing with queues and channels

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dozsa, Gabor J; Heidelberger, Philip; Kumar, Sameer

    In an embodiment, a reception thread receives a source node identifier, a type, and a data pointer from an application and, in response, creates a receive request. If the source node identifier specifies a source node, the reception thread adds the receive request to a fast-post queue. If a message received from a network does not match a receive request on a posted queue, a polling thread adds a receive request that represents the message to an unexpected queue. If the fast-post queue contains the receive request, the polling thread removes the receive request from the fast-post queue. If the receive request that was removed from the fast-post queue does not match the receive request on the unexpected queue, the polling thread adds the receive request that was removed from the fast-post queue to the posted queue. The reception thread and the polling thread execute asynchronously from each other.
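    The three-queue matching described in this abstract can be modelled sequentially in Python. This is a rough sketch, not the patented implementation: the class and method names (`Endpoint`, `post_receive`, `on_message`, `drain_fast_post`) are invented for illustration, a source of `None` is assumed to mean "any source", and the cases the abstract leaves open (what happens on a successful match, where wildcard receives go) are filled in with labelled assumptions. The real design runs the reception and polling sides as asynchronous threads; here they are plain method calls.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    source: object      # node id, or None for "any source" (assumption)
    type: int
    data: object = None


def matches(req, msg):
    """A posted request matches a message on type and (non-wildcard) source."""
    return req.type == msg.type and (req.source is None or req.source == msg.source)


def take_first(queue, pred):
    """Remove and return the first item satisfying pred, or None."""
    for i, item in enumerate(queue):
        if pred(item):
            del queue[i]
            return item
    return None


class Endpoint:
    def __init__(self):
        self.fast_post = deque()
        self.posted = deque()
        self.unexpected = deque()
        self.delivered = []          # payloads handed to the application

    def post_receive(self, source, type_, buffer=None):
        """Reception side: create a receive request from the application."""
        req = Request(source, type_, buffer)
        if source is not None:
            self.fast_post.append(req)   # a specific source node was named
        else:
            self.posted.append(req)      # assumption: not stated in the abstract
        return req

    def on_message(self, msg):
        """Polling side: a message arrived from the network."""
        req = take_first(self.posted, lambda r: matches(r, msg))
        if req is None:
            self.unexpected.append(msg)  # no posted match: record as unexpected
        else:
            self.delivered.append(msg.data)  # assumption: a match delivers

    def drain_fast_post(self):
        """Polling side: move fast-posted requests to where they belong."""
        while self.fast_post:
            req = self.fast_post.popleft()
            msg = take_first(self.unexpected, lambda m: matches(req, m))
            if msg is None:
                self.posted.append(req)  # no unexpected match: post it
            else:
                self.delivered.append(msg.data)  # assumption: a match delivers


ep = Endpoint()
ep.on_message(Request(1, 7, "early"))   # arrives before any receive is posted
ep.post_receive(1, 7)
ep.post_receive(2, 7)
ep.drain_fast_post()
ep.on_message(Request(2, 7, "late"))
```

    Keeping a separate fast-post queue lets the reception side append without touching the posted and unexpected queues, which only the polling side modifies in this model.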

  16. Consensus oriented fuzzified decision support for oil spill contingency management.

    PubMed

    Liu, Xin; Wirtz, Kai W

    2006-06-30

    Studies on multi-group multi-criteria decision-making problems for oil spill contingency management are in their infancy. This paper presents a second-order fuzzy comprehensive evaluation (FCE) model to resolve decision-making problems in the area of contingency management after environmental disasters such as oil spills. To assess the performance of different oil combat strategies, second-order FCE allows for the utilization of lexical information, the consideration of ecological and socio-economic criteria and the involvement of a variety of stakeholders. On the other hand, the new approach can be validated by using internal and external checks, which refer to sensitivity tests regarding its internal setups and comparisons with other methods, respectively. Through a case study, the Pallas oil spill in the German Bight in 1998, it is demonstrated that this approach can help decision makers who search for an optimal strategy in multi-thread contingency problems and has a wider application potential in the field of integrated coastal zone management.

  17. Jali - Unstructured Mesh Infrastructure for Multi-Physics Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garimella, Rao V; Berndt, Markus; Coon, Ethan

    2017-04-13

    Jali is a parallel unstructured mesh infrastructure library designed for use by multi-physics simulations. It supports 2D and 3D arbitrary polyhedral meshes distributed over hundreds to thousands of nodes. Jali can read write Exodus II meshes along with fields and sets on the mesh and support for other formats is partially implemented or is (https://github.com/MeshToolkit/MSTK), an open source general purpose unstructured mesh infrastructure library from Los Alamos National Laboratory. While it has been made to work with other mesh frameworks such as MOAB and STKmesh in the past, support for maintaining the interface to these frameworks has been suspended for now. Jali supports distributed as well as on-node parallelism. Support of on-node parallelism is through direct use of the mesh in multi-threaded constructs or through the use of "tiles" which are submeshes or sub-partitions of a partition destined for a compute node.

  18. Simulation of DKIST solar adaptive optics system

    NASA Astrophysics Data System (ADS)

    Marino, Jose; Carlisle, Elizabeth; Schmidt, Dirk

    2016-07-01

    Solar adaptive optics (AO) simulations are a valuable tool to guide the design and optimization process of current and future solar AO and multi-conjugate AO (MCAO) systems. Solar AO and MCAO systems rely on extended object cross-correlating Shack-Hartmann wavefront sensors to measure the wavefront. Accurate solar AO simulations require computationally intensive operations, which have until recently presented a prohibitive computational cost. We present an update on the status of a solar AO and MCAO simulation tool being developed at the National Solar Observatory. The simulation tool is a multi-threaded application written in the C++ language that takes advantage of current large multi-core CPU computer systems and fast ethernet connections to provide accurate full simulation of solar AO and MCAO systems. It interfaces with KAOS, a state of the art solar AO control software developed by the Kiepenheuer-Institut fuer Sonnenphysik, that provides reliable AO control. We report on the latest results produced by the solar AO simulation tool.

  19. Situation exploration in a persistent surveillance system with multidimensional data

    NASA Astrophysics Data System (ADS)

    Habibi, Mohammad S.

    2013-03-01

    There is an emerging need for fusing hard and soft sensor data in an efficient surveillance system to provide accurate estimation of situation awareness. These mostly abstract, multi-dimensional and multi-sensor data pose a great challenge to the user in performing analysis of multi-threaded events efficiently and cohesively. To address this concern an interactive Visual Analytics (VA) application is developed for rapid assessment and evaluation of different hypotheses based on context-sensitive ontology spawn from taxonomies describing human/human and human/vehicle/object interactions. A methodology is described here for generating relevant ontology in a Persistent Surveillance System (PSS) and demonstrates how they can be utilized in the context of PSS to track and identify group activities pertaining to potential threats. The proposed VA system allows for visual analysis of raw data as well as metadata that have spatiotemporal representation and content-based implications. Additionally in this paper, a technique for rapid search of tagged information contingent to ranking and confidence is explained for analysis of multi-dimensional data. Lastly the issue of uncertainty associated with processing and interpretation of heterogeneous data is also addressed.

  20. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut

    File layout of array data is a critical factor that affects the behavior of storage caches, and has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.

  1. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE PAGES

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut; ...

    2013-01-01

    File layout of array data is a critical factor that affects the behavior of storage caches, and has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.

  2. Executing application function calls in response to an interrupt

    DOEpatents

    Almasi, Gheorghe; Archer, Charles J.; Giampapa, Mark E.; Gooding, Thomas M.; Heidelberger, Philip; Parker, Jeffrey J.

    2010-05-11

    Executing application function calls in response to an interrupt including creating a thread; receiving an interrupt having an interrupt type; determining whether a value of a semaphore represents that interrupts are disabled; if the value of the semaphore represents that interrupts are not disabled: calling, by the thread, one or more preconfigured functions in dependence upon the interrupt type of the interrupt; yielding the thread; and if the value of the semaphore represents that interrupts are disabled: setting the value of the semaphore to represent to a kernel that interrupts are hard-disabled; and hard-disabling interrupts at the kernel.

  3. End-threaded intramedullary positive profile screw ended self-tapping pin (Admit pin) - A cost-effective novel implant for fixing canine long bone fractures.

    PubMed

    Chanana, Mitin; Kumar, Adarsh; Tyagi, Som Prakash; Singla, Amit Kumar; Sharma, Arvind; Farooq, Uiase Bin

    2018-02-01

    The current study was undertaken to evaluate the clinical efficacy of end-threaded intramedullary pinning for management of various long bone fractures in canines. This study was conducted in two phases, managing 25 client-owned dogs presented with different fractures. The technique of application of end-threaded intramedullary pinning in long bone fractures was initially standardized in 6 clinical patients presented with long bone fractures. In this phase, end-threaded pins of different profiles, i.e., positive and negative, were used as the internal fixation technique. On the basis of results obtained from the standardization phase, 19 client-owned dogs clinically presented with different fractures were implanted with end-threaded intramedullary positive profile screw ended self-tapping pin in the clinical application phase. The patients, allocated randomly in two groups, when evaluated postoperatively revealed slight pin migration in Group-I (negative profile), which resulted in disruption of callus site causing delayed union in one case and large callus formation in other two cases whereas no pin migration was observed in Group-II (positive profile). Other observations in Group-I were reduced muscle girth and delayed healing time as compared to Group-II. In the clinical application phase, 21st and 42nd day post-operative radiographic follow-up revealed no pin migration in any of the cases, and there was no bone shortening or fragment collapse with the end-threaded intramedullary positive profile screw ended self-tapping pin. The end-threaded intramedullary positive profile screw ended self-tapping pin used for fixation of long bone fractures in canines can resist pin migration, pin breakage, and all loads acting on the bone, i.e., compression, tension, bending, rotation, and shearing to an extent with no post-operative complications.

  4. Evolution of the ATLAS Software Framework towards Concurrency

    NASA Astrophysics Data System (ADS)

    Jones, R. W. L.; Stewart, G. A.; Leggett, C.; Wynne, B. M.

    2015-05-01

    The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. Maximising performance per watt will be a key metric, so all of these cores must be used as efficiently as possible. In order to address the deficiencies of the current framework, ATLAS has embarked upon two projects: first, a practical demonstration of the use of multi-threading in our reconstruction software, using the GaudiHive framework; second, an exercise to gather requirements for an updated framework, going back to the first principles of how event processing occurs. In this paper we report on both these aspects of our work. For the hive based demonstrators, we discuss what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. These lessons were fed into our considerations of a new framework and we present preliminary conclusions on this work. In particular we identify areas where the framework can be simplified in order to aid the implementation of a concurrent event processing scheme. Finally, we discuss the practical difficulties involved in migrating a large established code base to a multi-threaded framework and how this can be achieved for LHC Run 3.

  5. PRIMA-X Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lorenz, Daniel; Wolf, Felix

    2016-02-17

    The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor of the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform for both integrating leading parallel performance systems (notably TAU and Scalasca) and developing next-generation scalable performance tools. The PRIMA-X project shifts the focus away from refactorization of robust performance tools towards a re-targeting of the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective prerequisites (performance, power, resilience) that identify exascale systems introduce fundamental constraints on the ability to carry forward existing performance methodologies. In particular, there must be a deemphasis of per-thread observation techniques to significantly reduce the otherwise unsustainable flood of redundant performance data. Instead, it will be necessary to assimilate multi-level resource observations into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and runtime analyses of dynamic performance information. Also, the interest in optimizing parallelism parameters with respect to performance and energy drives the integration of tool capabilities in the exascale environment further. Initially, PRIMA-X was a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universität Darmstadt (TU Darmstadt) starting February 1st, 2015, the project ended at GRS on January 31st, 2015. This report reflects the work accomplished at GRS until then. The work of GRS is expected to be continued at TU Darmstadt. The first main accomplishment of GRS is the design of different thread-level aggregation techniques. We created a prototype capable of aggregating the thread-level information in performance profiles using these techniques. The next step will be the integration of the most promising techniques into the Score-P measurement system and their evaluation. The second main accomplishment is a substantial increase of Score-P’s scalability, achieved by improving the design of the system-tree representation in Score-P’s profile format. We developed a new representation and a distributed algorithm to create the scalable system tree representation. Finally, we developed a lightweight approach to MPI wait-state profiling. Former algorithms either needed piggy-backing, which can cause significant runtime overhead, or tracing, which comes with its own set of scaling challenges. Our approach works with local data only and, thus, is scalable and has very little overhead.

  6. Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

    NASA Astrophysics Data System (ADS)

    Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

    2014-12-01

    The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.

  7. On the Performance of an Algebraic Multigrid Solver on Multicore Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, A H; Schulz, M; Yang, U M

    2010-04-29

    Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.

  8. How Does the Current Generation of Medical Students View the Radiology Match?: An Analysis of the AuntMinnie and Student Doctor Network Online Forums.

    PubMed

    Yi, Paul H; Novin, Sherwin; Vander Plas, Taylor L; Huh, Eric; Magid, Donna

    2018-06-01

    The AuntMinnie (AM) and the Student Doctor Network (SDN) online forums are popular resources for medical students applying for residency. The purpose of this study was to describe medical student radiology-related posts on AM and SDN to better understand the medical student perspective on the application and Match process. We reviewed all posts made on the AM and SDN online forums over 5 consecutive academic years from July 2012 to July 2017. Each thread was organized into one of six major categories. We quantified forum utilization over the past 5 years by the total number of and the most frequently posted and viewed thread topics. We reviewed 2683 total threads with 5,723,909 views. Total number of threads posted and viewed fell by 46% and 63%, respectively, from 2013-2014 to 2014-2015, after which they returned near baseline by 2016-2017, along with an increase in interventional radiology-related posts between 2012-2013 (13%) and 2016-2017 (32%) (P < .001). The most common application-related topics were preapplication and program ranking advice (20% of all threads and views). Many posts were related to postinterview communication with residency programs (2% of all threads and views). After a drop in 2013-2014, utilization of AM and SDN increased in 2016-2017, along with increased interest in interventional radiology. Addressing the student concerns identified in our study, especially in preparing residency applications, ranking programs, and navigating difficult situations, such as postinterview program communication, may improve the radiology application process for future medical students and their advisors. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  9. Fatigue and Fracture Branch: A compendium of recently completed and on-going research projects

    NASA Technical Reports Server (NTRS)

    Elber, W.

    1984-01-01

    This compendium of recently completed and ongoing research projects from the Fatigue and Fracture Branch at NASA Langley Research Center provides technical descriptions and key results of all such projects expected to lead to publication of significant findings. The common thread to all these studies is the application of fracture mechanics analyses to engineering problems in metals and composites, with particular emphasis on airframe structural materials. References to recent publications are included where appropriate.

  10. Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    PubMed Central

    2011-01-01

    Background: Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram-based mutual information estimators lack the precision of kernel-based methods. The recently introduced B-spline function based mutual information estimation method is competitive with the kernel-based methods in terms of quality but at a lower computational complexity. Results: We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions: CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264
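
    For orientation, the simple histogram-based estimator that the abstract above treats as the baseline can be sketched in a few lines. This is not the B-spline method and not CUDA-MI; the function name `histogram_mutual_information`, the equal-width binning, and the default bin count are assumptions made for illustration.

```python
import math
from collections import Counter


def histogram_mutual_information(x, y, bins=4):
    """Estimate I(X;Y) in bits using equal-width histogram bins."""
    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant data falls into bin 0
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(x), bin_index(y)
    n = len(x)
    joint = Counter(zip(bx, by))      # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j) written with raw counts
        mi += p_ij * math.log2(p_ij * n * n / (px[i] * py[j]))
    return mi


same = histogram_mutual_information([0, 1, 2, 3] * 5, [0, 1, 2, 3] * 5)
independent = histogram_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2)
```

    Each sample falls into exactly one bin here, which is the source of the coarseness the abstract mentions; B-spline estimators instead spread each sample over several neighbouring bins.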

  11. Present Situation of the Anti-Fatigue Processing of High-Strength Steel Internal Thread Based on Cold Extrusion Technology: A Review

    NASA Astrophysics Data System (ADS)

    Miao, Hong; Jiang, Cheng; Liu, Sixing; Zhang, Shanwen; Zhang, Yanjun

    2017-03-01

    As internal thread processing moves towards high performance, low cost and low energy consumption, cold-extrusion net forming of internal threads has become an important component of anti-fatigue processing. It has broad application prospects in aviation, spaceflight, high-speed trains and other fields. Internal thread processing and anti-fatigue manufacturing technologies are summarized, and the advantages and disadvantages of the processing methods are compared in terms of processing quality and fatigue service life. Internal thread cold-extrusion processing technology is investigated with the aim of improving the fatigue service life of internal threads. The advantages of the plastic deformation law and the surface integrity of the metal layer during cold extrusion for improving stability and economy are summed up. The review also forecasts development trends in internal thread anti-fatigue manufacturing technology.

  12. Memory and Energy Optimization Strategies for Multithreaded Operating System on the Resource-Constrained Wireless Sensor Node

    PubMed Central

    Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Xu, Jun; Yang, Jianfeng; Zhou, Haiying; Shi, Hongling; Zhou, Peng

    2015-01-01

    Memory and energy optimization strategies are essential for resource-constrained wireless sensor network (WSN) nodes. In this article, a new memory-optimized and energy-optimized multithreaded WSN operating system (OS), LiveOS, is designed and implemented. The memory cost of LiveOS is optimized by using the stack-shifting hybrid scheduling approach. Unlike a traditional multithreaded OS, in which thread stacks are allocated statically by pre-reservation, thread stacks in LiveOS are allocated dynamically by using the stack-shifting technique. As a result, memory waste problems caused by static pre-reservation can be avoided. In addition to the stack-shifting dynamic allocation approach, a hybrid scheduling mechanism which can decrease both the thread scheduling overhead and the thread stack number is also implemented in LiveOS. With these mechanisms, the stack memory cost of LiveOS can be reduced by more than 50% compared to that of a traditional multithreaded OS. Not only is memory cost optimized, but energy cost is also optimized in LiveOS; this is achieved by using the multi-core “context aware” and multi-core “power-off/wakeup” energy conservation approaches. By using these approaches, the energy cost of LiveOS can be reduced by more than 30% compared to a single-core WSN system. Memory and energy optimization strategies in LiveOS not only prolong the lifetime of WSN nodes, but also make it feasible to run a multithreaded OS on memory-constrained WSN nodes. PMID:25545264

  13. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  14. Thread mapping using system-level model for shared memory multicores

    NASA Astrophysics Data System (ADS)

    Mitra, Reshmi

    Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread counts. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications. This is to ensure that users can explore the design space within reasonable machine hours, without a thorough understanding of how the code interacts with the platform. Response time is related to the cycles per instruction retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on a Markov Chain Model (MCM) and a Model Tree (MT), for system-level steady state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. The thread status is represented by the MCM states. The program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the smaller AS performance. This aspect of the framework, along with the use of mathematical expressions for the actual AS performance information, results in a tremendous reduction in the CPI prediction time. The framework is validated using an electromagnetics application. The average performance prediction error for steady state CPI results with 12 different MSs is less than 1%. The total run time of the model is on the order of minutes, whereas the actual application execution time is on the order of days.
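
    The role of the Markov chain in the abstract above can be made concrete with a deliberately reduced sketch. It assumes only two states (active and suspended) and assumes that instructions retire at a fixed base CPI while active and not at all while suspended; neither assumption comes from the thesis, and the function names and parameters are hypothetical.

```python
def steady_state_active_fraction(p_suspend, p_wake):
    """Stationary probability of the 'active' state of a two-state Markov chain.

    p_suspend: probability of moving active -> suspended in one step.
    p_wake:    probability of moving suspended -> active in one step.
    """
    return p_wake / (p_suspend + p_wake)


def effective_cpi(base_cpi, p_suspend, p_wake):
    """Illustrative assumption: instructions retire only in the active state."""
    return base_cpi / steady_state_active_fraction(p_suspend, p_wake)


active = steady_state_active_fraction(0.1, 0.3)  # about 0.75
cpi = effective_cpi(1.5, 0.1, 0.3)               # about 2.0
```

    The transition probabilities stand in for the program characteristics; in the described framework they are extrapolated from runs at smaller application sizes rather than supplied by hand.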

  15. From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    DOE PAGES

    Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...

    2013-01-01

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.

  16. Application Of Laser Induced Breakdown Spectroscopy (LIBS) Technique In Investigation Of Historical Metal Threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdel-Kareem, O.; Khedr, A.; Abdelhamid, M.

    Analysis of the composition of an object is a necessary step in documenting its properties and estimating its condition. It is also an important task for establishing an appropriate conservation treatment of an object or for following up the results of the suggested treatments. There has been an important evolution in the methods used for analysis of metal threads since the second half of the twentieth century. Today, the main considerations in selecting a method are the diagnostic power, representative sampling, reproducibility, destructive nature/invasiveness of the analysis and accessibility of the appropriate instrument. This study aims at evaluating the usefulness of the Laser Induced Breakdown Spectroscopy (LIBS) technique for analysis of historical metal threads. In this study various historical metal threads collected from different museums were investigated using the LIBS technique. To evaluate the usefulness of the suggested analytical protocol, the same metal thread samples were investigated with a Scanning Electron Microscope (SEM) with energy-dispersive x-ray analyzer (EDX), which is reported in the conservation field as the best method to determine the chemical composition and corrosion of metal threads. The results show that all metal threads investigated in the present study are very dirty, strongly damaged and corroded with different types of corrosion products. The LIBS technique is considered a very useful technique that can be used safely for investigating historical metal threads. It is, in fact, a very useful tool as a noninvasive method for analysis of historical metal threads. The first few laser shots are very useful for the investigation of the corrosion and dirt layer, while the following shots are very useful and effective for investigating the coating layer. A higher number of laser shots is very useful for determining the main composition of the metal thread. Further research is needed to investigate and determine the most appropriate and effective approaches and methods for conservation of these metal threads.

  17. Bending at the base of a dragged-out viscous thread

    NASA Astrophysics Data System (ADS)

    Blount, Maurice; Lister, John

    2007-11-01

    We consider steady flow of a slender viscous thread falling from a nozzle onto a moving horizontal belt. We analyse the asymptotic limit of a very slender thread, and show that it has a boundary-layer structure in which bending stresses only become important near the belt, where they support a vertical stress and allow the velocity and rolling conditions to be satisfied. The outer solution is analogous to a viscous catenary, with velocity fixed at the belt and at the nozzle. There are three asymptotic regimes, with distinct structures, corresponding to the cases that the belt speed is larger than, smaller than, or close to the velocity of a freely falling thread. The implications for the onset and amplitude of meanders in the `fluid-mechanical sewing machine' are explored.

  18. Development and study of a parallel algorithm of iteratively forming latent functionally-determined structures for classification and analysis of meteorological data

    NASA Astrophysics Data System (ADS)

    Sorokin, V. A.; Volkov, Yu V.; Sherstneva, A. I.; Botygin, I. A.

    2016-11-01

    This paper overviews a method of generating climate regions based on an analytic signal theory. When applied to atmospheric surface layer temperature data sets, the method allows forming climatic structures with the corresponding changes in the temperature to make conclusions on the uniformity of climate in an area and to trace the climate changes in time by analyzing the type group shifts. The algorithm is based on the fact that the frequency spectrum of the thermal oscillation process is narrow-banded and has only one mode for most weather stations. This allows using the analytic signal theory, causality conditions and introducing an oscillation phase. The annual component of the phase, being a linear function, was removed by the least squares method. The remaining phase fluctuations allow consistent studying of their coordinated behavior and timing, using the Pearson correlation coefficient for dependence evaluation. This study includes program experiments to evaluate the calculation efficiency in the phase grouping task. The paper also overviews some single-threaded and multi-threaded computing models. It is shown that the phase grouping algorithm for meteorological data can be parallelized and that a multi-threaded implementation leads to a 25-30% increase in the performance.
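
    The two numerical steps named in the abstract above, removing a linear phase component by least squares and comparing the remaining fluctuations with the Pearson correlation coefficient, can be sketched as follows. The function names and the toy phase series are assumptions for illustration, not the authors' implementation.

```python
import math


def remove_linear_trend(values):
    """Subtract the least-squares line fitted against the sample index."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    a_mean, b_mean = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - a_mean) ** 2 for x in a)
                     * sum((y - b_mean) ** 2 for y in b))
    return cov / norm


# Two hypothetical station phases: different linear trends, same fluctuation shape.
fluctuation = [1, -1, 1, -1, 1, -1]
phase_a = [2 * t + f for t, f in enumerate(fluctuation)]
phase_b = [5 * t + 3 * f for t, f in enumerate(fluctuation)]
r = pearson(remove_linear_trend(phase_a), remove_linear_trend(phase_b))
```

    Because every pair of stations can be correlated independently, the pairwise loop is the natural place to distribute work across threads, which is consistent with the parallelization the abstract reports.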

  19. Real-time inextensible surgical thread simulation.

    PubMed

    Xu, Lang; Liu, Qian

    2018-03-27

    This paper discusses a real-time simulation method of inextensible surgical thread based on the Cosserat rod theory using position-based dynamics (PBD). The method realizes stable twining and knotting of surgical thread while including inextensibility, bending, twisting and coupling effects. The Cosserat rod theory is used to model the nonlinear elastic behavior of surgical thread. The surgical thread model is solved with PBD to achieve a real-time, extremely stable simulation. Due to the one-dimensional linear structure of surgical thread, the direct solution of the distance constraint based on tridiagonal matrix algorithm is used to enhance stretching resistance in every constraint projection iteration. In addition, continuous collision detection and collision response guarantee a large time step and high performance. Furthermore, friction is integrated into the constraint projection process to stabilize the twining of multiple threads and complex contact situations. Through comparisons with existing methods, the surgical thread maintains constant length under large deformation after applying the direct distance constraint in our method. The twining and knotting of multiple threads correspond to stable solutions to contact and friction forces. A surgical suture scene is also modeled to demonstrate the practicality and simplicity of our method. Our method achieves stable and fast simulation of inextensible surgical thread. Benefiting from the unified particle framework, the rigid body, elastic rod, and soft body can be simultaneously simulated. The method is appropriate for applications in virtual surgery that require multiple dynamic bodies.
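
    The tridiagonal matrix algorithm mentioned in the abstract above is the standard Thomas algorithm, which solves a tridiagonal linear system in time proportional to the number of unknowns. The sketch below shows only that generic solver; how the authors assemble the distance-constraint system for the thread is not given in the abstract, and the function name and argument layout are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    All four lists have the same length n.
    lower[i] multiplies x[i-1] in row i (lower[0] is ignored),
    upper[i] multiplies x[i+1] in row i (upper[-1] is ignored).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# The system [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] x = [1, 0, 1] has solution x = [1, 1, 1].
solution = solve_tridiagonal([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                             [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

    A chain of particles linked by distance constraints couples each constraint only to its two neighbours, which is why a system of this shape arises for a one-dimensional structure such as a thread.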

  20. Tear glucose detection combining microfluidic thread based device, amperometric biosensor and microflow injection analysis.

    PubMed

    Agustini, Deonir; Bergamini, Márcio F; Marcolino-Junior, Luiz Humberto

    2017-12-15

    The tear glucose analysis is an important alternative for the indirect, simple and less invasive monitoring of blood glucose levels. However, the high cost and complex manufacturing process of tear glucose analyzers combined with the need to exchange the sensor after each analysis in the disposable tests prevent widespread application of the tear in glucose monitoring. Here, we present the integration of a biosensor made by the electropolymerization of poly(toluidine blue O) (PTB) and glucose oxidase (GOx) with an electroanalytical microfluidic device of easy assembly based on cotton threads, low cost materials and measurements by microflow injection analysis (µFIA) through passive pumping for performing tear glucose analyses in a simple, rapid and inexpensive way. A high stability between the analyses (RSD = 2.54%) and among the different systems (RSD = 3.13%) was obtained for the determination of glucose, in addition to a wide linear range between 0.075 and 7.5 mmol L⁻¹ and a limit of detection of 22.2 µmol L⁻¹. The proposed method was efficiently employed in the determination of tear glucose in non-diabetic volunteers, obtaining a close correlation with their blood glucose levels, simplifying and reducing the costs of the analyses, making the tear glucose monitoring more accessible for the population. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Experiments and Analyses of Data Transfers Over Wide-Area Dedicated Connections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata

    Dedicated wide-area network connections are increasingly employed in high-performance computing and big data scenarios. One might expect the performance and dynamics of data transfers over such connections to be easy to analyze due to the lack of competing traffic. However, non-linear transport dynamics and end-system complexities (e.g., multi-core hosts and distributed filesystems) can in fact make analysis surprisingly challenging. We present extensive measurements of memory-to-memory and disk-to-disk file transfers over 10 Gbps physical and emulated connections with 0–366 ms round trip times (RTTs). For memory-to-memory transfers, profiles of both TCP and UDT throughput as a function of RTT show concave and convex regions; large buffer sizes and more parallel flows lead to wider concave regions, which are highly desirable. TCP and UDT both also display complex throughput dynamics, as indicated by their Poincare maps and Lyapunov exponents. For disk-to-disk transfers, we determine that high throughput can be achieved via a combination of parallel I/O threads, parallel network threads, and direct I/O mode. Our measurements also show that Lustre filesystems can be mounted over long-haul connections using LNet routers, although challenges remain in jointly optimizing file I/O and transport method parameters to achieve peak throughput.

  2. 75 FR 25839 - Foreign-Trade Zone 26 Atlanta, Georgia, Application for Subzone, Yates Bleachery Company (Textile...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-10

    ...), high thread count (180 threads per inch and higher) fabrics under FTZ procedures based on a tolling... process any other customer- owned fabric under FTZ procedures. Subzone status would allow for deferral of...

  3. History of river regulation of the Noce River (NE Italy) and related bio-morphodynamic responses

    NASA Astrophysics Data System (ADS)

    Serlet, Alyssa; Scorpio, Vittoria; Mastronunzio, Marco; Proto, Matteo; Zen, Simone; Zolezzi, Guido; Bertoldi, Walter; Comiti, Francesco; Prà, Elena Dai; Surian, Nicola; Gurnell, Angela

    2016-04-01

    The Noce River is a hydropower-regulated Alpine stream in North-East Italy and a major tributary of the Adige River, the second longest Italian river. The objective of the research is to investigate the response of the lower course of the Noce to two main stages of hydromorphological regulation: channelization/diversion and, one century later, hydropower regulation. This research uses a historical reconstruction to link the geomorphic response with natural and human-induced factors by identifying morphological and vegetation features from historical maps and airborne photogrammetry and implementing a quantitative analysis of the river response to channelization and flow/sediment supply regulation related to hydropower development. A descriptive overview is presented. The concept of evolutionary trajectory is integrated with predictions from morphodynamic theories for river bars, giving increased insight into the river response to a complex sequence of regulatory events, such as the development of bars, islands and riparian vegetation. Until the mid-19th century the river had a multi-thread channel pattern. Thereafter (1852) the river was straightened and diverted. Upstream of Mezzolombardo village the river was constrained between embankments approximately 100 m apart, while downstream they are approximately 50 m apart. Since channelization some interesting geomorphic changes have appeared in the river, e.g. the appearance of alternate bars in the channel. In 1926 there was a breach in the right bank of the downstream part that resulted in a multi-thread river reach, which can be viewed as a recovery of the earlier multi-thread pattern. After the 1950s the flow and sediment supply became strongly regulated by hydropower development. The analysis of aerial images reveals that the multi-thread reach became progressively stabilized by vegetation development over the bars, though signs of some dynamics can still be recognized today, despite the strong hydropeaking that dominates the flow regime. The results of the historical analysis will be used in a larger framework that focuses on interdisciplinary research of interactions between flow, sediment and vegetation in regulated rivers and aims to enhance knowledge on the interplay between river bars and vegetation in the perspective of providing enhanced tools for river rehabilitation and restoration.

  4. Mechanism of supporting sub-communicator collectives with O(64) counters as opposed to one counter for each sub-communicator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Sameer; Mamidala, Amith R.; Ratterman, Joseph D.

    A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.

  5. Mechanism of supporting sub-communicator collectives with o(64) counters as opposed to one counter for each sub-communicator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blocksome, Michael; Kumar, Sameer; Mamidala, Amith R.

    A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.

  6. Mechanism of supporting sub-communicator collectives with O(64) counters as opposed to one counter for each sub-communicator

    DOEpatents

    Kumar, Sameer; Mamidala, Amith R.; Ratterman, Joseph D.; Blocksome, Michael; Miller, Douglas

    2013-09-03

    A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
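
    The table lookup described in the abstract above can be pictured with a single-threaded sketch. The class `CounterTable`, its method names, and the `release` step are hypothetical and are not taken from the patent; a real implementation on a multi-core system would also need atomic updates, which are omitted here.

```python
class CounterTable:
    """Hypothetical table mapping communicator IDs to a fixed pool of counters."""

    def __init__(self, max_threads):
        # One slot (and one counter) per thread; None marks a free slot.
        self.entries = [None] * max_threads
        self.counters = [0] * max_threads

    def acquire(self, communicator_id):
        """Return the counter index for a communicator, claiming a free slot if needed."""
        free_slot = None
        for index, entry in enumerate(self.entries):
            if entry == communicator_id:
                return index  # a collective on this communicator already holds a counter
            if entry is None and free_slot is None:
                free_slot = index
        if free_slot is None:
            raise RuntimeError("no free counter available")
        self.entries[free_slot] = communicator_id
        return free_slot

    def release(self, communicator_id):
        """Assumed cleanup step: free the slot once the collective has completed."""
        index = self.entries.index(communicator_id)
        self.entries[index] = None
        self.counters[index] = 0


table = CounterTable(max_threads=2)
first = table.acquire("comm-A")
second = table.acquire("comm-B")
again = table.acquire("comm-A")
```

    The point of the design is that the number of counters is bounded by the number of threads that can be inside a collective at once, not by the number of sub-communicators that exist.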

  7. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.

  8. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE PAGES

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...

    2017-10-24

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  9. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  10. Multicore Architecture-aware Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Srinivasa, Avinash

    Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, such as having strategies in place to override some of the default operating system policies, the main objective being to improve the computational performance of the application. The current work proposes two such capabilities with respect to multi-threaded scientific applications, in particular a large scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilities, when included, were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times.

  11. Specifications and implementation of the RT MHD control system for the EC launcher of FTU

    NASA Astrophysics Data System (ADS)

    Galperti, C.; Alessi, E.; Boncagni, L.; Bruschi, A.; Granucci, G.; Grosso, A.; Iannone, F.; Marchetto, C.; Nowak, S.; Panella, M.; Sozzi, C.; Tilia, B.

    2012-09-01

    To perform real time plasma control experiments with EC heating waves using the new fast launcher installed on FTU, a dedicated data acquisition and elaboration system has recently been designed. A prototype version of the acquisition/control system has been developed and will be tested on the FTU machine in its next experimental campaign. The open-source framework MARTe (Multi-threaded Application Real-Time executor) on the Linux/RTAI real-time operating system has been chosen as the software platform to realize the control system. Standard open-architecture industrial PCs, based on either VME bus or CompactPCI bus and equipped with standard input/output cards, are the chosen hardware platform.

  12. MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction.

    PubMed

    Li, Nan; Ainsworth, Richard I; Wu, Meixin; Ding, Bo; Wang, Wei

    2016-03-15

    MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids and those with post-translational modifications (PTMs) or small molecules. Moreover, multi-threading and support for Sun Grid Engine (SGE) are implemented to significantly boost the computational efficiency. The program is available at http://wanglab.ucsd.edu/MIEC-SVM. Contact: wei-wang@ucsd.edu. Supplementary data available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. a Cache Design Method for Spatial Information Visualization in 3d Real-Time Rendering Engine

    NASA Astrophysics Data System (ADS)

    Dai, X.; Xiong, H.; Zheng, X.

    2012-07-01

    A well-designed cache system has a positive impact on a 3D real-time rendering engine, and the effect becomes more obvious as the amount of visualization data grows. Caches are the basis on which the engine smoothly browses data that resides outside core memory or comes from the internet. In this article, a new kind of cache based on multiple threads and large files is introduced. The memory cache consists of three parts: the rendering cache, the pre-rendering cache and the elimination cache. The rendering cache stores the data that is being rendered in the engine; the pre-rendering cache stores the data that is dispatched according to the position of the view point in the horizontal and vertical directions; the elimination cache stores the data that is eliminated from the previous caches and is going to be written to the disk cache. Multiple large files are used in the disk cache. When a disk cache file reaches the size limit (128M is the maximum in the experiment), no item is eliminated from the file; instead, a new large cache file is created. If the number of large files exceeds the pre-set maximum, the earliest file is deleted from the disk. In this way, only one file is open for both writing and reading and the rest are read-only, so the disk cache can be used in a highly asynchronous way. The size of each large file is limited so that it can be mapped into core memory to save loading time. Multiple threads are used to update the cache data: they load data into the rendering cache as soon as possible for rendering, load data into the pre-rendering cache for rendering the next few frames, and move data that is not needed for the moment into the elimination cache. In our experiment, two threads are designed. The first thread organizes the memory cache according to the view point and creates two lists, the adding list and the deleting list: the adding list indexes the data that should be loaded into the pre-rendering cache immediately, and the deleting list indexes the data that is no longer visible in the rendering scene and should be moved to the elimination cache. The other thread moves the data in the memory and disk caches according to the adding and deleting lists, and creates download requests when data indexed in the adding list cannot be found in either the memory cache or the disk cache; elimination cache data is moved to the disk cache when the adding and deleting lists are empty. In our experiment, the cache designed as described above proved reliable and efficient, and the data loading time and file I/O time decreased sharply, especially as the rendering data grew larger.
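
    The disk-cache rotation rule in this abstract (fill the current large file up to a size limit, then open a new one, and delete the earliest file once a pre-set maximum is exceeded) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `RotatingDiskCache`, its methods, and the default `max_files` value are hypothetical, and the "files" are modelled as in-memory dictionaries so the sketch runs without disk I/O.

```python
from collections import deque


class RotatingDiskCache:
    """Toy model of a multi-file, append-only disk cache (hypothetical names)."""

    def __init__(self, file_limit_bytes=128 * 1024 * 1024, max_files=4):
        self.file_limit_bytes = file_limit_bytes  # size cap per cache file
        self.max_files = max_files                # assumed pre-set maximum
        self.files = deque()                      # oldest file on the left
        self._open_new_file()

    def _open_new_file(self):
        # Only the newest file is written to; older files stay read-only.
        self.files.append({"items": {}, "size": 0})
        if len(self.files) > self.max_files:
            self.files.popleft()  # drop the earliest file as a whole

    def put(self, key, payload):
        current = self.files[-1]
        # Never evict single items: when the file is full, start a new one.
        if current["items"] and current["size"] + len(payload) > self.file_limit_bytes:
            self._open_new_file()
            current = self.files[-1]
        current["items"][key] = payload
        current["size"] += len(payload)  # append-only: size only grows

    def get(self, key):
        # Search newest file first so the latest copy of a key wins.
        for cache_file in reversed(self.files):
            if key in cache_file["items"]:
                return cache_file["items"][key]
        return None


# Usage with tiny limits so the rotation is easy to follow.
cache = RotatingDiskCache(file_limit_bytes=10, max_files=2)
cache.put("a", b"aaaaaa")
cache.put("b", b"bbbbbb")  # does not fit next to "a": second file opened
cache.put("c", b"cccccc")  # third file opened, earliest file (with "a") dropped
```

    Deleting whole files rather than individual items keeps every file except the newest immutable, which is what allows the read-only files to be accessed concurrently without locking in a design of this kind.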

  14. Crafting threads of diblock copolymer micelles via flow-enabled self-assembly.

    PubMed

    Li, Bo; Han, Wei; Jiang, Beibei; Lin, Zhiqun

    2014-03-25

    Hierarchically assembled amphiphilic diblock copolymer micelles were exquisitely crafted over large areas by capitalizing on two concurrent self-assembling processes at different length scales, namely, the periodic threads composed of a monolayer or a bilayer of diblock copolymer micelles precisely positioned by flow-enabled self-assembly (FESA) on the microscopic scale and the self-assembly of amphiphilic diblock copolymer micelles into ordered arrays within an individual thread on the nanometer scale. A minimum spacing between two adjacent threads λmin was observed. A model was proposed to rationalize the relationship between the thread width and λmin. Such FESA of diblock copolymer micelles is remarkably controllable and easy to implement. It opens up possibilities for lithography-free positioning and patterning of diblock copolymer micelles for various applications in template fabrication of periodic inorganic nanostructures, nanoelectronics, optoelectronics, magnetic devices, and biotechnology.

  15. Influence of Micro Threads Alteration on Osseointegration and Primary Stability of Implants: An FEA and In Vivo Analysis in Rabbits.

    PubMed

    Chowdhary, Ramesh; Halldin, Anders; Jimbo, Ryo; Wennerberg, Ann

    2015-06-01

    To describe the early bone tissue response to implants with and without micro threads designed to the full length of an oxidized titanium implant. A pair of two-dimensional finite element models was designed using computer aided three-dimensional interactive application files of an implant model with micro threads in between macro threads and one without micro threads. Oxidized titanium implants with (test implants n=20) and without (control implants n=20) micro threads were prepared. A total of 12 rabbits were used and each received four implants. Insertion torque analysis during implant placement and removal torque analysis after 4 weeks were performed in nine rabbits, and histomorphometric analysis in three rabbits. Finite element analysis showed less stress accumulation in the test implant model, with 31 MPa compared with 62.2 MPa in the control implant model. Insertion and removal torque analysis did not show any statistically significant difference between the two implant designs. At 4 weeks, there was a significant difference between the two groups in the percentage of new bone volume and bone-to-implant contact in the femur (p < .05), but not in the tibia. The effect of micro threads was prominent in the femur, suggesting that micro threads promote bone formation. The stress distribution supported by the micro threads was especially effective in the cancellous bone. © 2013 Wiley Periodicals, Inc.

  16. Framework for Development of Object-Oriented Software

    NASA Technical Reports Server (NTRS)

    Perez-Poveda, Gus; Ciavarella, Tony; Nieten, Dan

    2004-01-01

    The Real-Time Control (RTC) Application Framework is a high-level software framework written in C++ that supports the rapid design and implementation of object-oriented application programs. This framework provides built-in functionality that solves common software development problems within distributed client-server, multi-threaded, and embedded programming environments. When using the RTC Framework to develop software for a specific domain, designers and implementers can focus entirely on the details of the domain-specific software rather than on creating custom solutions, utilities, and frameworks for the complexities of the programming environment. The RTC Framework was originally developed as part of a Space Shuttle Launch Processing System (LPS) replacement project called Checkout and Launch Control System (CLCS). As a result of the framework's development, CLCS software development time was reduced by 66 percent. The framework is generic enough for developing applications outside of the launch-processing system domain. Other applicable high-level domains include command and control systems and simulation/training systems.

  17. High efficiency carbon nanotube thread antennas

    NASA Astrophysics Data System (ADS)

    Amram Bengio, E.; Senic, Damir; Taylor, Lauren W.; Tsentalovich, Dmitri E.; Chen, Peiyu; Holloway, Christopher L.; Babakhani, Aydin; Long, Christian J.; Novotny, David R.; Booth, James C.; Orloff, Nathan D.; Pasquali, Matteo

    2017-10-01

    Although previous research has explored the underlying theory of high-frequency behavior of carbon nanotubes (CNTs) and CNT bundles for antennas, there is a gap in the literature for direct experimental measurements of radiation efficiency. These measurements are crucial for any practical application of CNT materials in wireless communication. In this letter, we report a measurement technique to accurately characterize the radiation efficiency of λ/4 monopole antennas made from the CNT thread. We measure the highest absolute values of radiation efficiency for CNT antennas of any type, matching that of copper wire. To capture the weight savings, we propose a specific radiation efficiency metric and show that these CNT antennas exceed copper's performance by over an order of magnitude at 1 GHz and 2.4 GHz. We also report direct experimental observation that, contrary to metals, the radiation efficiency of the CNT thread improves significantly at higher frequencies. These results pave the way for practical applications of CNT thread antennas, particularly in the aerospace and wearable electronics industries where weight saving is a priority.

  18. MACHINING ELIMINATION THROUGH APPLICATION OF THREAD FORMING FASTENERS IN NET SHAPED CAST HOLES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cleaver, Ryan J; Cleaver, Todd H; Talbott, Richard

    The ultimate objective of this work was to eliminate approximately 30% of the machining performed in typical automotive engine and transmission plants by using thread forming fasteners in as-cast holes of aluminum and magnesium cast components. The primary issues at the source of engineers' reluctance to implement thread forming fasteners in lightweight castings are: * Little proof of consistency of clamp load vs. input torque in either aluminum or magnesium castings. * No known data to understand the effect on consistency of clamp load as casting dies wear. The clamp load consistency concern is founded in the fact that a portion of the input torque used to create clamp load is also used to create threads. The torque used for thread forming may not be consistent due to variations in casting material, hole size and shape due to tooling wear and process variation (thermal and mechanical). There is little data available to understand the magnitude of this concern or to form the basis of potential solutions if the range of clamp load variation is very high (> +/- 30%). The range of variation that can be expected in as-cast hole size and shape over the full life cycle of a high pressure die casting die was established in previous work completed by Pacific Northwest National Laboratory (PNNL). This established range of variation was captured in a set of 12 cast bosses by designing core pins at the size and draft angles identified in the cited previous work. The cast bosses were cut into nuts that could be used in the Ford Fastener Laboratory test-cell to measure clamp load when a thread forming fastener was driven into a cast nut. There were two sets of experiments run. First, a series of cast aluminum nuts were made reflecting the range of shape and size variations to be expected over the life cycle of a die casting die. 
Taptite thread forming fasteners (a widely used thread forming fastener suitable for aluminum applications) were driven into the various cored, as-cast nuts at a constant input torque and resulting clamp loads were recorded continuously. The clamp load data was used to determine the range of clamp loads to be expected. The bolts were driven to failure. The clamp load corresponding to the target input of 18.5 Nm was recorded for each fastener. In a like fashion, a second set of experiments was run with cast magnesium nuts and ALtracs thread forming fasteners (a widely used thread forming fastener suitable for magnesium applications). Again all clamp loads were recorded and analyzed similarly to the Taptites in aluminum cast nuts. Results from previous work performed on the same test cell for a Battelle project using standard M8 bolts into standard M8 nuts were included as a comparator for a standard bolt and nut application. The results for the thread forming fasteners in aluminum cast holes were well within industry expectations of +/- 30% for out of the box and robustness range testing. The results for the dry and lubed extreme conditions were only slightly higher than industry expectations at +/- 35.6%. However, when compared to the actual Battelle results (+/- 40%) for a standard bolt and nut, the thread forming fasteners performed slightly better. The results for the thread forming fasteners in magnesium cast holes were all well within industry expectations of +/- 30% for all three conditions. The robustness range (0.05 mm larger and smaller holes than the expected wear pattern of a die casting die at full life cycle) results also fell within the industry expectations for standard threaded fasteners. These results were very encouraging. It was concluded that this work showed that clamp load variation with thread forming fasteners is consistent with industry expectations for standard steel bolts and nuts at +/- 30%. 
There does not appear to be any significant increase in clamp load variation due to the application of thread forming fasteners in as-cast holes of aluminum or magnesium over the effective life of a die casting mold. The fully implemented potential benefit of thread forming fasteners in as-cast holes of aluminum and magnesium is estimated to be 6 trillion Btu per year for North America. Economic benefit is estimated to be nearly $800 million per year. Environmental benefits and quality improvements will also result from full implementation of this technology.

  19. Plasma treatments of wool fiber surface for microfluidic applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jeon, So-Hyoun; Hwang, Ki-Hwan; Lee, Jin Su

    Highlights: • We used atmospheric plasma for tuning the wettability of wool fibers. • The wicking rates of the wool fibers increased with increasing treatment time. • The increase in wettability is associated with removal of fatty acid from the wool surface. - Abstract: Recent progress in health diagnostics has led to the development of simple and inexpensive systems. Thread-based microfluidic devices allow for portable and inexpensive field-based technologies enabling medical diagnostics, environmental monitoring, and food safety analysis. However, controlling the flow rate of wool thread, which is a very important part of thread-based microfluidic devices, is quite difficult. For this reason, we focused on thread-based microfluidics in the study. We developed a method of changing the wettability of hydrophobic thread, including wool thread. Thus, using natural wool thread as a channel, we demonstrate herein that the manipulation of the liquid flow, such as micro selecting and micro mixing, can be achieved by applying plasma treatment to wool thread. In addition to enabling the flow control of the treated wool channels consisting of all natural substances, this procedure will also be beneficial for biological sensing devices. We found that wools treated with various gases have different flow rates. We used an atmospheric plasma with O₂, N₂ and Ar gases.

  20. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  1. Wind Tunnel Analysis of the Airflow through Insect-Proof Screens and Comparison of Their Effect When Installed in a Mediterranean Greenhouse

    PubMed Central

    López, Alejandro; Molina-Aiz, Francisco D.; Valera, Diego L.; Peña, Araceli

    2016-01-01

    The present work studies the effect of three insect-proof screens with different geometrical and aerodynamic characteristics on the air velocity and temperature inside a Mediterranean multi-span greenhouse with three roof vents and without crops, divided into two independent sectors. First, the insect-proof screens were characterised geometrically by analysing digital images and testing in a low velocity wind tunnel. The wind tunnel tests gave screen discharge coefficient values $C_{d,\varphi}$ of 0.207 for screen 1 (10 × 20 threads·cm$^{-2}$; porosity $\varphi$ = 35.0%), 0.151 for screen 2 (13 × 30 threads·cm$^{-2}$; $\varphi$ = 26.3%) and 0.325 for screen 3 (10 × 20 threads·cm$^{-2}$; porosity $\varphi$ = 36.0%), at an air velocity of 0.25 m·s$^{-1}$. Secondly, when screens were installed in the greenhouse, we observed a statistical proportionality between the discharge coefficient at the openings and the air velocity $u_i$ measured in the centre of the greenhouse, $u_i = 0.856\,C_d + 0.062$ ($R^2$ = 0.68 and p-value = 0.012). The inside-outside temperature difference $\Delta T_{io}$ diminishes when the inside velocity increases, following the statistically significant relationship $\Delta T_{io} = (-135.85 + 57.88/u_i)^{0.5}$ ($R^2$ = 0.85 and p-value = 0.0011). Differences in thread diameter and tension affect the screen thickness, which means that similar porosities may well be associated with very different aerodynamic characteristics. Screens must be characterised by a theoretical function $C_{d,\varphi} = \left[\frac{2e\mu}{K_p\rho}\cdot\frac{1}{u_s} + \frac{2eY}{K_p^{0.5}}\right]^{-0.5}$ that relates the discharge coefficient of the screen $C_{d,\varphi}$ to the air velocity $u_s$. This relationship depends on the three parameters that define the aerodynamic behaviour of a porous medium: permeability $K_p$, inertial factor $Y$ and screen thickness $e$ (and on the air temperature, which determines its density $\rho$ and viscosity $\mu$). However, for a given air temperature, the pressure drop-velocity relationship can be characterised with only two parameters: $\Delta P = a u_s^2 + b u_s$. PMID:27187401

  2. Wind Tunnel Analysis of the Airflow through Insect-Proof Screens and Comparison of Their Effect When Installed in a Mediterranean Greenhouse.

    PubMed

    López, Alejandro; Molina-Aiz, Francisco D; Valera, Diego L; Peña, Araceli

    2016-05-12

    The present work studies the effect of three insect-proof screens with different geometrical and aerodynamic characteristics on the air velocity and temperature inside a Mediterranean multi-span greenhouse with three roof vents and without crops, divided into two independent sectors. First, the insect-proof screens were characterised geometrically by analysing digital images and testing in a low velocity wind tunnel. The wind tunnel tests gave screen discharge coefficient values $C_{d,\varphi}$ of 0.207 for screen 1 (10 × 20 threads·cm$^{-2}$; porosity $\varphi$ = 35.0%), 0.151 for screen 2 (13 × 30 threads·cm$^{-2}$; $\varphi$ = 26.3%) and 0.325 for screen 3 (10 × 20 threads·cm$^{-2}$; porosity $\varphi$ = 36.0%), at an air velocity of 0.25 m·s$^{-1}$. Secondly, when screens were installed in the greenhouse, we observed a statistical proportionality between the discharge coefficient at the openings and the air velocity $u_i$ measured in the centre of the greenhouse, $u_i = 0.856\,C_d + 0.062$ ($R^2$ = 0.68 and p-value = 0.012). The inside-outside temperature difference $\Delta T_{io}$ diminishes when the inside velocity increases, following the statistically significant relationship $\Delta T_{io} = (-135.85 + 57.88/u_i)^{0.5}$ ($R^2$ = 0.85 and p-value = 0.0011). Differences in thread diameter and tension affect the screen thickness, which means that similar porosities may well be associated with very different aerodynamic characteristics. Screens must be characterised by a theoretical function $C_{d,\varphi} = \left[\frac{2e\mu}{K_p\rho}\cdot\frac{1}{u_s} + \frac{2eY}{K_p^{0.5}}\right]^{-0.5}$ that relates the discharge coefficient of the screen $C_{d,\varphi}$ to the air velocity $u_s$. This relationship depends on the three parameters that define the aerodynamic behaviour of a porous medium: permeability $K_p$, inertial factor $Y$ and screen thickness $e$ (and on the air temperature, which determines its density $\rho$ and viscosity $\mu$). However, for a given air temperature, the pressure drop-velocity relationship can be characterised with only two parameters: $\Delta P = a u_s^2 + b u_s$.
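
    The relationships quoted in this abstract can be evaluated numerically, as in the following Python sketch. The regression coefficients (0.856, 0.062, −135.85, 57.88) and the discharge coefficient 0.207 are taken from the abstract; the function names, the default air properties, and the screen parameters in the usage lines (thickness, permeability, inertial factor) are hypothetical illustration values, not measurements from the paper. The abstract does not state the units of the temperature difference, so none are assumed here.

```python
def discharge_coefficient(u_s, thickness, permeability, inertial_factor,
                          density=1.2, viscosity=1.8e-5):
    """C_d = [(2 e mu / (K_p rho)) * (1 / u_s) + 2 e Y / K_p**0.5] ** -0.5.

    density and viscosity default to rough values for air (assumed).
    """
    viscous = (2 * thickness * viscosity / (permeability * density)) / u_s
    inertial = 2 * thickness * inertial_factor / permeability ** 0.5
    return (viscous + inertial) ** -0.5


def inside_velocity(c_d):
    """Regression from the abstract: u_i = 0.856 * C_d + 0.062."""
    return 0.856 * c_d + 0.062


def temperature_difference(u_i):
    """Regression from the abstract: dT_io = (-135.85 + 57.88 / u_i) ** 0.5."""
    radicand = -135.85 + 57.88 / u_i
    if radicand < 0:
        # The fitted curve has no real value for larger inside velocities.
        raise ValueError("u_i is outside the range where the fit is defined")
    return radicand ** 0.5


# Screen 1 from the abstract: C_d = 0.207 at 0.25 m/s.
u_i_screen1 = inside_velocity(0.207)
dt_screen1 = temperature_difference(u_i_screen1)

# Hypothetical screen parameters, only to show how C_d varies with velocity.
cd_slow = discharge_coefficient(0.25, thickness=2.6e-4,
                                permeability=1e-9, inertial_factor=0.2)
cd_fast = discharge_coefficient(0.50, thickness=2.6e-4,
                                permeability=1e-9, inertial_factor=0.2)
```

    Because the viscous term scales with 1/u_s, the computed discharge coefficient rises with air velocity, which is why a single coefficient value is only meaningful together with the velocity at which it was measured.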

  3. THREAD: A programming environment for interactive planning-level robotics applications

    NASA Technical Reports Server (NTRS)

    Beahan, John J., Jr.

    1989-01-01

    The THREAD programming language, which was developed to meet the needs of researchers in developing robotics applications that perform such tasks as grasp, trajectory design, sensor data analysis, and interfacing with external subsystems in order to perform servo-level control of manipulators and real time sensing, is discussed. The philosophy behind THREAD, the issues which entered into its design, and the features of the language are discussed from the viewpoint of researchers who want to develop algorithms in a simulation environment, and from that of those who want to implement physical robotics systems. The detailed functions of the many complex robotics algorithms and tools which are part of the language are not explained, but an overall impression of their capability is given.

  4. Temporal Causality Analysis of Sentiment Change in a Cancer Survivor Network.

    PubMed

    Bui, Ngot; Yen, John; Honavar, Vasant

    2016-06-01

    Online health communities constitute a useful source of information and social support for patients. American Cancer Society's Cancer Survivor Network (CSN), a 173,000-member community, is the largest online network for cancer patients, survivors, and caregivers. A discussion thread in CSN is often initiated by a cancer survivor seeking support from other members of CSN. Discussion threads are multi-party conversations that often provide a source of social support e.g., by bringing about a change of sentiment from negative to positive on the part of the thread originator. While previous studies regarding cancer survivors have shown that members of an online health community derive benefits from their participation in such communities, causal accounts of the factors that contribute to the observed benefits have been lacking. We introduce a novel framework to examine the temporal causality of sentiment dynamics in the CSN. We construct a Probabilistic Computation Tree Logic representation and a corresponding probabilistic Kripke structure to represent and reason about the changes in sentiments of posts in a thread over time. We use a sentiment classifier trained using machine learning on a set of posts manually tagged with sentiment labels to classify posts as expressing either positive or negative sentiment. We analyze the probabilistic Kripke structure to identify the prima facie causes of sentiment change on the part of the thread originators in the CSN forum and their significance. We find that the sentiment of replies appears to causally influence the sentiment of the thread originator. Our experiments also show that the conclusions are robust with respect to the choice of the (i) classification threshold of the sentiment classifier; (ii) and the choice of the specific sentiment classifier used. 
We also extend the basic framework for temporal causality analysis to incorporate the uncertainty in the states of the probabilistic Kripke structure resulting from the use of an imperfect state transducer (in our case, the sentiment classifier). Our analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers and moderators of an online community such as CSN can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants. The proposed methodology for analysis of temporal causality has broad applicability in a variety of settings where the dynamics of the underlying system can be modeled in terms of state variables that change in response to internal or external inputs.

  5. Temporal Causality Analysis of Sentiment Change in a Cancer Survivor Network

    PubMed Central

    Bui, Ngot; Yen, John; Honavar, Vasant

    2017-01-01

    Online health communities constitute a useful source of information and social support for patients. American Cancer Society’s Cancer Survivor Network (CSN), a 173,000-member community, is the largest online network for cancer patients, survivors, and caregivers. A discussion thread in CSN is often initiated by a cancer survivor seeking support from other members of CSN. Discussion threads are multi-party conversations that often provide a source of social support e.g., by bringing about a change of sentiment from negative to positive on the part of the thread originator. While previous studies regarding cancer survivors have shown that members of an online health community derive benefits from their participation in such communities, causal accounts of the factors that contribute to the observed benefits have been lacking. We introduce a novel framework to examine the temporal causality of sentiment dynamics in the CSN. We construct a Probabilistic Computation Tree Logic representation and a corresponding probabilistic Kripke structure to represent and reason about the changes in sentiments of posts in a thread over time. We use a sentiment classifier trained using machine learning on a set of posts manually tagged with sentiment labels to classify posts as expressing either positive or negative sentiment. We analyze the probabilistic Kripke structure to identify the prima facie causes of sentiment change on the part of the thread originators in the CSN forum and their significance. We find that the sentiment of replies appears to causally influence the sentiment of the thread originator. Our experiments also show that the conclusions are robust with respect to the choice of the (i) classification threshold of the sentiment classifier; (ii) and the choice of the specific sentiment classifier used. 
We also extend the basic framework for temporal causality analysis to incorporate the uncertainty in the states of the probabilistic Kripke structure resulting from the use of an imperfect state transducer (in our case, the sentiment classifier). Our analysis of temporal causality of CSN sentiment dynamics offers new insights that the designers, managers and moderators of an online community such as CSN can utilize to facilitate and enhance the interactions so as to better meet the social support needs of the CSN participants. The proposed methodology for analysis of temporal causality has broad applicability in a variety of settings where the dynamics of the underlying system can be modeled in terms of state variables that change in response to internal or external inputs. PMID:29399599

  6. Experimental study on the effect of shape of bolt and nut on fatigue strength for bolted joint

    NASA Astrophysics Data System (ADS)

    Matsunari, T.; Oda, K.; Tsutsumi, N.; Yakushiji, T.; Noda, N. A.; Sano, Y.

    2018-06-01

    In this study, the effect of the curvature radius of the thread bottom and of the pitch difference between an M16 bolt and nut on the fatigue strength of a bolted joint is examined experimentally. M16 bolt-nut specimens with two kinds of thread bottom radii and pitch differences are prepared. The S-N curves for bolted specimens with different thread shapes are obtained by stress-controlled fatigue tests (stress ratio R>0). The experimental results are compared and discussed in terms of stress analysis. The finite element method is used to simulate the fatigue experiment, and the mean stress and stress amplitude at each thread bottom of the bolt are analysed. Crack observation in cross sections of the bolt specimens after the experiment shows that crack initiation and propagation are changed by introducing a pitch difference of α=15 μm. Furthermore, the fatigue life can be extended by increasing the curvature radius of the thread bottom and introducing the pitch difference.

  7. Concurrent Breakpoints

    DTIC Science & Technology

    2011-12-18

    Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, pages 48–59, 1998. [8] A. Dinning and E. Schonberg. Detecting access...multi-threaded programs. ACM Trans. Comput. Syst., 15(4):391–411, 1997. [38] E. Schonberg. On-the-fly detection of access anomalies. In Proceedings

  8. Lubricating Holes for Corroded Nuts and Bolts

    NASA Technical Reports Server (NTRS)

    Penn, B. G.; Clemons, J. M.; Ledbetter, Frank E., III

    1986-01-01

    Corroded fasteners taken apart more easily. Lubricating holes bored to thread from three of flats. Holes facilitate application of penetrating oil to help loosen nut when rusted onto bolt. Holes make it possible to apply lubricants and rust removers directly to more of thread than otherwise reachable.

  9. Low cost microfluidic device based on cotton threads for electroanalytical application.

    PubMed

    Agustini, Deonir; Bergamini, Márcio F; Marcolino-Junior, Luiz Humberto

    2016-01-21

    Microfluidic devices are an interesting alternative for performing analytical assays, due to the speed of analyses, reduced sample, reagent and solvent consumption and less waste generation. However, the high manufacturing costs still prevent the massive use of these devices worldwide. Here, we present the construction of a low cost microfluidic thread-based electroanalytical device (μTED), employing extremely cheap materials and a manufacturing process free of equipment. The microfluidic channels were built with cotton threads and the estimated cost per device was only $0.39. The flow of solutions (1.12 μL s⁻¹) is generated spontaneously due to the capillary forces, eliminating the use of any pumping system. To demonstrate the analytical performance of the μTED, a simultaneous determination of acetaminophen (ACT) and diclofenac (DCF) was performed by multiple pulse amperometry (MPA). A linear dynamic range (LDR) of 10 to 320 μmol L⁻¹ for both species, a limit of detection (LOD) and a limit of quantitation (LOQ) of 1.4 and 4.7 μmol L⁻¹ and 2.5 and 8.3 μmol L⁻¹ for ACT and DCF, respectively, as well as an analytical frequency of 45 injections per hour were reached. Thus, the proposed device has shown potential to extend the use of microfluidic analytical devices, due to its simplicity, low cost and good analytical performance.

  10. CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    NASA Astrophysics Data System (ADS)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.
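
    To make the "one-thread-one-line" idea concrete: each sweep of an ADI solver reduces to many independent tridiagonal systems, one per grid line, so each line can be handed to its own worker. The sketch below is a minimal CPU-side illustration in Python, assuming a standard Thomas algorithm and a small hypothetical system; it is not the authors' CUDA kernel, and Python threads show only the work mapping, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal, d: rhs)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_lines(a, b, c, rhs_lines, max_workers=4):
    """Solve one tridiagonal system per grid line, one task per line."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: thomas(a, b, c, d), rhs_lines))

# Hypothetical 4-point lines sharing the same coefficients (-1, 2, -1).
a = [0.0, -1.0, -1.0, -1.0]   # a[0] is unused
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]   # c[-1] is unused
rhs_lines = [[0.0, 0.0, 0.0, 5.0], [0.0, 0.0, 0.0, 10.0]]
solutions = solve_lines(a, b, c, rhs_lines)
```

    In a "one-thread-one-point" scheme the unit of work would instead be a single grid point, which suits the explicit parts of the update better than the inherently sequential line solve.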

  11. Attenuation of the tip vortex flow using a flexible thread

    NASA Astrophysics Data System (ADS)

    Lee, Seung-Jae; Shin, Jin-Woo; Arndt, Roger E. A.; Suh, Jung-Chun

    2018-01-01

    Tip vortex cavitation (TVC) is important in a number of practical engineering applications. The onset of TVC is a critical concern for navy surface ships and submarines that aim to increase their capability to evade detection. A flexible thread attachment at blade tips was recently suggested as a new method to delay the onset of TVC. Although the occurrence of TVC can be reduced using a flexible thread, no scientific investigation focusing on its mechanisms has been undertaken. Thus, herein, we experimentally investigated the use of the flexible thread to suppress TVC from an elliptical wing. These investigations were performed in a cavitation tunnel and involved an observation of TVC using high-speed cameras, motion tracking of the thread using image-processing techniques, and near-field flow measurements performed using stereoscopic particle image velocimetry. The experimental data suggested that the flexible thread affects the axial velocity field more than the circumferential velocity field around the TVC axis. Furthermore, we observed no clear dependence of the vortex core size, circulation, and flow unsteadiness on TVC suppression. However, the presence of the thread at the wing tip led to a notable reduction in the streamwise velocity field, thereby alleviating TVC.

  12. Development of a Next Generation Concurrent Framework for the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Calafiura, P.; Lampl, W.; Leggett, C.; Malon, D.; Stewart, G.; Wynne, B.

    2015-12-01

    The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code have been written using a single-threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. With current memory consumption for 64-bit ATLAS reconstruction in a high luminosity environment approaching 4 GB, it will become impossible to fully occupy all cores in a machine without exhausting available memory. However, since maximizing performance per watt will be a key metric, a mechanism must be found to use all cores as efficiently as possible. In this paper we report on our progress with a practical demonstration of the use of multithreading in the ATLAS reconstruction software, using the GaudiHive framework. We have expanded support to Calorimeter, Inner Detector, and Tracking code, discussing what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on both the performance gains, and what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. We also present our findings on implementing a hybrid multi-threaded / multi-process framework, to take advantage of the strengths of each type of concurrency, while avoiding some of their corresponding limitations.

  13. Fabrication of drug-loaded electrospun aligned fibrous threads for suture applications.

    PubMed

    He, Chuang-Long; Huang, Zheng-Ming; Han, Xiao-Jian

    2009-04-01

    In this work, drug-loaded fibers and threads were successfully fabricated by combining electrospinning with aligned fibers collection. Two different electrospinning processes, that is, blend and coaxial electrospinning, to incorporate a model drug tetracycline hydrochloride (TCH) into poly(L-lactic acid) (PLLA) fibers have been used and compared with each other. The resulting composite ultrafine fibers and threads were characterized through scanning electron microscopy, transmission electron microscopy, Fourier transform infrared spectroscopy, X-ray diffraction, differential scanning calorimetry, and tensile testing. It has been shown that average diameters of the fibers made from the same polymer concentration depended on the processing method. The blend TCH/PLLA fibers showed the smallest fiber diameter, whereas neat PLLA fibers and core-shell TCH-PLLA fibers showed a larger proximal average diameter. Higher rotating speed of a wheel collector is helpful for obtaining better-aligned fibers. Both the polymer and the drug in the electrospun fibers have poor crystallinity. In vitro release study indicated that threads made from the core-shell fibers could suppress the initial burst release and provide a sustained drug release useful for the release of growth factor or other therapeutic drugs. On the other hand, the threads from the blend fibers produced a large initial burst release that may be used to prevent bacterial infection. A combination of these results suggests that electrospinning technique provides a novel way to fabricate medical agents-loaded fibrous threads for tissue suturing and tissue regeneration applications. Copyright 2008 Wiley Periodicals, Inc.

  14. Co Modeling and Co Synthesis of Safety Critical Multi threaded Embedded Software for Multi Core Embedded Platforms

    DTIC Science & Technology

    2017-03-20

    computation, Prime Implicates, Boolean Abstraction, real-time embedded software, software synthesis, correct by construction software design, model...types for time-dependent data-flow networks". J.-P. Talpin, P. Jouvelot, S. Shukla. ACM-IEEE Conference on Methods and Models for System Design...information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and

  15. Evaluation of Residence Time on Nitrogen Oxides Removal in Non-Thermal Plasma Reactor

    PubMed Central

    Talebizadeh, Pouyan; Rahimzadeh, Hassan; Babaie, Meisam; Javadi Anaghizi, Saeed; Ghomi, Hamidreza; Ahmadi, Goodarz; Brown, Richard

    2015-01-01

    Non-thermal plasma (NTP) has been introduced over the last few years as a promising after-treatment system for nitrogen oxides and particulate matter removal from diesel exhaust. NTP technology has not yet been commercialised, due to its high rate of energy consumption. Therefore, it is important to seek out new methods to improve NTP performance. Residence time is a crucial parameter in engine exhaust emissions treatment. In this paper, different electrode shapes are analysed and the corresponding residence time and NOx removal efficiency are studied. An axisymmetric laminar model is used for obtaining residence time distribution numerically using FLUENT software. If the mean residence time in an NTP reactor increases, there will be a corresponding increase in the reaction time and consequently the pollutant removal efficiency increases. Three different screw thread electrodes and a rod electrode are examined. The results show the advantage of screw thread electrodes in comparison with the rod electrode. Furthermore, between the screw thread electrodes, the electrode with the thread width of 1 mm has the highest NOx removal due to higher residence time and a greater number of micro-discharges. The results show that the residence time of the screw thread electrode with a thread width of 1 mm is 21% more than for the rod electrode. PMID:26496630

  16. Data Acquisition System for Multi-Frequency Radar Flight Operations Preparation

    NASA Technical Reports Server (NTRS)

    Leachman, Jonathan

    2010-01-01

    A three-channel data acquisition system was developed for the NASA Multi-Frequency Radar (MFR) system. The system is based on a commercial-off-the-shelf (COTS) industrial PC (personal computer) and two dual-channel 14-bit digital receiver cards. The decimated complex envelope representations of the three radar signals are passed to the host PC via the PCI bus, and then processed in parallel by multiple cores of the PC CPU (central processing unit). The innovation is this parallelization of the radar data processing using multiple cores of a standard COTS multi-core CPU. The data processing portion of the data acquisition software was built using autonomous program modules or threads, which can run simultaneously on different cores. A master program module calculates the optimal number of processing threads, launches them, and continually supplies each with data. The benefit of this new parallel software architecture is that COTS PCs can be used to implement increasingly complex processing algorithms on an increasing number of radar range gates and data rates. As new PCs become available with higher numbers of CPU cores, the software will automatically utilize the additional computational capacity.
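
    The master/worker arrangement described above can be sketched as follows. This is a minimal illustration only: the choice of `os.cpu_count()` as the thread count, the mean-power computation, and all names are assumptions, since the record does not say how the original software sizes its thread pool or what each thread computes.

```python
import os
import queue
import threading

def worker(tasks, results):
    """Process blocks of complex samples until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        index, samples = item
        # Placeholder processing step: mean power of the complex envelope.
        power = sum(abs(s) ** 2 for s in samples) / len(samples)
        results.put((index, power))

def run_master(blocks, n_threads=None):
    """Launch worker threads, feed them data blocks, and collect results in order."""
    n_threads = n_threads or os.cpu_count() or 1
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for i, block in enumerate(blocks):
        tasks.put((i, block))
    for _ in threads:
        tasks.put(None)          # one stop sentinel per worker
    for t in threads:
        t.join()
    out = [None] * len(blocks)
    while not results.empty():
        i, power = results.get()
        out[i] = power
    return out

# Hypothetical blocks of decimated complex samples.
powers = run_master([[1 + 0j, 0 + 1j], [3 + 4j]])
```

    Because the thread count is derived from the machine at run time, the same code would use more workers on a PC with more cores, which is the scaling property the abstract highlights.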

  17. Implementation of GPU accelerated SPECT reconstruction with Monte Carlo-based scatter correction.

    PubMed

    Bexelius, Tobias; Sohlberg, Antti

    2018-06-01

    Statistical SPECT reconstruction can be very time-consuming, especially when compensations for collimator and detector response, attenuation, and scatter are included in the reconstruction. This work proposes an accelerated SPECT reconstruction algorithm based on graphics processing unit (GPU) processing. Ordered subset expectation maximization (OSEM) algorithm with CT-based attenuation modelling, depth-dependent Gaussian convolution-based collimator-detector response modelling, and Monte Carlo-based scatter compensation was implemented using OpenCL. The OpenCL implementation was compared against the existing multi-threaded OSEM implementation running on a central processing unit (CPU) in terms of scatter-to-primary ratios, standardized uptake values (SUVs), and processing speed using mathematical phantoms and clinical multi-bed bone SPECT/CT studies. The difference in scatter-to-primary ratios, visual appearance, and SUVs between GPU and CPU implementations was minor. On the other hand, at its best, the GPU implementation was found to be 24 times faster than the multi-threaded CPU version on a normal 128 × 128 matrix size 3-bed bone SPECT/CT data set when compensations for collimator and detector response, attenuation, and scatter were included. GPU SPECT reconstructions show great promise as an everyday clinical reconstruction tool.

  18. Floating nut for spacecraft application

    NASA Technical Reports Server (NTRS)

    Ell, L. J.; Mathewson, R. B.

    1978-01-01

    Nut overcomes mechanical mismatch from accumulated tolerances and maintains assembly even if mounting screw loosens. Nut and screws can be painted with bonding agent to insure lock. If assemblies are removed frequently, nut and screws can be made of steel to reduce wear and tear on threads and risk of faulty threads.

  19. Critical Branches and Lucky Loads in Control-Independence Architectures

    ERIC Educational Resources Information Center

    Malik, Kshitiz

    2009-01-01

    Branch mispredicts have a first-order impact on the performance of integer applications. Control Independence (CI) architectures aim to overlap the penalties of mispredicted branches with useful execution by spawning control-independent work as separate threads. Although control independent, such threads may consume register and memory values…

  20. Modeling self-organization of novel organic materials

    NASA Astrophysics Data System (ADS)

    Sayar, Mehmet

    In this thesis, the structural organization of oligomeric multi-block molecules is analyzed by computational analysis of coarse-grained models. These molecules form nanostructures with different dimensionalities, and the nanostructured nature of these materials leads to novel structural properties at different length scales. Previously, a number of oligomeric triblock rodcoil molecules have been shown to self-organize into mushroom shaped noncentrosymmetric nanostructures. Interestingly, thin films of these molecules contain polar domains and a finite macroscopic polarization. However, the fully polarized state is not the equilibrium state. In the first chapter, by solving a model with dipolar and Ising-like short range interactions, we show that polar domains are stable in films composed of aggregates as opposed to isolated molecules. Unlike classical molecular systems, these nanoaggregates have large intralayer spacings (a ≈ 6 nm), leading to a reduction in the repulsive dipolar interactions that oppose polar order within layers. This enables the formation of a striped pattern with polar domains of alternating directions. The energies of the possible structures at zero temperature are computed exactly and results of Monte Carlo simulations are provided at non-zero temperatures. In the second chapter, the macroscopic polarization of such nanostructured films is analyzed in the presence of a short range surface interaction. The surface interaction leads to a periodic domain structure where the balance between the up and down domains is broken, and therefore films of finite thickness have a net macroscopic polarization. The polarization per unit volume is a function of film thickness and strength of the surface interaction. Finally, in chapter three, self-organization of organic molecules into a network of one dimensional objects is analyzed. 
Multi-block organic dendron rodcoil molecules were found to self-organize into supramolecular nanoribbons (threads) and form gels at very low concentrations. Here, the formation and structural properties of these networks are studied with Monte Carlo simulations. The model gelators can form intra and inter-thread bonds, and the threads have a finite stiffness. The results suggest that the high persistence length is a result of the interplay of thread stiffness and inter-thread interactions. Furthermore, this high persistence length enables the formation of networks at low concentrations.

  1. Evolution of channel morphology in a large river subject to rectification

    NASA Astrophysics Data System (ADS)

    Scorpio, Vittoria; Mastronunzio, Marco; Proto, Matteo; Zen, Simone; Bertoldi, Walter; Prà, Elena Dai; Comiti, Francesco; Surian, Nicola; Zolezzi, Guido

    2016-04-01

    Many large rivers in Europe have been subject to heavy modifications for land reclamation and flood mitigation through centuries. As a consequence, the study of the pre-alteration morphological patterns and of the related channel evolution following the anthropic modifications is rather challenging. The Adige River is the second longest river in Italy and drains 12,100 km² of the Eastern Italian Alps. Currently, it features a straight to sinuous pattern and an average channel width of 40-60 m. A massive rectification scheme aiming at land reclamation of the Adige valley bottom was planned in the late 18th century, and implemented starting in the first decades of the 19th century. Nowadays, it can be considered one of the most altered rivers in Italy, not only due to channelization but also to the presence of many hydropower reservoirs and check-dams along its tributaries. This study aims to reconstruct the Adige River's evolutionary trajectory over the last 250 years and to understand the key control factors driving channel evolution. A multi-temporal analysis of historical maps and orthophotos from 1776 to 2006 was performed in order to assess channel modifications. In addition, land use changes at the basin scale, years of occurrence of most relevant flood events, and climate variability over the investigated period were analyzed. The detailed topographical map surveyed in 1803 was taken as a reference, and the study sector (115 km long) was divided into 39 reaches. Active channel, bars, riparian vegetation and channel control works were geo-processed. Results show that the Adige River suffered the most intense alteration from 1803 to 1855, and especially from 1847 to 1855. During this period channel narrowing ranged from 14% to 70%, coupled with pattern changes and decreases in the braiding, sinuosity and anabranching indices. 
The most important alterations occurred in the reaches presenting a multi-thread morphology in 1803, as their average width declined from 220 m to 110 m. On the contrary, originally sinuous reaches remained quite stable, narrowing only from 100 m to 95 m. Overall, relevant channel morphology modifications took place by 1855, when channel configuration had shifted from alternating longitudinal sequences of multi-thread and single-thread, at the beginning of the 19th century, to mainly single-thread. Total length of multi-thread reaches shifted from 31% in 1805, to 22% in 1847, to 8% in 1855. On the contrary, sinuous and straight patterns increased from 26% (in 1803) to 62% (in 1847), up to 77% of the whole studied river length in 1855. Nevertheless, overall increases in channel braiding and mean channel width were observed downstream of the confluences with the main tributaries. Analysis of the evolutionary trajectory of channel morphology and of controlling factors shows that human disturbances have largely prevailed over climatic influences in constraining the Adige's dynamics and morphology, mainly because of channelization causing sharp changes in channel pattern and width that occurred during the 19th century.
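
    The width figures quoted in this abstract can be turned into relative narrowing with simple arithmetic. The short sketch below is only a worked check on the reported averages (220 m to 110 m for multi-thread reaches, 100 m to 95 m for sinuous reaches); the function name is hypothetical.

```python
def percent_narrowing(width_before_m, width_after_m):
    """Relative reduction in channel width, in percent of the initial width."""
    return 100.0 * (width_before_m - width_after_m) / width_before_m

multi_thread = percent_narrowing(220.0, 110.0)   # reaches that were multi-thread in 1803
sinuous = percent_narrowing(100.0, 95.0)         # reaches that were originally sinuous
```

    On these averages the multi-thread reaches lost half of their width while the sinuous reaches lost about one twentieth, which falls within and below, respectively, the 14% to 70% range reported for individual reaches.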

  2. The role of feedback mechanisms in historic channel changes of the lower Rio Grande in the Big Bend region

    NASA Astrophysics Data System (ADS)

    Dean, David J.; Schmidt, John C.

    2011-03-01

    Over the last century, large-scale water development of the upper Rio Grande in the U.S. and Mexico, and of the Rio Conchos in Mexico, has resulted in progressive channel narrowing of the lower Rio Grande in the Big Bend region. We used methods operating at multiple spatial and temporal scales to analyze the rate, magnitude, and processes responsible for channel narrowing. These methods included: hydrologic analysis of historic stream gage data, analysis of notes of measured discharges, historic oblique and aerial photograph analysis, and stratigraphic and dendrogeomorphic analysis of inset floodplain deposits. Our analyses indicate that frequent large floods between 1900 and the mid-1940s acted as a negative feedback mechanism and maintained a wide, sandy, multi-threaded river. Declines in mean and peak flow in the mid-1940s resulted in progressive channel narrowing. Channel narrowing has been temporarily interrupted by occasional large floods that widened the channel; however, channel narrowing has always resumed. After large floods in 1990 and 1991, the active channel width of the lower Rio Grande has narrowed by 36-52%. Narrowing has occurred by the vertical accretion of fine-grained deposits on top of sand and gravel bars, inset within natural levees. Channel narrowing by vertical accretion occurred simultaneously with a rapid invasion of non-native riparian vegetation (Tamarix spp., Arundo donax) which created a positive feedback and exacerbated the processes of channel narrowing and vertical accretion. In two floodplain trenches, we measured 2.75 and 3.5 m of vertical accretion between 1993 and 2008. In some localities, nearly 90% of bare, active channel bars were converted to vegetated floodplain during the same period. Upward shifts of stage-discharge relations occurred resulting in over-bank flooding at lower discharges, and continued vertical accretion despite a progressive reduction in stream flow. 
Thus, although the magnitude of the average annual flood was reduced by between 40 and 50%, over-bank flooding continued. These changes reflect a shift in the geomorphic nature of the Rio Grande from a wide, laterally unstable, multi-thread river, to a laterally stable, single-thread channel with cohesive, vertical banks, and few active in-channel bars.

  3. Fast access to the CMS detector condition data employing HTML5 technologies

    NASA Astrophysics Data System (ADS)

    Pierro, Giuseppe Antonio; Cavallari, Francesca; Di Guida, Salvatore; Innocente, Vincenzo

    2011-12-01

    This paper focuses on using HTML version 5 (HTML5) for accessing condition data for the CMS experiment, evaluating the benefits and risks posed by the use of this technology. According to the authors of HTML5, this technology attempts to solve issues found in previous iterations of HTML and addresses the needs of web applications, an area previously not adequately covered by HTML. We demonstrate that employing HTML5 brings important benefits in terms of access performance to the CMS condition data. The combined use of web storage and web sockets allows increasing the performance and reducing the costs in terms of computation power, memory usage and network bandwidth for client and server. Above all, the web workers allow creating different scripts that can be executed in multi-threaded mode, exploiting multi-core microprocessors. Web workers have been employed in order to substantially decrease the web page rendering time to display the condition data stored in the CMS condition database.

  4. Model Checking JAVA Programs Using Java Pathfinder

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Pressburger, Thomas

    2000-01-01

    This paper describes a translator called JAVA PATHFINDER from JAVA to PROMELA, the "programming language" of the SPIN model checker. The purpose is to establish a framework for verification and debugging of JAVA programs based on model checking. This work should be seen in a broader attempt to make formal methods applicable "in the loop" of programming within NASA's areas such as space, aviation, and robotics. Our main goal is to create automated formal methods such that programmers themselves can apply these in their daily work (in the loop) without the need for specialists to manually reformulate a program into a different notation in order to analyze the program. This work is a continuation of an effort to formally verify, using SPIN, a multi-threaded operating system programmed in Lisp for the Deep-Space 1 spacecraft, and of previous work in applying existing model checkers and theorem provers to real applications.

  5. Pili and flagella biology, structure, and biotechnological applications.

    PubMed

    Van Gerven, Nani; Waksman, Gabriel; Remaut, Han

    2011-01-01

    Bacteria and Archaea expose on their outer surfaces a variety of thread-like proteinaceous organelles with which they interact with their environments. These structures are repetitive assemblies of covalently or non-covalently linked protein subunits, organized into filamentous polymers known as pili ("hair"), flagella ("whips") or injectisomes ("needles"). They serve different roles in cell motility, adhesion and host invasion, protein and DNA secretion and uptake, conductance, or cellular encapsulation. Here we describe the functional, morphological and genetic diversity of these bacterial filamentous protein structures. The organized, multi-copy build-up and/or the natural function of pili and flagella have led to their biotechnological application as display and secretion tools, as therapeutic targets or as molecular motors. We review the documented and potential technological exploitation of bacterial surface filaments in light of their structural and functional traits. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. FOX: A Fault-Oblivious Extreme-Scale Execution Environment Boston University Final Report Project Number: DE-SC0005365

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Appavoo, Jonathan

    Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. The FOX project explored systems software and runtime support for a new approach to the data and work distribution for fault oblivious application execution. Our major OS work at Boston University focused on developing a new light-weight operating systems model that provides an appropriate context for both multi-core and multi-node application development. This work is discussed in section 1. Early on in the FOX project BU developed infrastructure for prototyping dynamic HPC environments in which the sets of nodes that an application is run on can be dynamically grown or shrunk. This work was an extension of the Kittyhawk project and is discussed in section 2. Section 3 documents the publications and software repositories that we have produced. To put our work in context of the complete FOX project contribution we include in section 4 an extended version of a paper that documents the complete work of the FOX team.

  7. JaSTA-2: Second version of the Java Superposition T-matrix Application

    NASA Astrophysics Data System (ADS)

    Halder, Prithish; Das, Himadri Sekhar

    2017-12-01

    In this article, we announce the development of a new version of the Java Superposition T-matrix App (JaSTA-2), to study the light scattering properties of porous aggregate particles. It has been developed using NetBeans 7.1.2, which is a Java integrated development environment (IDE). The JaSTA uses double precision superposition T-matrix codes for multi-sphere clusters in random orientation, developed by Mackowski and Mishchenko (1996). The new version consists of two options as part of the input parameters: (i) single wavelength and (ii) multiple wavelengths. The first option (which retains the applicability of older version of JaSTA) calculates the light scattering properties of aggregates of spheres for a single wavelength at a given instant of time whereas the second option can execute the code for multiple wavelengths in a single run. JaSTA-2 provides convenient and quicker data analysis which can be used in diverse fields like Planetary Science, Atmospheric Physics, Nanoscience, etc. This version of the software is developed for Linux platform only, and it can be operated over all the cores of a processor using the multi-threading option.

  8. Multi-phase SPH modelling of violent hydrodynamics on GPUs

    NASA Astrophysics Data System (ADS)

    Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.

    2015-11-01

    This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fukuoka, T.

    Many studies have been devoted to investigating how the maximum stress occurring in the bolted joint could be reduced. Patterson and Kenny suggest that a modified nut with a straight bevel at the bearing surface is effective. However, they only dealt with M30, and estimations on the nut geometry were not necessarily sufficient. In this study, an extensive finite element approach for solving the general multi-body contact problem is proposed by incorporating a regularization method into stiffness matrices with singularity involved; thus, numerical analyses are executed to accurately determine the optimal shape of the modified nut for various design factors. A modified nut with a curved bevel is also treated, and it is concluded that the modified nuts are significantly effective for bolts with larger nominal diameter and fine pitch, and are practically useful compared to pitch modification and tapered thread methods.

  10. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.

  11. The signal extraction of fetal heart rate based on wavelet transform and BP neural network

    NASA Astrophysics Data System (ADS)

    Yang, Xiao Hong; Zhang, Bang-Cheng; Fu, Hu Dai

    2005-04-01

    This paper briefly introduces the collection and recognition of biomedical signals and designs a method to collect FM signals. A detailed discussion of the system hardware, structure and functions is also given. Under LabWindows/CVI, the hardware and the driver are compatible and the hardware equipment works properly. The paper adopts multi-threading technology for real-time analysis, which makes effective use of CPU latency time, speeds up program response and improves execution efficiency. One thread collects data; the other thread analyzes data. With this method, the signal can be analyzed in real time. A wavelet transform is used to remove the main interference in the FM signal, and a time window is added for recognition with the BP network. Finally, the results of signal collection and BP network recognition are discussed. FM signals from 8 pregnant women were collected successfully using the sensor. The recognition accuracy of the BP network is about 83.3% using the above approach.
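
    The two-thread split described in this abstract (one thread collecting data, the other analyzing it) can be illustrated with a minimal producer-consumer sketch. This is not the authors' LabWindows/CVI code: the function names, the bounded buffer size and the placeholder squaring step that stands in for the wavelet and BP-network analysis are all assumptions made for illustration.

```python
import queue
import threading

def acquire(samples, buf):
    """Collector thread: push each raw sample into the shared buffer."""
    for s in samples:
        buf.put(s)
    buf.put(None)  # sentinel telling the analyzer that acquisition is done

def analyze(buf, results):
    """Analysis thread: consume samples as soon as they arrive."""
    while True:
        s = buf.get()
        if s is None:
            break
        results.append(s * s)  # hypothetical placeholder for the real analysis

def run_pipeline(samples):
    buf = queue.Queue(maxsize=64)  # bounded, so the collector cannot run far ahead
    results = []
    t_acq = threading.Thread(target=acquire, args=(samples, buf))
    t_ana = threading.Thread(target=analyze, args=(buf, results))
    t_acq.start()
    t_ana.start()
    t_acq.join()
    t_ana.join()
    return results
```

    With a single consumer reading from a FIFO queue, results come out in acquisition order, and the analyzer does useful work while the collector is waiting on the sensor.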

  12. Java PathFinder User Guide

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    1999-01-01

    The JAVA PATHFINDER, JPF, is a translator from a subset of JAVA 1.0 to PROMELA, the programming language of the SPIN model checker. The purpose of JPF is to establish a framework for verification and debugging of JAVA programs based on model checking. The main goal is to automate program verification such that a programmer can apply it in daily work without the need for a specialist to manually reformulate a program into a different notation in order to analyze the program. The system is especially suited for analyzing multi-threaded JAVA applications, where normal testing usually falls short. The system can find deadlocks and violations of boolean assertions stated by the programmer in a special assertion language. This document explains how to use JPF.

  13. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation

    PubMed Central

    Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B.; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications. PMID:28545077

  14. The Regional Hydrologic Extremes Assessment System: A software framework for hydrologic modeling and data assimilation.

    PubMed

    Andreadis, Konstantinos M; Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali

    2017-01-01

    The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications.

  15. Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

    NASA Astrophysics Data System (ADS)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism: we combine tightly coupled thread-level and task-level parallelism with 512-bit vector registers. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to that on the x86 CPU architecture.

  16. ASC-ATDM Performance Portability Requirements for 2015-2019

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Harold C.; Trott, Christian Robert

    This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a., Kokkos) project for 2015-2019. The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread-parallel programming model and standard C++ library-based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel patterns including parallel-for, parallel-reduce, and parallel-scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contains nested data parallel algorithms. This five-year plan includes R&D required to fully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithms and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism and performance portability. These five-year requirements include support required for current and anticipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up-to-date tutorials, documentation, multi-platform (hardware and software stack) testing, minor feature enhancements, thread-scalable algorithm consulting, and managing collaborative R&D.

  17. Clomp

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gylenhaal, J.; Bronevetsky, G.

    2007-05-25

    CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading (like NUMA memory layouts, memory contention, cache effects, etc.) in order to influence future system design. Current best-in-class implementations of OpenMP have overheads at least ten times larger than is required by many of our applications for effective use of OpenMP. This benchmark shows the significant negative performance impact of these relatively large overheads and of other thread effects. The CLOMP benchmark is highly configurable to allow a variety of problem sizes and threading effects to be studied, and it carefully checks its results to catch many common threading errors. This benchmark is expected to be included as part of the Sequoia Benchmark suite for the Sequoia procurement.

  18. Linking consistency with object/thread semantics - An approach to robust computation

    NASA Technical Reports Server (NTRS)

    Chen, Raymond C.; Dasgupta, Partha

    1989-01-01

    This paper presents an object/thread based paradigm that links data consistency with object/thread semantics. The paradigm can be used to achieve a wide range of consistency semantics from strict atomic transactions to standard process semantics. The paradigm supports three types of data consistency. Object programmers indicate the type of consistency desired on a per-operation basis and the system performs automatic concurrency control and recovery management to ensure that those consistency requirements are met. This allows programmers to customize consistency and recovery on a per-application basis without having to supply complicated, custom recovery management schemes. The paradigm allows robust and nonrobust computation to operate concurrently on the same data in a well defined manner. The operating system needs to support only one vehicle of computation - the thread.

  19. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  20. CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

    PubMed Central

    Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.

    2011-01-01

    As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404

  1. CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.

    PubMed

    Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W

    2012-06-01

    As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  2. Center for Technology for Advanced Scientific Component Software (TASCS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Govindaraju, Madhusudhan

    Advanced Scientific Computing Research Computer Science FY 2010 Report. Center for Technology for Advanced Scientific Component Software: Distributed CCA. State University of New York, Binghamton, NY, 13902. Summary: The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are working on re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and worked on a model coupling interface for biogeochemical computations coded in MATLAB. We also worked on the design and implementation of modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details: We worked on the following research projects that we are working on applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-hydrostatic hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a maximum speedup of about 26 times. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: To answer pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB.
This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 2. Low overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond its use with data intensive applications and disk-based systems, and can also be brought to bear in processing small but CPU intensive distributed applications. MapReduce, however, carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and to compute intensive programs processing much smaller data but requiring intensive computations. In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications, as well as the processing of smaller but demanding input sizes primarily used in diskless, and memory resident I/O systems.
We designed LEMO-MR [1], a Low overhead, elastic, configurable for in-memory applications, and on-demand fault tolerance, an optimized implementation of MapReduce, for both on disk and in memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade-offs and factors to be considered. We have initial results to show the efficacy of our implementation in terms of potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute intensive environment. 3. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization, to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analysis and made recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data.
In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, "MapReduce Implementation for Memory-Based and Processing Intensive Applications", accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, "Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors", in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju Binghamton University State University of New York (SUNY) mgovinda@cs.binghamton.edu Phone: 607-777-4904

  3. Biomechanical evaluation of dental implants with three different designs: Removal torque and resonance frequency analysis in rabbits.

    PubMed

    Gehrke, Sergio Alexandre; Marin, Giovanni Wiel

    2015-05-01

    The objective of this study was to investigate the effect of implant design on stability and resistance to reverse torque in the tibia of rabbits. Three test groups were prepared using the different characteristics of each implant model: square threads with progressive depth to the apex, a cervical portion without threads and a self-tapping system that is quite pronounced and aggressive (Group 1); triangular threads with flat tips with increasing thread depth from the cervical portion to the apex and a small self-tapping portion with a short thread pitch (Group 2); long thread pitch, progressive thread depth, an apical area with a small self-tapping portion (Group 3). For the last two groups, a final single-use drill was provided for each implant. Nine rabbits received 54 conical implants with the same surface treatment. The resonance frequency was analysed four times (0, 6, 8 and 12 weeks), and removal torque values were measured at three time intervals after the implantations (6, 8 and 12 weeks). In comparing the implant stability quotient at the four time points, highly significant statistical differences were found (p = 1.29 × 10^-10). The reverse torque at the three time points was also significantly different among the groups (p = 0.00015). The implants of Group 2, with a seemingly less aggressive design, more quickly reached high values of stability and removal torque. Under the limitations of this study, however, it is possible that in cases in which there may be low osseointegration response, the implant design should be evaluated. Copyright © 2014 Elsevier GmbH. All rights reserved.

  4. Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

    NASA Astrophysics Data System (ADS)

    Hadade, Ioan; di Mare, Luca

    2016-08-01

    Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
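
    The Roofline model mentioned in this abstract bounds attainable performance by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The sketch below shows that bound in its basic textbook form; the function names and any numbers passed to them are hypothetical and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the basic Roofline model.

    peak_gflops: compute ceiling of the device (GFLOP/s)
    bandwidth_gbs: sustained memory bandwidth (GB/s)
    intensity_flops_per_byte: arithmetic intensity of the kernel
    """
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

def fraction_of_roof(measured_gflops, peak_gflops, bandwidth_gbs,
                     intensity_flops_per_byte):
    """How close a measured kernel rate comes to its Roofline bound."""
    roof = roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte)
    return measured_gflops / roof
```

    A kernel whose intensity lies below the ratio of peak to bandwidth is memory-bound under this model, so SIMD tuning alone is unlikely to help it; above that ratio the compute ceiling applies.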

  5. Spider-web inspired multi-resolution graphene tactile sensor.

    PubMed

    Liu, Lu; Huang, Yu; Li, Fengyu; Ma, Ying; Li, Wenbo; Su, Meng; Qian, Xin; Ren, Wanjie; Tang, Kanglai; Song, Yanlin

    2018-05-08

    Multi-dimensional accurate response and smooth signal transmission are critical challenges in the advancement of multi-resolution recognition and complex environment analysis. Inspired by the structure-activity relationship between discrepant microstructures of the spiral and radial threads in a spider web, we designed and printed graphene with porous and densely-packed microstructures to integrate into a multi-resolution graphene tactile sensor. The three-dimensional (3D) porous graphene structure performs multi-dimensional deformation responses. The laminar densely-packed graphene structure contributes excellent conductivity with flexible stability. The spider-web inspired printed pattern inherits orientational and locational kinesis tracking. The multi-structure construction with homo-graphene material can integrate discrepant electronic properties with remarkable flexibility, which will attract enormous attention for electronic skin, wearable devices and human-machine interactions.

  6. Assessing the Effect of Dental Implants Thread Design on Distribution of Stress in Impact Loadings Using Three Dimensional Finite Element Method

    PubMed Central

    I, Zarei; S, Khajehpour; A, Sabouri; AZ, Haghnegahdar; K, Jafari

    2016-01-01

    Statement of Problem: Impacts and accidents are considered the main factors in losing teeth, so the analysis and design of implants that can be more resistant against impacts is very important. One of the important numerical methods having widespread application in various fields of engineering sciences is the finite element method. Among its wide applications, the study of distribution of power in complex structures can be noted. Objectives: The aim of this research was to assess the geometric effect and the type of implant thread on its performance; we also made an attempt to determine the created stress using the finite element method. Materials and Methods: In this study, the three dimensional model of bone was obtained using Cone Beam Computerized Tomography (CBCT) of the patient. The implants in this study were designed with Solid Works software. Loading was simulated in explicit dynamics by striking the implant vertically and horizontally with a rigid body at a speed of 1 mm/s; and the maximum level of induced stress for the cortical and trabecular bone was calculated in the ANSYS Workbench software. Results: By considering the results of this study, it was identified that, among the designed samples, the maximum imposed stress in the cortical bone layer occurred in the first group (straight threads) and the maximum stress value in the trabecular bone layer and implant occurred in the second group (tapered threads). Conclusions: Within the limitations of this study, the implants with deeper threads, because of the increased contact surface of the implant with the bone, provided more stability; also, the implant with smaller thread and shorter pitch length caused more stress to the bone. PMID:28959748

  7. Assessing the Effect of Dental Implants Thread Design on Distribution of Stress in Impact Loadings Using Three Dimensional Finite Element Method.

    PubMed

    I, Zarei; S, Khajehpour; A, Sabouri; Az, Haghnegahdar; K, Jafari

    2016-06-01

    Impacts and accidents are considered as the main factors in losing the teeth, so the analysis and design of the implants that they can be more resistant against impacts is very important. One of the important numerical methods having widespread application in various fields of engineering sciences is the finite element method. Among its wide applications, the study of distribution of power in complex structures can be noted. The aim of this research was to assess the geometric effect and the type of implant thread on its performance; we also made an attempt to determine the created stress using finite element method. In this study, the three dimensional model of bone by using Cone Beam Computerized Tomography (CBCT) of the patient has been provided. The implants in this study are designed by SolidWorks software. Loading is simulated in explicit dynamic, by struck of a rigid body with the speed of 1 mm/s to implant vertically and horizontally; and the maximum level of induced stress for the cortical and trabecular bone in the ANSYS Workbench software was calculated. By considering the results of this study, it was identified that, among the designed samples, the maximum imposed stress in the cortical bone layer occurred in the first group (straight threads) and the maximum stress value in the trabecular bone layer and implant occurred in the second group (tapered threads). Due to the limitations of this study, the implants with more depth thread, because of the increased contact surface of the implant with the bone, caused more stability; also, the implant with smaller thread and shorter pitch length caused more stress to the bone.

  8. Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

    PubMed

    Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.

  9. High Efficiency Carbon Nanotube Thread Antennas

    NASA Astrophysics Data System (ADS)

    Bengio, Elie; Senic, Damir; Taylor, Lauren; Tsentalovich, Dmitri; Chen, Peiyu; Holloway, Christopher; Novotny, David; Babakhani, Aydin; Long, Christopher; Booth, James; Orloff, Nathan; Pasquali, Matteo

    Although previous research has explored the underlying theory of high-frequency behavior of carbon nanotubes (CNTs) and CNT bundles for antennas, there is a gap in the literature for direct experimental measurements of radiation efficiency. Here we report a novel measurement technique to accurately characterize the radiation efficiency of quarter-wavelength monopole antennas made from CNT thread. At medical device (1 GHz) and Wi-Fi (2.4 GHz) frequencies, we measured the highest absolute values of radiation efficiency in the literature for CNT antennas, matching that of copper wire. We also report the first direct experimental observation that, contrary to metals, the radiation efficiency of the CNT thread improves significantly at higher frequencies. These results pave the way for practical applications of CNT thread antennas, particularly in the aerospace and wearable electronics industries where weight saving is a priority.

  10. What Scientific Applications can Benefit from Hardware Transactional Memory?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schindewolf, M; Bihari, B; Gyllenhaal, J

    2012-06-04

    Achieving efficient and correct synchronization of multiple threads is a difficult and error-prone task at small scale and, as we march towards extreme scale computing, will be even more challenging when the resulting application is supposed to utilize millions of cores efficiently. Transactional Memory (TM) is a promising technique to ease the burden on the programmer, but only recently has become available on commercial hardware in the new Blue Gene/Q system and hence the real benefit for realistic applications has not been studied, yet. This paper presents the first performance results of TM embedded into OpenMP on a prototype system of BG/Q and characterizes code properties that will likely lead to benefits when augmented with TM primitives. We first, study the influence of thread count, environment variables and memory layout on TM performance and identify code properties that will yield performance gains with TM. Second, we evaluate the combination of OpenMP with multiple synchronization primitives on top of MPI to determine suitable task to thread ratios per node. Finally, we condense our findings into a set of best practices. These are applied to a Monte Carlo Benchmark and a Smoothed Particle Hydrodynamics method. In both cases an optimized TM version, executed with 64 threads on one node, outperforms a simple TM implementation. MCB with optimized TM yields a speedup of 27.45 over baseline.

  11. Heterogeneous Concurrent Modeling and Design in Java (Volume 3: Ptolemy II Domains)

    DTIC Science & Technology

    2008-04-15

    Starting the model 88 6.5.3. Atomic Communication in Concurrent Execution 90 6.5.4. Detecting Deadlocks: 90 6.6. Application to Resource Management 90 6.6.1...Resource Management Demo 90 6.6.2. ResourcePool 91 6.7. Threads in an Actor 91 6.7.1. Creating Extra Threads in an Actor 91 6.7.2. Manually Blocking...Local Time Management 117 8.4.2. Detecting Deadlock 118 8.4.3. Ending Execution 118 8.5. Example DDE Applications 119 9. PN Domain 121 9.1

  12. Benchmark and Framework for Encouraging Research on Multi-Threaded Testing Tools

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Stoller, Scott D.; Ur, Shmuel

    2003-01-01

    A problem that has been getting prominence in testing is that of looking for intermittent bugs. Multi-threaded code is becoming very common, mostly on the server side. As there is no silver bullet solution, research focuses on a variety of partial solutions. In this paper (invited by PADTAD 2003) we outline a proposed project to facilitate research. The project goals are as follows. The first goal is to create a benchmark that can be used to evaluate different solutions. The benchmark, apart from containing programs with documented bugs, will include other artifacts, such as traces, that are useful for evaluating some of the technologies. The second goal is to create a set of tools with open APIs that can be used to check ideas without building a large system. For example an instrumentor will be available, that could be used to test temporal noise making heuristics. The third goal is to create a focus for the research in this area around which a community of people who try to solve similar problems with different techniques, could congregate.

  13. Observation and modelling of the Fe XXI line profile observed by IRIS during the impulsive phase of flares

    NASA Astrophysics Data System (ADS)

    Polito, V.; Testa, P.; De Pontieu, B.; Allred, J. C.

    2017-12-01

    The observation of the high temperature (above 10 MK) Fe XXI 1354.1 Å line with the Interface Region Imaging Spectrograph (IRIS) has provided significant insights into the chromospheric evaporation process in flares. In particular, the line is often observed to be completely blueshifted, in contrast to previous observations at lower spatial and spectral resolution, and in agreement with predictions from theoretical models. Interestingly, the line is also observed to be mostly symmetric and with a large excess above the thermal width. One popular interpretation for the excess broadening is given by assuming a superposition of flows from different loop strands. In this work, we perform a statistical analysis of Fe XXI line profiles observed by IRIS during the impulsive phase of flares and compare our results with hydrodynamic simulations of multi-thread flare loops performed with the 1D RADYN code. Our results indicate that the multi-thread models cannot easily reproduce the symmetry of the line and that some other physical process might need to be invoked in order to explain the observed profiles.

  14. Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

    2014-09-01

    Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). 
Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.

  15. Effects of Thread Depth in the Neck Area on Peri-Implant Hard and Soft Tissues: An Animal Study.

    PubMed

    Sun, Shan-Pao; Lee, Dong-Won; Yun, Jeong-Ho; Park, Kwang-Ho; Park, Kwang-Bum; Moon, Ik-Sang

    2016-11-01

    Implants with deep thread depth have been developed for the purpose of increasing total implant surface area. However, effects of implant thread depth remain controversial. The aim of this study is to examine effects of thread depth on peri-implant tissues in terms of bone-implant contact (BIC), bone-implant volume (BIV), and hard and soft tissue dimensions using comprehensive analyses, including microcomputed tomography (micro-CT). Five beagle dogs received experimental intramandibular implants 3 months after removal of their premolars and first molars (P2, P3, P4, and M1). Two different types of implants were installed in each animal: deep threaded (DT) and shallow threaded (ST). Resonance frequency testing was performed on the day of implantation as well as 4 and 8 weeks after implantation. Intraoral radiography, micro-CT, and histomorphometry were used to evaluate peri-implant tissues 4 and 8 weeks after implantation. There were no significant differences in resonance frequency test results between the two groups. Although radiographic analysis showed no group differences, micro-CT (P = 0.01) and histomorphometry (P = 0.003) revealed the DT group had significantly lower BIC values than the ST group at 4 weeks. However, by 8 weeks, BIC values of the two groups did not differ significantly. No significant differences in BIV or soft tissue height were observed between the two groups at either time point. DT implants showed no benefits over ST implants when inserted in dog mandibles.

  16. The asymptotic structure of a slender coiling fluid thread

    NASA Astrophysics Data System (ADS)

    Blount, Maurice; Lister, John

    2010-11-01

    The buckling of a viscous fluid thread as it falls through air onto a stationary surface is a well-known breakfast-time phenomenon which exhibits a rich variety of dynamical regimes [1]. Since the bending resistance of a slender thread is small, bending motion is largely confined to a short region of coiling near the surface. If the height of fall is large enough, then the thread above the coiling region forms a `tail' that falls nearly vertically under gravity but is deflected slightly due to forces exerted on it by the coil. Although it is possible to use force balances in the coil to estimate scalings for the coiling frequency, we analyse the solution structure of the entire thread in the asymptotic limit of a very slender thread and thereby include the dynamic interaction between the coil and the tail. Quantitative predictions of the coiling frequency are obtained which demonstrate the existence of leading-order corrections to scalings previously derived. In particular, we show that in the regime where the deflection of the tail is governed by a balance between centrifugal acceleration, hoop stress and gravity, the tail behaves as a flexible circular pendulum that is forced by bending stress exerted by the coil. The amplitude of the response is calculated and the previously observed resonance when the coiling frequency coincides with one of the eigenfrequencies of a free flexible pendulum is thereby explained. [1] N.M. Ribe et al., J. Fluid Mech. 555, 275-297.

  17. Novel gold nanoparticle trimer reporter probe combined with dry-reagent cotton thread immunoassay device for rapid human ferritin test.

    PubMed

    Mao, Xun; Du, Ting-E; Meng, Lili; Song, Tingting

    2015-08-19

    We reported here for the first time on the use of cotton thread combined with novel gold nanoparticle trimer reporter probe for low-cost, sensitive and rapid detection of a lung cancer related biomarker, human ferritin. A model system comprising ferritin as an analyte and a pair of monoclonal antibodies was used to demonstrate the proof-of-concept on the dry-reagent natural cotton thread immunoassay device. Results indicated that the using of novel gold nanoparticle trimer reporter probe greatly improved the sensitivity comparing with traditional gold nanoparticle reporter probe on the cotton thread immunoassay device. The assay avoids multiple incubation and washing steps performed in most conventional protein analyses. Although qualitative tests are realized by observing the color change of the test zone, quantitative data are obtained by recording the optical responses of the test zone with a commercial scanner and corresponding analysis software. Under optimal conditions, the cotton thread immunoassay device was capable of measuring 10 ng/mL human ferritin under room temperature which is sensitive enough for clinical diagnosis. Moreover, the sample solution employed in the assays is just 8 μL, which is much less than traditional lateral flow strip based biosensors. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Preload, Coefficient of Friction, and Thread Friction in an Implant-Abutment-Screw Complex.

    PubMed

    Wentaschek, Stefan; Tomalla, Sven; Schmidtmann, Irene; Lehmann, Karl Martin

    To examine the screw preload, coefficient of friction (COF), and tightening torque needed to overcome the thread friction of an implant-abutment-screw complex. In a customized load frame, 25 new implant-abutment-screw complexes including uncoated titanium alloy screws were torqued and untorqued 10 times each, applying 25 Ncm. Mean preload values decreased significantly from 209.8 N to 129.5 N according to the number of repetitions. The overall COF increased correspondingly. There was no comparable trend for the thread friction component. These results suggest that the application of a used implant-abutment-screw complex may be unfavorable for obtaining optimal screw preload.

  19. Variation tolerant SoC design

    NASA Astrophysics Data System (ADS)

    Kozhikkottu, Vivek J.

    The scaling of integrated circuits into the nanometer regime has led to variations emerging as a primary concern for designers of integrated circuits. Variations are an inevitable consequence of the semiconductor manufacturing process, and also arise due to the side-effects of operation of integrated circuits (voltage, temperature, and aging). Conventional design approaches, which are based on design corners or worst-case scenarios, leave designers with an undesirable choice between the considerable overheads associated with over-design and significantly reduced manufacturing yield. Techniques for variation-tolerant design at the logic, circuit and layout levels of the design process have been developed and are in commercial use. However, with the incessant increase in variations due to technology scaling and design trends such as near-threshold computing, these techniques are no longer sufficient to contain the effects of variations, and there is a need to address variations at all stages of design. This thesis addresses the problem of variation-tolerant design at the earliest stages of the design process, where the system-level design decisions that are made can have a very significant impact. There are two key aspects to making system-level design variation-aware. First, analysis techniques must be developed to project the impact of variations on system-level metrics such as application performance and energy. Second, variation-tolerant design techniques need to be developed to absorb the residual impact of variations (that cannot be contained through lower-level techniques). In this thesis, we address both these facets by developing robust and scalable variation-aware analysis and variation mitigation techniques at the system level. The first contribution of this thesis is a variation-aware system-level performance analysis framework. 
We address the key challenge of translating the per-component clock frequency distributions into a system-level application performance distribution. This task is particularly complex and challenging due to the inter-dependencies between components' execution, indirect effects of shared resources, and interactions between multiple system-level "execution paths". We argue that accurate variation-aware performance analysis requires Monte-Carlo based repeated system execution. Our proposed analysis framework leverages emulation to significantly speedup performance analysis without sacrificing the generality and accuracy achieved by Monte-Carlo based simulations. Our experiments show performance improvements of around 60x compared to state-of-the-art hardware-software co-simulation tools and also underscore the framework's potential to enable variation-aware design and exploration at the system level. Our second contribution addresses the problem of designing variation-tolerant SoCs using recovery based design, a popular circuit design paradigm that addresses variations by eliminating guard-bands and operating circuits at close to "zero margins" while detecting and recovering from timing errors. While previous efforts have demonstrated the potential benefits of recovery based design, we identify several challenges that need to be addressed in order to apply this technique to SoCs. We present a systematic design framework to apply recovery based design at the system level. We propose to partition SoCs into "recovery islands", wherein each recovery island consists of one or more SoC components that can recover independent of the rest of the SoC. We present a variation-aware design methodology that partitions a given SoC into recovery islands and computes the optimal operating points for each island, taking into account the various trade-offs involved. 
Our experiments demonstrate that the proposed design framework achieves an average of 32% energy savings over conventional worst-case designs, with negligible losses in performance. The third contribution of this thesis introduces disproportionate allocation of shared system resources as a means to combat the adverse impact of within-die variations on multi-core platforms. For multi-threaded programs executing on variation-impacted multi-cores platforms, we make the key observation that thread performance is not only a function of the frequency of the core on which it is executing on, but also depends upon the amount of shared system resources allocated to it. We utilize this insight to design a variation-aware runtime scheme which allocates the ways of a last-level shared L2 cache amongst the different cores/threads of a multi-core platform taking into account both application characteristics as well as chip specific variation profiles. Our experiments on 100 quad-core chips, each with a distinct variation profile, shows on an average 15% performance improvements for a suite of multi-threaded benchmarks. Our final contribution investigates the variation-tolerant design of domain-specific accelerators and demonstrates how the unique architectural properties of these accelerators can be leveraged to create highly effective variation tolerance mechanisms. We explore this concept through the variation-tolerant design of a vector processor that efficiently executes applications from the domains of recognition, mining and synthesis (RMS). We develop a novel design approach for variation tolerance, which leverages the unique nature of the vector reduction operations performed by this processor to effectively predict and preempt the occurrence of timing errors under variations and subsequently restore the correct output at the end of each vector reduction operation. 
We implement the above predict, preempt and restore operations by suitably enhancing the processor hardware and the application software and demonstrate considerable energy benefits (on an average 32%) across six applications from the domains of RMS. In conclusion, our work provides system designers with powerful tools and mechanisms in their efforts to combat variations, resulting in improved designer productivity and variation-tolerant systems.
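
    The abstract above argues that variation-aware performance analysis needs Monte-Carlo based repeated execution. The following is only a toy sketch of that idea, not the thesis's framework: it assumes a hypothetical system whose components run strictly one after another, each with a cycle count and a normally distributed clock frequency, and it ignores the shared-resource and inter-dependency effects the abstract highlights. All names and parameters are illustrative assumptions.

```python
import random
import statistics


def sample_execution_time(components, rng):
    """Draw one clock frequency per component and return total time in seconds.

    components: list of (cycles, mean_ghz, sigma_ghz) tuples (hypothetical model).
    Components are assumed to execute serially, which is a simplification.
    """
    total = 0.0
    for cycles, mean_ghz, sigma_ghz in components:
        # Clamp the sample so an extreme draw cannot give a zero or negative frequency.
        freq_ghz = max(rng.gauss(mean_ghz, sigma_ghz), 0.1 * mean_ghz)
        total += cycles / (freq_ghz * 1e9)
    return total


def monte_carlo(components, trials=1000, seed=0):
    """Return a list of sampled system-level execution times."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [sample_execution_time(components, rng) for _ in range(trials)]


if __name__ == "__main__":
    # Two hypothetical components: 2e9 cycles at ~1 GHz and 1e9 cycles at ~2 GHz.
    times = monte_carlo([(2e9, 1.0, 0.05), (1e9, 2.0, 0.10)])
    print("mean:", statistics.mean(times), "stdev:", statistics.stdev(times))
```

    With zero frequency spread the sketch collapses to the nominal execution time, which is a quick sanity check on the model before any variation is introduced.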

  20. Influence of multi-cycle loading on the structure and mechanics of marine mussel plaques.

    PubMed

    Wilhelm, Menaka H; Filippidi, Emmanouela; Waite, J Herbert; Valentine, Megan T

    2017-10-18

    The proteinaceous byssal plaque-thread structures created by marine mussels exhibit extraordinary load-bearing capability. Although the nanoscopic protein interactions that support interfacial adhesion are increasingly understood, major mechanistic questions about how mussel plaques maintain toughness on supramolecular scales remain unanswered. This study explores the mechanical properties of whole mussel plaques subjected to repetitive loading cycles, with varied recovery times. Mechanical measurements were complemented with scanning electron microscopy to investigate strain-induced structural changes after yield. Multicyclic loading of plaques decreases their low-strain stiffness and introduces irreversible, strain-dependent plastic damage within the plaque microstructure. However, strain history does not compromise critical strength or maximum extension compared with plaques monotonically loaded to failure. These results suggest that a multiplicity of force transfer mechanisms between the thread and plaque-substrate interface allow the plaque-thread structure to accommodate a wide range of extensions as it continues to bear load. This improved understanding of the mussel system at micron-to-millimeter lengthscales offers strategies for including similar fail-safe mechanisms in the design of soft, tough and resilient synthetic structures.

  1. Research into a distributed fault diagnosis system and its application

    NASA Astrophysics Data System (ADS)

    Qian, Suxiang; Jiao, Weidong; Lou, Yongjian; Shen, Xiaomei

    2005-12-01

    CORBA (Common Object Request Broker Architecture) is a solution to distributed computing methods over heterogeneity systems, which establishes a communication protocol between distributed objects. It takes great emphasis on realizing the interoperation between distributed objects. However, only after developing some application approaches and some practical technology in monitoring and diagnosis, can the customers share the monitoring and diagnosis information, so that the purpose of realizing remote multi-expert cooperation diagnosis online can be achieved. This paper aims at building an open fault monitoring and diagnosis platform combining CORBA, Web and agent. Heterogeneity diagnosis object interoperate in independent thread through the CORBA (soft-bus), realizing sharing resource and multi-expert cooperation diagnosis online, solving the disadvantage such as lack of diagnosis knowledge, oneness of diagnosis technique and imperfectness of analysis function, so that more complicated and further diagnosis can be carried on. Take high-speed centrifugal air compressor set for example, we demonstrate a distributed diagnosis based on CORBA. It proves that we can find out more efficient approaches to settle the problems such as real-time monitoring and diagnosis on the net and the break-up of complicated tasks, inosculating CORBA, Web technique and agent frame model to carry on complemental research. In this system, Multi-diagnosis Intelligent Agent helps improve diagnosis efficiency. Besides, this system offers an open circumstances, which is easy for the diagnosis objects to upgrade and for new diagnosis server objects to join in.

  2. A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.

    PubMed

    Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary

    2017-12-01

    Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for such systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.

  3. Thread-like supercapacitors based on one-step spun nanocomposite yarns.

    PubMed

    Meng, Qinghai; Wang, Kai; Guo, Wei; Fang, Jin; Wei, Zhixiang; She, Xilin

    2014-08-13

    Thread-like electronic devices have attracted great interest because of their potential applications in wearable electronics. To produce high-performance, thread-like supercapacitors, a mixture of stable dispersions of single-walled carbon nanotubes and conducting polyaniline nanowires are prepared. Then, the mixture is spun into flexible yarns with a polyvinyl alcohol outer sheath by a one-step spinning process. The composite yarns show excellent mechanical properties and high electrical conductivities after sufficient washing to remove surfactants. After applying a further coating layer of gel electrolyte, two flexible yarns are twisted together to form a thread-like supercapacitor. The supercapacitor based on these two yarns (SWCNTs and PAniNWs) possesses a much higher specific capacitance than that based only on pure SWCNTs yarns, making it an ideal energy-storage device for wearable electronics. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. "How Did We Get Here?": Topic Drift in Online Health Discussions.

    PubMed

    Park, Albert; Hartzler, Andrea L; Huh, Jina; Hsieh, Gary; McDonald, David W; Pratt, Wanda

    2016-11-02

    Patients increasingly use online health communities to exchange health information and peer support. During the progression of health discussions, a change of topic (topic drift) can occur. Topic drift is a frequent phenomenon linked to incoherence and frustration in online communities and other forms of computer-mediated communication. For sensitive topics, such as health, such drift could have life-altering repercussions, yet topic drift has not been studied in these contexts. Our goals were to understand topic drift in online health communities and then to develop and evaluate an automated approach to detect both topic drift and efforts of community members to counteract such drift. We manually analyzed 721 posts from 184 threads from 7 online health communities within WebMD to understand topic drift, members' reaction towards topic drift, and their efforts to counteract topic drift. Then, we developed an automated approach to detect topic drift and counteraction efforts. We detected topic drift by calculating cosine similarity between 229,156 posts from 37,805 threads and measuring change of cosine similarity scores from the threads' first posts to their sequential posts. Using a similar approach, we detected counteractions to topic drift in threads by focusing on the irregular increase of similarity scores compared to the previous post in threads. Finally, we evaluated the performance of our automated approaches to detect topic drift and counteracting efforts by using a manually developed gold standard. Our qualitative analyses revealed that in threads of online health communities, topics change gradually, but usually stay within the global frame of topics for the specific community. Members showed frustration when topic drift occurred in the middle of threads but reacted positively to off-topic stories shared as separate threads. 
Although all types of members helped to counteract topic drift, original posters provided the most effort to keep threads on topic. Cosine similarity scores show promise for automatically detecting topical changes in online health discussions. In our manual evaluation, we achieved an F1 score of .71 and .73 for detecting topic drift and counteracting efforts to stay on topic, respectively. Our analyses expand our understanding of topic drift in a health context and highlight practical implications, such as promoting off-topic discussions as a function of building rapport in online health communities. Furthermore, the quantitative findings suggest that an automated tool could help detect topic drift, support counteraction efforts to bring the conversation back on topic, and improve communication in these important communities. Findings from this study have the potential to reduce topic drift and improve online health community members' experience of computer-mediated communication. Improved communication could enhance the personal health management of members who seek essential information and support during times of difficulty. ©Albert Park, Andrea L Hartzler, Jina Huh, Gary Hsieh, David W McDonald, Wanda Pratt. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 02.11.2016.
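
The drift measure described in this abstract (cosine similarity between a thread's first post and each later post) can be illustrated with a minimal sketch. The bag-of-words features, the `flag_drift` helper, and the 0.2 threshold are assumptions made for illustration; the abstract does not say which text representation or cut-off the authors used.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased word counts; a stand-in for the study's (unstated) features."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_to_first_post(thread):
    """Similarity of every later post to the thread's first post."""
    first = bag_of_words(thread[0])
    return [cosine_similarity(first, bag_of_words(post)) for post in thread[1:]]

def flag_drift(scores, threshold=0.2):
    """Indices of scores below a hypothetical drift threshold."""
    return [i for i, score in enumerate(scores) if score < threshold]

thread = [
    "Has anyone had side effects from this blood pressure medication?",
    "Yes, the blood pressure medication gave me side effects like dizziness.",
    "My dog loves long walks in the park every morning.",
]
scores = similarity_to_first_post(thread)
print(scores, flag_drift(scores))
```

The same scores could be compared post-to-post rather than to the first post to look for the "irregular increase" that the abstract associates with counteraction efforts.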

  5. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In the additive manufacturing field, current research on data processing focuses mainly on slicing large STL files or complicated CAD models. A parallel algorithm can improve efficiency and reduce slicing time, but traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm based on a pipeline mode is presented to speed up data processing, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that both are significant factors in the speedup ratio. Speedup increases with the number of threads in a way that agrees closely with Amdahl's law, and it also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. A second parallel algorithm based on data parallelism is used in the experiments to show that the pipeline parallel mode is more efficient. Finally, a case study demonstrates the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of the multi-core CPU hardware and accelerates the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
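
For readers unfamiliar with the two scaling laws named in this abstract, the following sketch states them in their standard textbook form. The parallel fraction of 0.9 is a hypothetical value chosen for illustration, not a figure from the paper.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's law: fixed problem size, speedup bounded by the serial part."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)

def gustafson_speedup(parallel_fraction, n_threads):
    """Gustafson's law: problem size grows with the number of threads."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n_threads

# Hypothetical parallel fraction, for illustration only.
p = 0.9
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(p, n), 3), round(gustafson_speedup(p, n), 3))
```

Under Amdahl's law the speedup saturates at 1 / (1 - p) as threads are added, whereas under Gustafson's law it keeps growing linearly because the workload (here, plausibly the number of layers) grows too.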

  6. Performance of an adjustable, threaded inertance tube

    NASA Astrophysics Data System (ADS)

    Zhou, W. J.; Pfotenhauer, J. M.; Nellis, G. F.; Liu, S. Y.

    2015-12-01

    The performance of the Stirling type pulse tube cryocooler depends strongly on the design of the inertance tube. The phase angle produced by the inertance tube is very sensitive to its diameter and length. Recent developments are reported here regarding an adjustable inertance device that can be adjusted in real time. The inertance passage is formed by the root of a concentric cylindrical threaded device. The depth of the threads installed on the outer screw varies. In this device, the outer screw can be rotated four and a half turns. At the zero turn position the length of the passage is 1.74 m and the hydraulic diameter is 7 mm. By rotating the outer screw, the inner threaded rod engages with additional, larger depth threads. Therefore, at its upper limit of rotation, the inertance passage includes both the original 1.74 m length with 7 mm hydraulic diameter plus an additional 1.86 m length with a 10 mm hydraulic diameter. A phase shift change of 24° has been experimentally measured by changing the position of the outer screw while operating the device at a frequency of 60 Hz. This phase angle shift is less than the theoretically predicted value due to the presence of a relatively large leak through the thread clearance. Therefore, the distributed component model of the inertance tube was modified to account for the leak path, which brings the data and the model into agreement. Further, the application of vacuum grease to the threads causes the performance of the device to improve substantially.

  7. Dynamics of a multi-thermal loop in the solar corona

    NASA Astrophysics Data System (ADS)

    Nisticò, G.; Anfinogentov, S.; Nakariakov, V. M.

    2014-10-01

    Context. We present an observation of a long-living multi-thermal coronal loop, visible in different extreme ultra-violet wavebands of SDO/AIA in a quiet-Sun region close to the western solar limb. Aims: Analysis of persistent kink displacements of the loop seen in different bandpasses that correspond to different temperatures of the plasma allows sub-resolution structuring of the loop to be revealed. Methods: A vertically oriented slit is taken at the loop top, and time-distance maps are made from it. Loop displacements in time-distance maps are automatically tracked with the Gaussian fitting technique and fitted with a sinusoidal function that is "guessed". Wavelet transforms are further used in order to quantify the periodicity variation in time of the kink oscillations. Results: The loop strands are found to oscillate with the periods ranging between 3 and 15 min. The oscillations are observed in intermittent regime with temporal changes in the period and amplitude. The oscillations are different at three analysed wavelengths. Conclusions: This finding suggests that the loop-like threads seen at different wavelengths are not co-spatial, hence that the loop consists of several multi-thermal strands. The detected irregularity of the oscillations can be associated with a stochastic driver acting at the footpoints of the loop. A movie associated to Fig. 1 is available in electronic form at http://www.aanda.org

  8. Large-scale automated image analysis for computational profiling of brain tissue surrounding implanted neuroprosthetic devices using Python.

    PubMed

    Rey-Villamizar, Nicolas; Somasundar, Vinay; Megjhani, Murad; Xu, Yan; Lu, Yanbin; Padmanabhan, Raghav; Trett, Kristen; Shain, William; Roysam, Badri

    2014-01-01

    In this article, we describe the use of Python for large-scale automated server-based bio-image analysis in FARSIGHT, a free and open-source toolkit of image analysis methods for quantitative studies of complex and dynamic tissue microenvironments imaged by modern optical microscopes, including confocal, multi-spectral, multi-photon, and time-lapse systems. The core FARSIGHT modules for image segmentation, feature extraction, tracking, and machine learning are written in C++, leveraging widely used libraries including ITK, VTK, Boost, and Qt. For solving complex image analysis tasks, these modules must be combined into scripts using Python. As a concrete example, we consider the problem of analyzing 3-D multi-spectral images of brain tissue surrounding implanted neuroprosthetic devices, acquired using high-throughput multi-spectral spinning disk step-and-repeat confocal microscopy. The resulting images typically contain 5 fluorescent channels. Each channel consists of 6000 × 10,000 × 500 voxels with 16 bits/voxel, implying image sizes exceeding 250 GB. These images must be mosaicked, pre-processed to overcome imaging artifacts, and segmented to enable cellular-scale feature extraction. The features are used to identify cell types, and perform large-scale analysis for identifying spatial distributions of specific cell types relative to the device. Python was used to build a server-based script (Dell 910 PowerEdge servers with 4 sockets/server with 10 cores each, 2 threads per core and 1TB of RAM running on Red Hat Enterprise Linux linked to a RAID 5 SAN) capable of routinely handling image datasets at this scale and performing all these processing steps in a collaborative multi-user multi-platform environment. Our Python script enables efficient data storage and movement between computers and storage servers, logs all the processing steps, and performs full multi-threaded execution of all codes, including open and closed-source third party libraries.
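
The data volume and thread count quoted in this abstract can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures stated above; it ignores file-format overhead and compression.

```python
# Figures taken from the abstract.
voxels_per_channel = 6000 * 10_000 * 500   # voxels in one channel
bytes_per_voxel = 16 // 8                  # 16 bits per voxel
channels = 5

total_bytes = voxels_per_channel * bytes_per_voxel * channels
total_gb = total_bytes / 1e9               # decimal gigabytes
total_gib = total_bytes / 2**30            # binary gibibytes

# Hardware threads per server: 4 sockets x 10 cores x 2 threads per core.
hardware_threads = 4 * 10 * 2

print(total_bytes, total_gb, round(total_gib, 1), hardware_threads)
```

Either way of counting (300 GB decimal, roughly 279 GiB binary) is consistent with the abstract's "exceeding 250 GB" for a full five-channel image.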

  9. Human interface design using Button-type PEDOT electrode array in EIT

    NASA Astrophysics Data System (ADS)

    Wi, Hun; In Oh, Tong; Yoon, Sun; Kim, Kap Jin; Woo, Eung Je

    2010-04-01

    Animal and human experiments using a multi-channel EIT system require a cumbersome procedure to attach multiple electrodes. We have to ensure good contact of all electrodes and manage many lead wires during experiments. The problem becomes more severe as we increase the number of electrodes. These difficulties may limit the applicability of the imaging method in practice. Noting this technical difficulty, there have been a few trials to design human interface means such as electrode belts, helmets or rings. In this study, we developed an electrode belt for long-term monitoring of human lung ventilation. The belt includes 16 embossed electrodes which make good contact with the skin. Each electrode is made of conductive polymer and metallic thread. A soft cushion and wide contact area minimize uncomfortable sensations and reduce contact impedances. The electrodes are attached to an elastic fabric belt at equal spacing. We describe details of its design and fabrication. Using the electrode belt and the recently developed multi-frequency EIT system KHU Mark2, we show time-difference chest images of three human subjects during normal breathing cycles.

  10. “How Did We Get Here?”: Topic Drift in Online Health Discussions

    PubMed Central

    Hartzler, Andrea L; Huh, Jina; Hsieh, Gary; McDonald, David W; Pratt, Wanda

    2016-01-01

    Background: Patients increasingly use online health communities to exchange health information and peer support. During the progression of health discussions, a change of topic (topic drift) can occur. Topic drift is a frequent phenomenon linked to incoherence and frustration in online communities and other forms of computer-mediated communication. For sensitive topics, such as health, such drift could have life-altering repercussions, yet topic drift has not been studied in these contexts. Objective: Our goals were to understand topic drift in online health communities and then to develop and evaluate an automated approach to detect both topic drift and efforts of community members to counteract such drift. Methods: We manually analyzed 721 posts from 184 threads from 7 online health communities within WebMD to understand topic drift, members’ reaction towards topic drift, and their efforts to counteract topic drift. Then, we developed an automated approach to detect topic drift and counteraction efforts. We detected topic drift by calculating cosine similarity between 229,156 posts from 37,805 threads and measuring change of cosine similarity scores from the threads’ first posts to their sequential posts. Using a similar approach, we detected counteractions to topic drift in threads by focusing on the irregular increase of similarity scores compared to the previous post in threads. Finally, we evaluated the performance of our automated approaches to detect topic drift and counteracting efforts by using a manually developed gold standard. Results: Our qualitative analyses revealed that in threads of online health communities, topics change gradually, but usually stay within the global frame of topics for the specific community. Members showed frustration when topic drift occurred in the middle of threads but reacted positively to off-topic stories shared as separate threads. Although all types of members helped to counteract topic drift, original posters provided the most effort to keep threads on topic. Cosine similarity scores show promise for automatically detecting topical changes in online health discussions. In our manual evaluation, we achieved an F1 score of .71 and .73 for detecting topic drift and counteracting efforts to stay on topic, respectively. Conclusions: Our analyses expand our understanding of topic drift in a health context and highlight practical implications, such as promoting off-topic discussions as a function of building rapport in online health communities. Furthermore, the quantitative findings suggest that an automated tool could help detect topic drift, support counteraction efforts to bring the conversation back on topic, and improve communication in these important communities. Findings from this study have the potential to reduce topic drift and improve online health community members’ experience of computer-mediated communication. Improved communication could enhance the personal health management of members who seek essential information and support during times of difficulty. PMID:27806924

  11. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.

  12. Multigrid Equation Solvers for Large Scale Nonlinear Finite Element Simulations

    DTIC Science & Technology

    1999-01-01

    ...purpose of the second partitioning phase, on each SMP, is to minimize the communication within the SMP; even if a multi-threaded matrix vector product... 8.7 Comparison of model with experimental data for send phase of matrix vector product on fine grid... 140 8.4 Matrix vector product phase times... 145 9.1 Flat and

  13. Clinical leadership as an integral curriculum thread in pre-registration nursing programmes.

    PubMed

    Brown, Angela; Dewing, Jan; Crookes, Patrick

    2016-03-01

    In recent years there has been a growth in leadership development frameworks in health for the existing workforce. There has also been a related abundance of leadership programmes developed specifically for qualified nurses. There is a groundswell of opinion that clinical leadership preparation needs to extend to preparatory programmes leading to registration as a nurse. To this end a doctoral research study has been completed that focused specifically on the identification and verification of the antecedents of clinical leadership (leadership and management) so they can shape the curriculum content and the best way to deliver the curriculum content as a curriculum thread. To conceptualise how the curriculum content, identified and verified empirically, can be structured within a curriculum thread and to contribute to the discussion on effective pedagogical approaches and educational strategies for learning and teaching of clinical leadership. A multi-method design was utilised in the research in Australia. Drawing on core principles in critical social theory, an integral curriculum thread is proposed for pre-registration nursing programmes that identifies the antecedents of clinical leadership; the core concepts, together with the continuum of enlightenment, empowerment, and emancipation. The curriculum content, the effective pedagogical approaches and the educational strategies are supported theoretically and we believe this offers a design template for action and a way of thinking about this important aspect of preparatory nursing education. Moreover, we hope to have created a process contributing to a heightened sense of awareness in the nursing student (and other key stakeholders) of the what, how and when of clinical leadership for a novice registered nurse. The next stage is to further test through research the proposed integral curriculum thread. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Formation and Evolution of a Multi-Threaded Prominence

    NASA Technical Reports Server (NTRS)

    Luna, M.; Karpen, J. T.; DeVore, C. R.

    2012-01-01

    We investigate the process of formation and subsequent evolution of prominence plasma in a filament channel and its overlying arcade. We construct a three-dimensional time-dependent model of a filament-channel prominence suitable to be compared with observations. We combine this magnetic field structure with one-dimensional independent simulations of many flux tubes. The magnetic structure is a three-dimensional sheared double arcade, and the thermal non-equilibrium process governs the plasma evolution. We have found that the condensations in the corona can be divided into two populations: threads and blobs. Threads are massive condensations that linger in the field line dips. Blobs are ubiquitous small condensations that are produced throughout the filament and overlying arcade magnetic structure, and rapidly fall to the chromosphere. The total prominence mass is in agreement with observations. The threads are the principal contributors to the total mass, whereas the blob contribution is small. The motion of the threads is basically horizontal, while blobs move in all directions along the field. The peak velocities for both populations are comparable, but there is a weak tendency for the velocity to increase with the inclination, and the blobs with motion near vertical have the largest values of the velocity. We have generated synthetic images of the whole structure in an Hα proxy and in two EUV channels of the AIA instrument aboard SDO. These images show the plasma at cool, warm and hot temperatures. The theoretical differential emission measure of our system agrees very well with observations in the temperature range log T = 4.6-5.7. We conclude that the sheared-arcade magnetic structure and plasma dynamics fit well the abundant observational evidence.

  15. VoiceThread as a Peer Review and Dissemination Tool for Undergraduate Research

    NASA Astrophysics Data System (ADS)

    Guertin, L. A.

    2012-12-01

    VoiceThread has been utilized in an undergraduate research methods course for peer review and final research project dissemination. VoiceThread (http://www.voicethread.com) can be considered a social media tool, as it is a web-based technology with the capacity to enable interactive dialogue. VoiceThread is an application that allows a user to place a media collection online containing images, audio, videos, documents, and/or presentations in an interface that facilitates asynchronous communication. Participants in a VoiceThread can be passive viewers of the online content or engaged commenters via text, audio, video, with slide annotations via a doodle tool. The VoiceThread, which runs across browsers and operating systems, can be public or private for viewing and commenting and can be embedded into any website. Although few university students are aware of the VoiceThread platform (only 10% of the students surveyed by Ng (2012)), the 2009 K-12 edition of The Horizon Report (Johnson et al., 2009) lists VoiceThread as a tool to watch because of the opportunities it provides as a collaborative learning environment. In Fall 2011, eleven students enrolled in an undergraduate research methods course at Penn State Brandywine each conducted their own small-scale research project. Upon conclusion of the projects, students were required to create a poster summarizing their work for peer review. To facilitate the peer review process outside of class, each student-created PowerPoint file was placed in a VoiceThread with private access to only the class members and instructor. Each student was assigned to peer review five different student posters (i.e., VoiceThread images) with the audio and doodle tools to comment on formatting, clarity of content, etc. After the peer reviews were complete, the students were allowed to edit their PowerPoint poster files for a new VoiceThread. 
In the new VoiceThread, students were required to video record themselves describing their research and taking the viewer through their poster in the VoiceThread. This new VoiceThread with their final presentations was open for public viewing but not public commenting. A formal assessment was not conducted on the student impact of using VoiceThread for peer review and final research presentations. From an instructional standpoint, requiring students to use audio for the peer review commenting seemed to result in lengthier and more detailed reviews, connected with specific poster features when the doodle tool was utilized. By recording themselves as a "talking head" for the final product, students were required to be comfortable and confident with presenting their research, similar to what would be expected at a conference presentation. VoiceThread is currently being tested in general education Earth science courses at Penn State Brandywine as a dissemination tool for classroom-based inquiry projects and recruitment tool for Earth & Mineral Science majors.

  16. RCrawler: An R package for parallel web crawling and scraping

    NASA Astrophysics Data System (ADS)

    Khalil, Salim; Fakir, Mohamed

    RCrawler is a contributed R package for domain-based web crawling and content scraping. As the first implementation of a parallel web crawler in the R environment, RCrawler can crawl, parse, store pages, extract contents, and produce data that can be directly employed for web content mining applications. However, it is also flexible, and could be adapted to other applications. The main features of RCrawler are multi-threaded crawling, content extraction, and duplicate content detection. In addition, it includes functionalities such as URL and content-type filtering, depth level controlling, and a robots.txt parser. Our crawler has a highly optimized system, and can download a large number of pages per second while being robust against certain crashes and spider traps. In this paper, we describe the design and functionality of RCrawler, and report on our experience of implementing it in an R environment, including different optimizations that handle the limitations of R. Finally, we discuss our experimental results.
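
Three of the features listed in this abstract (parallel fetching, duplicate detection, and depth and domain filtering) can be combined in a short sketch. It is written in Python rather than R and is not RCrawler's API; the `FAKE_WEB` link graph, `fetch_links`, and `crawl` are hypothetical names, and no network access is performed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": URL -> links found on that page.
FAKE_WEB = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/c"],
    "http://example.org/b": ["http://example.org/", "http://other.net/x"],
    "http://example.org/c": ["http://example.org/d"],
    "http://example.org/d": [],
}

def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])

def crawl(seed, max_depth, domain_prefix, workers=4):
    """Breadth-first crawl: each depth level is fetched in parallel."""
    seen = {seed}            # duplicate detection on URLs
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            next_frontier = []
            # pool.map fetches all pages of the current level concurrently.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    # Domain filter plus "already seen" check.
                    if link.startswith(domain_prefix) and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen

print(sorted(crawl("http://example.org/", 2, "http://example.org/")))
```

Because the `seen` set is only updated in the main thread, the sketch needs no lock; a crawler that deduplicates inside worker threads would.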

  17. Accelerating Demand Paging for Local and Remote Out-of-Core Visualization

    NASA Technical Reports Server (NTRS)

    Ellsworth, David

    2001-01-01

    This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
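
The two ideas credited for the improvement (overlapping computation with page reads, and issuing several reads in parallel) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: `read_page` simulates a slow read with a sleep, all reads are submitted up front, and a real demand-paging system would bound the number of outstanding requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_page(page_id):
    """Stand-in for a slow local or remote page read."""
    time.sleep(0.01)
    return [page_id] * 4

def compute(page):
    """Stand-in for the visualization work done on one page."""
    return sum(page)

def process_serial(page_ids):
    """Read a page, then compute on it, one page at a time."""
    return [compute(read_page(p)) for p in page_ids]

def process_overlapped(page_ids, readers=4):
    """Reader threads fetch pages while the main thread computes."""
    with ThreadPoolExecutor(max_workers=readers) as pool:
        futures = [pool.submit(read_page, p) for p in page_ids]
        # Results are consumed in order; later reads proceed in the background.
        return [compute(f.result()) for f in futures]

pages = list(range(8))
print(process_serial(pages) == process_overlapped(pages))
```

Both functions return the same results; the overlapped version simply spends less wall-clock time waiting on reads.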

  18. Robotic technology results in faster and more robust surgical skill acquisition than traditional laparoscopy.

    PubMed

    Moore, Lee J; Wilson, Mark R; Waine, Elizabeth; Masters, Rich S W; McGrath, John S; Vine, Samuel J

    2015-03-01

    Technical surgical skills are said to be acquired quicker on a robotic rather than laparoscopic platform. However, research examining this proposition is scarce. Thus, this study aimed to compare the performance and learning curves of novices acquiring skills using a robotic or laparoscopic system, and to examine if any learning advantages were maintained over time and transferred to more difficult and stressful tasks. Forty novice participants were randomly assigned to either a robotic- or laparoscopic-trained group. Following one baseline trial on a ball pick-and-drop task, participants performed 50 learning trials. Participants then completed an immediate retention trial and a transfer trial on a two-instrument rope-threading task. One month later, participants performed a delayed retention trial and a stressful multi-tasking trial. The results revealed that the robotic-trained group completed the ball pick-and-drop task more quickly and accurately than the laparoscopic-trained group across baseline, immediate retention, and delayed retention trials. Furthermore, the robotic-trained group displayed a shorter learning curve for accuracy. The robotic-trained group also performed the more complex rope-threading and stressful multi-tasking transfer trials better. Finally, in the multi-tasking trial, the robotic-trained group made fewer tone counting errors. The results highlight the benefits of using robotic technology for the acquisition of technical surgical skills.

  19. Factors affecting the pullout strength of cancellous bone screws.

    PubMed

    Chapman, J R; Harrington, R M; Lee, K M; Anderson, P A; Tencer, A F; Kowalski, D

    1996-08-01

    Screws placed into cancellous bone in orthopedic surgical applications, such as fixation of fractures of the femoral neck or the lumbar spine, can be subjected to high loads. Screw pullout is a possibility, especially if low density osteoporotic bone is encountered. The overall goal of this study was to determine how screw thread geometry, tapping, and cannulation affect the holding power of screws in cancellous bone and determine whether current designs achieve maximum purchase strength. Twelve types of commercially available cannulated and noncannulated cancellous bone screws were tested for pullout strength in rigid unicellular polyurethane foams of apparent densities and shear strengths within the range reported for human cancellous bone. The experimentally derived pullout strength was compared to a predicted shear failure force of the internal threads formed in the polyurethane foam. Screws embedded in porous materials pullout by shearing the internal threads in the porous material. Experimental pullout force was highly correlated to the predicted shear failure force (slope = 1.05, R2 = 0.947) demonstrating that it is controlled by the major diameter of the screw, the length of engagement of the thread, the shear strength of the material into which the screw is embedded, and a thread shape factor (TSF) which accounts for screw thread depth and pitch. The average TSF for cannulated screws was 17 percent lower than that of noncannulated cancellous screws, and the pullout force was correspondingly less. Increasing the TSF, a result of decreasing thread pitch or increasing thread depth, increases screw purchase strength in porous materials. Tapping was found to reduce pullout force by an average of 8 percent compared with nontapped holes (p = 0.0001). 
Tapping in porous materials decreases screw pullout strength because the removal of material by the tap enlarges hole volume by an average of 27 percent, in effect decreasing the depth and shear area of the internal threads in the porous material.
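
The abstract names the quantities that control the predicted shear failure force but does not give the formula. The sketch below assumes the form commonly attributed to this study, F = S × (L × π × D_major) × TSF with TSF = 0.5 + 0.57735 × d/p; treat both expressions, and the example screw dimensions, as assumptions rather than values confirmed by the text above.

```python
import math

def thread_shape_factor(thread_depth_mm, pitch_mm):
    """Assumed form of the thread shape factor: TSF = 0.5 + 0.57735 * d / p."""
    return 0.5 + 0.57735 * thread_depth_mm / pitch_mm

def predicted_pullout_force(shear_strength_mpa, engaged_length_mm,
                            major_diameter_mm, thread_depth_mm, pitch_mm):
    """Assumed shear-failure model: F = S * (L * pi * D_major) * TSF, in newtons."""
    thread_shear_area = engaged_length_mm * math.pi * major_diameter_mm
    tsf = thread_shape_factor(thread_depth_mm, pitch_mm)
    return shear_strength_mpa * thread_shear_area * tsf  # MPa * mm^2 = N

# Hypothetical screw and foam properties, for illustration only.
base = predicted_pullout_force(2.0, 20.0, 6.5, 1.0, 2.0)
finer_pitch = predicted_pullout_force(2.0, 20.0, 6.5, 1.0, 1.5)
deeper_thread = predicted_pullout_force(2.0, 20.0, 6.5, 1.5, 2.0)
print(round(base, 1), round(finer_pitch, 1), round(deeper_thread, 1))
```

Under this model, decreasing the pitch or increasing the thread depth raises the TSF and therefore the predicted pullout force, which matches the trend reported in the abstract.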

  20. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

  1. PolarHub: A Global Hub for Polar Data Discovery

    NASA Astrophysics Data System (ADS)

    Li, W.

    2014-12-01

    This paper reports the outcome of an NSF project to develop PolarHub, a large-scale web crawler that automatically discovers distributed polar datasets published as OGC web services (OWS) in cyberspace. PolarHub is a machine robot; its goal is to visit as many webpages as possible to find those containing information about polar OWS, extract this information and store it in the backend data repository. This is a very challenging task given the huge volume of webpages on the Web. Four unique features were introduced in PolarHub to distinguish it from earlier crawler solutions: (1) multi-task, multi-user, multi-thread support for the crawling tasks; (2) extensive use of the thread pool and Data Access Object (DAO) design patterns to separate persistent data storage from business logic and achieve high extensibility of the crawler tool; (3) a customizable crawling algorithm based on pattern matching to support discovery of multi-type geospatial web services; and (4) a universal and portable client-server communication mechanism combining server-push and client-pull strategies for enhanced asynchronous processing. A series of experiments was conducted to identify the impact of crawling parameters on overall system performance. The geographical distribution pattern of all services identified by PolarHub is also demonstrated. We expect this work to make a major contribution to the field of geospatial information retrieval and geospatial interoperability, to bridge the gap between data providers and data consumers, and to accelerate polar science by enhancing the accessibility and reusability of adequate polar data.
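
The combination of a thread pool, a Data Access Object, and pattern-matching classification described in this abstract might look roughly like the sketch below. All names (`ServiceDAO`, `SERVICE_PATTERNS`, `classify`, `run`) and the two regular expressions are hypothetical illustrations, not PolarHub's actual code, and an in-memory set stands in for the backend repository.

```python
import re
import threading
from concurrent.futures import ThreadPoolExecutor

class ServiceDAO:
    """Data Access Object: the only component that knows how records are stored."""

    def __init__(self):
        self._records = set()
        self._lock = threading.Lock()   # worker threads share this store

    def save(self, url, service_type):
        with self._lock:
            self._records.add((url, service_type))

    def all(self):
        with self._lock:
            return sorted(self._records)

# Hypothetical URL patterns for two OGC service types.
SERVICE_PATTERNS = {
    "WMS": re.compile(r"service=wms", re.IGNORECASE),
    "WFS": re.compile(r"service=wfs", re.IGNORECASE),
}

def classify(url):
    """Return the first service type whose pattern matches, else None."""
    for name, pattern in SERVICE_PATTERNS.items():
        if pattern.search(url):
            return name
    return None

def crawl_task(url, dao):
    """Business logic: classify a URL and hand matches to the DAO."""
    service_type = classify(url)
    if service_type is not None:
        dao.save(url, service_type)

def run(urls, dao, workers=4):
    """Process URLs on a thread pool; storage details stay behind the DAO."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda u: crawl_task(u, dao), urls))

dao = ServiceDAO()
run([
    "http://example.org/ows?SERVICE=WMS&REQUEST=GetCapabilities",
    "http://example.org/about.html",
    "http://example.org/ows?service=wfs&request=GetCapabilities",
], dao)
print(dao.all())
```

Keeping storage behind the DAO means the in-memory set could be swapped for a database without touching `crawl_task`, which is the separation the abstract credits for the tool's extensibility.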

  2. AIBench: a rapid application development framework for translational research in biomedicine.

    PubMed

    Glez-Peña, D; Reboiro-Jato, M; Maia, P; Rocha, M; Díaz, F; Fdez-Riverola, F

    2010-05-01

    Applied research in both biomedical discovery and translational medicine today often requires the rapid development of fully featured applications containing both advanced and specific functionalities, for real use in practice. In this context, new tools are demanded that allow for efficient generation, deployment and reutilization of such biomedical applications as well as their associated functionalities. In this context this paper presents AIBench, an open-source Java desktop application framework for scientific software development with the goal of providing support to both fundamental and applied research in the domain of translational biomedicine. AIBench incorporates a powerful plug-in engine, a flexible scripting platform and takes advantage of Java annotations, reflection and various design principles in order to make it easy to use, lightweight and non-intrusive. By following a basic input-processing-output life cycle, it is possible to fully develop multiplatform applications using only three types of concepts: operations, data-types and views. The framework automatically provides functionalities that are present in a typical scientific application including user parameter definition, logging facilities, multi-threading execution, experiment repeatability and user interface workflow management, among others. The proposed framework architecture defines a reusable component model which also allows assembling new applications by the reuse of libraries from past projects or third-party software. Copyright (c) 2009 Elsevier Ireland Ltd. All rights reserved.

  3. Quick application/release nut with engagement indicator (commercial application of an innovative nut design)

    NASA Technical Reports Server (NTRS)

    Wright, Jay M.

    1991-01-01

    This is an assembly which permits a fastener to be inserted or removed from either side with an indicator of fastener engagement. The nut has a plurality of segments, preferably at least three segments, which are internally threaded, spring loaded apart by an internal spring, and has detents on opposite sides which force the nut segments into operative engagement with a threaded member when pushed in and release the segments for quick insertion or removal of the fastener when moved out. When the nut is installed, end pressure on the detents presses the nut segments into operative engagement with a threaded member where continued rotation locks the structure together with the detents depressed to indicate positive locking engagement of the nut. On removal, counterclockwise rotation relieves the endwise pressure on the detents, permitting internal springs to force the detents outward, allowing the nut segments to move outward and separate to permit quick removal of the fastener.

  4. Magnetic field and radiative transfer modelling of a quiescent prominence

    NASA Astrophysics Data System (ADS)

    Gunár, S.; Schwartz, P.; Dudík, J.; Schmieder, B.; Heinzel, P.; Jurčák, J.

    2014-07-01

    Aims: The aim of this work is to analyse the multi-instrument observations of the June 22, 2010 prominence to study its structure in detail, including the prominence-corona transition region and the dark bubble located below the prominence body. Methods: We combined results of the 3D magnetic field modelling with 2D prominence fine structure radiative transfer models to fully exploit the available observations. Results: The 3D linear force-free field model with the unsheared bipole reproduces the morphology of the analysed prominence reasonably well, thus providing useful information about its magnetic field configuration and the location of the magnetic dips. The 2D models of the prominence fine structures provide a good representation of the local plasma configuration in the region dominated by the quasi-vertical threads. However, the low observed Lyman-α central intensities and the morphology of the analysed prominence suggest that its upper central part is not directly illuminated from the solar surface. Conclusions: This multi-disciplinary prominence study allows us to argue that a large part of the prominence-corona transition region plasma can be located inside the magnetic dips in small-scale features that surround the cool prominence material located in the dip centre. We also argue that the dark prominence bubbles can be formed because of perturbations of the prominence magnetic field by parasitic bipoles, causing them to be devoid of the magnetic dips. Magnetic dips, however, form thin layers that surround these bubbles, which might explain the occurrence of the cool prominence material in the lines of sight intersecting the prominence bubbles. Movie and Appendix A are available in electronic form at http://www.aanda.org

  5. Metal and transuranic records in mussel shells, byssal threads and tissues

    NASA Astrophysics Data System (ADS)

    Koide, Minoru; Lee, Dong Soo; Goldberg, Edward D.

    1982-12-01

    Bivalve shells offer several advantages over tissues for the monitoring of heavy metal pollutants in the marine environment. They are easier to handle and to store. The problem of whether to depurate the animals before analyses is avoided. The shells appear to be more sensitive to environmental heavy metals levels over the long term than do the soft parts. Of the substances examined (Cd, Cu, Zn, Pb, Ag, Ni, 238Pu and 239 + 240Pu) only Pb and Pu displayed a strong covariance between soft tissue and shell concentrations. There were strong correlations between metals in the shell but not in the soft tissues in general. The byssal threads, because of their enrichment of transuranic elements and of their ease in handling, may be useful in monitoring these metals. A very weak discharge of 238Pu to marine waters adjacent to a nuclear reactor was detected in the byssal threads of mussels.

  6. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

  7. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    PubMed

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  8. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
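
    As an illustration of the CG iteration and the sparse matrix-vector multiply (SPMV) that dominates its cost, the following is a minimal serial sketch in pure Python. It is not the implementation studied in the paper: the CSR storage layout and the names `spmv`, `dot` and `conjugate_gradient` are assumptions chosen for the example, and no ordering, partitioning or parallelism is shown.

```python
def spmv(indptr, indices, data, x):
    """Return y = A x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite CSR matrix A."""
    x = [0.0] * len(b)
    r = [float(bi) for bi in b]      # residual b - A x, with x = 0 initially
    p = r[:]                         # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged (also guards division by zero)
            break
        Ap = spmv(indptr, indices, data, p)   # one SPMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
solution = conjugate_gradient([0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0], [1, 2])
```

    The row loop in `spmv` is the part whose memory access pattern depends on how the matrix rows and columns are ordered, which is where the ordering strategies discussed in the abstract would come into play.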

  9. DistributedFBA.jl: High-level, high-performance flux balance analysis in Julia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heirendt, Laurent; Thiele, Ines; Fleming, Ronan M. T.

    Flux balance analysis and its variants are widely used methods for predicting steady-state reaction rates in biochemical reaction networks. The exploration of high dimensional networks with such methods is currently hampered by software performance limitations. DistributedFBA.jl is a high-level, high-performance, open-source implementation of flux balance analysis in Julia. It is tailored to solve multiple flux balance analyses on a subset or all the reactions of large and huge-scale networks, on any number of threads or nodes.

  10. DistributedFBA.jl: High-level, high-performance flux balance analysis in Julia

    DOE PAGES

    Heirendt, Laurent; Thiele, Ines; Fleming, Ronan M. T.

    2017-01-16

    Flux balance analysis and its variants are widely used methods for predicting steady-state reaction rates in biochemical reaction networks. The exploration of high dimensional networks with such methods is currently hampered by software performance limitations. DistributedFBA.jl is a high-level, high-performance, open-source implementation of flux balance analysis in Julia. It is tailored to solve multiple flux balance analyses on a subset or all the reactions of large and huge-scale networks, on any number of threads or nodes.

  11. High Resolution Modelling of the Congo River's Multi-Threaded Main Stem Hydraulics

    NASA Astrophysics Data System (ADS)

    Carr, A. B.; Trigg, M.; Tshimanga, R.; Neal, J. C.; Borman, D.; Smith, M. W.; Bola, G.; Kabuya, P.; Mushie, C. A.; Tschumbu, C. L.

    2017-12-01

    We present the results of a summer 2017 field campaign by members of the Congo River users Hydraulics and Morphology (CRuHM) project, and a subsequent reach-scale hydraulic modelling study on the Congo's main stem. Sonar bathymetry, ADCP transects, and water surface elevation data have been collected along the Congo's heavily multi-threaded middle reach, which exhibits complex in-channel hydraulic processes that are not well understood. To model the entire basin's hydrodynamics, these in-channel hydraulic processes must be parameterised since it is not computationally feasible to represent them explicitly. Furthermore, recent research suggests that relative to other large global rivers, in-channel flows on the Congo represent a relatively large proportion of total flow through the river-floodplain system. We therefore regard sufficient representation of in-channel hydraulic processes as a Congo River hydrodynamic research priority. To enable explicit representation of in-channel hydraulics, we develop a reach-scale (70 km), high resolution hydraulic model. Simulation of flow through individual channel threads provides new information on flow depths and velocities, and will be used to inform the parameterisation of a broader basin-scale hydrodynamic model. The basin-scale model will ultimately be used to investigate floodplain fluxes, flood wave attenuation, and the impact of future hydrological change scenarios on basin hydrodynamics. This presentation will focus on the methodology we use to develop a reach-scale bathymetric DEM. The bathymetry of only a small proportion of channel threads can realistically be captured, necessitating some estimation of the bathymetry of channels not surveyed. We explore different approaches to this bathymetry estimation, and the extent to which it influences hydraulic model predictions. 
The CRuHM project is a consortium comprising the Universities of Kinshasa, Rhodes, Dar es Salaam, Bristol, and Leeds, and is funded by Royal Society-DFID Africa Capacity Building Initiative. The project aims to strengthen institutional research capacity and advance our understanding of the hydrology, hydrodynamics and sediment dynamics of the world's second largest river system through fieldwork and development of numerical models.

  12. The effects of wildfire on native tree species in the Middle Rio Grande bosques of New Mexico

    Treesearch

    Brad Johnson; David Merritt

    2009-01-01

    The cottonwood bosques along the Middle Fork of the Rio Grande (MRG) form a ribbon of surviving habitat in this once vast ecosystem. Historically, the channel had a multi-threaded and braided configuration that created a rich mosaic of habitats, including mixed-aged cottonwood forests, meadows, and willow-dominated riparian wetlands and backwaters (...

  13. Dynamic analyses, FPGA implementation and engineering applications of multi-butterfly chaotic attractors generated from generalised Sprott C system

    NASA Astrophysics Data System (ADS)

    Lai, Qiang; Zhao, Xiao-Wen; Rajagopal, Karthikeyan; Xu, Guanghui; Akgul, Akif; Guleryuz, Emre

    2018-01-01

    This paper considers the generation of multi-butterfly chaotic attractors from a generalised Sprott C system with multiple non-hyperbolic equilibria. The system is constructed by introducing an additional variable whose derivative has a switching function to the Sprott C system. It is numerically found that the system creates two-, three-, four-, five-butterfly attractors and any other multi-butterfly attractors. First, the dynamic analyses of multi-butterfly chaotic attractors are presented. Secondly, the field programmable gate array implementation, electronic circuit realisation and random number generator are done with the multi-butterfly chaotic attractors.

  14. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
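
    For orientation, the serial baseline that such GPU variants accelerate can be sketched as below. This is the textbook direct method for a single trajectory, not the warp-parallel variant the paper proposes; the function name `gillespie_direct` and its argument layout (a propensity callback and a per-reaction stoichiometry list) are assumptions made for the example.

```python
import math
import random


def gillespie_direct(x0, stoich, propensity, t_end, rng):
    """Simulate one trajectory; returns a list of (time, state) tuples.

    x0         -- initial copy numbers, one entry per species
    stoich     -- stoich[j] is the state change caused by reaction j
    propensity -- callback mapping a state to a list of reaction propensities
    """
    t = 0.0
    x = list(x0)
    path = [(t, tuple(x))]
    while t < t_end:
        a = propensity(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break                                   # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Choose reaction j with probability a[j] / a0.
        threshold = rng.random() * a0
        acc = 0.0
        for j, aj in enumerate(a):
            acc += aj
            if threshold < acc:
                break
        x = [xi + d for xi, d in zip(x, stoich[j])]
        path.append((t, tuple(x)))
    return path


# Usage: pure decay A -> (nothing) with propensity 0.5 * A, starting from 10 molecules.
trajectory = gillespie_direct([10], [[-1]], lambda s: [0.5 * s[0]], 1e9, random.Random(0))
```

    Because each run is stochastic, parameter sweeps need many independent calls of this kind, which is the coarse-grained parallelism the abstract maps onto separate multi-processors.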

  15. The Reconstruction Toolkit (RTK), an open-source cone-beam CT reconstruction toolkit based on the Insight Toolkit (ITK)

    NASA Astrophysics Data System (ADS)

    Rit, S.; Vila Oliva, M.; Brousmiche, S.; Labarbe, R.; Sarrut, D.; Sharp, G. C.

    2014-03-01

    We propose the Reconstruction Toolkit (RTK, http://www.openrtk.org), an open-source toolkit for fast cone-beam CT reconstruction, based on the Insight Toolkit (ITK) and using GPU code extracted from Plastimatch. RTK is developed by an open consortium (see affiliations) under the non-contaminating Apache 2.0 license. The quality of the platform is daily checked with regression tests in partnership with Kitware, the company supporting ITK. Several features are already available: Elekta, Varian and IBA inputs, multi-threaded Feldkamp-David-Kress reconstruction on CPU and GPU, Parker short scan weighting, multi-threaded CPU and GPU forward projectors, etc. Each feature is either accessible through command line tools or C++ classes that can be included in independent software. A MIDAS community has been opened to share CatPhan datasets of several vendors (Elekta, Varian and IBA). RTK will be used in the upcoming cone-beam CT scanner developed by IBA for proton therapy rooms. Many features are under development: new input format support, iterative reconstruction, hybrid Monte Carlo / deterministic CBCT simulation, etc. RTK has been built to freely share tomographic reconstruction developments between researchers and is open for new contributions.

  16. RACER: Effective Race Detection Using AspectJ

    NASA Technical Reports Server (NTRS)

    Bodden, Eric; Havelund, Klaus

    2008-01-01

    Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers in detecting concurrent programming errors such as deadlocks, livelocks, starvation and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock() and maybeShared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared amongst multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we present a new algorithm which we call RACER, an adaptation of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the AspectBench Compiler, implemented the RACER algorithm using this language extension and then applied the algorithm to the NASA K9 Rover Executive. Our experiments proved our implementation very effective. In the Rover Executive RACER finds 70 data races. Only one of these races was previously known. We further applied the algorithm to two other multi-threaded programs written by Computer Science researchers, in which we found races as well.
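
    The lockset idea behind ERASER-style detectors can be sketched in a few lines. The following is a simplified illustration in Python rather than AspectJ, and it is not the RACER algorithm itself: it ignores the read/write distinction and the Java memory model. The class `LocksetMonitor` and its methods are hypothetical names; the three event methods loosely mirror the kinds of events that lock, unlock and shared-access pointcuts would expose.

```python
class LocksetMonitor:
    """Simplified lockset refinement (assumed, illustrative structure)."""

    def __init__(self):
        self.held = {}        # thread -> set of locks it currently holds
        self.owner = {}       # variable -> first thread that accessed it
        self.candidates = {}  # variable -> locks held on every shared access so far
        self.races = set()    # variables with an empty candidate lockset

    def lock(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def unlock(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        held = self.held.get(thread, set())
        owner = self.owner.setdefault(var, thread)
        if var not in self.candidates:
            if thread == owner:
                return                          # still thread-exclusive: nothing to refine
            self.candidates[var] = set(held)    # first access by a second thread
        else:
            self.candidates[var] &= held        # keep only locks held on every access
        if not self.candidates[var]:
            self.races.add(var)                 # no common lock protects this variable


# Usage: "v" is accessed by T2 without the lock, "w" is consistently protected by L.
monitor = LocksetMonitor()
for var in ("v", "w"):
    monitor.lock("T1", "L")
    monitor.access("T1", var)
    monitor.unlock("T1", "L")
monitor.access("T2", "v")
monitor.lock("T2", "L")
monitor.access("T2", "w")
monitor.unlock("T2", "L")
```

    After the usage sequence, only "v" is reported, since no single lock was held across its shared accesses.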

  17. Towards a new generation of fibre optic chemical sensors based on spider silk threads

    NASA Astrophysics Data System (ADS)

    Hey Tow, Kenny; Chow, Desmond M.; Vollrath, Fritz; Dicaire, Isabelle; Gheysens, Tom; Thévenaz, Luc

    2017-04-01

    A spider uses up to seven different types of silk, all having specific functions, to build its web. For scientists, native silk - directly extracted from spiders - is a tough, biodegradable and biocompatible thread used mainly for tissue engineering and textile applications. Blessed with outstanding optical properties, this protein strand can also be used as an optical fibre and is, moreover, intrinsically sensitive to chemical compounds. In this communication, a pioneering proof-of-concept experiment using spider silk, in its pristine condition, as a new type of fibre-optic relative humidity sensor will be demonstrated and its potential for future applications discussed.

  18. Automatic Thread-Level Parallelization in the Chombo AMR Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christen, Matthias; Keen, Noel; Ligocki, Terry

    2011-05-26

    The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.

  19. A triaxial supramolecular weave

    NASA Astrophysics Data System (ADS)

    Lewandowska, Urszula; Zajaczkowski, Wojciech; Corra, Stefano; Tanabe, Junki; Borrmann, Ruediger; Benetti, Edmondo M.; Stappert, Sebastian; Watanabe, Kohei; Ochs, Nellie A. K.; Schaeublin, Robin; Li, Chen; Yashima, Eiji; Pisula, Wojciech; Müllen, Klaus; Wennemers, Helma

    2017-11-01

    Despite recent advances in the synthesis of increasingly complex topologies at the molecular level, nano- and microscopic weaves have remained difficult to achieve. Only a few diaxial molecular weaves exist—these were achieved by templation with metals. Here, we present an extended triaxial supramolecular weave that consists of self-assembled organic threads. Each thread is formed by the self-assembly of a building block comprising a rigid oligoproline segment with two perylene-monoimide chromophores spaced at 18 Å. Upon π stacking of the chromophores, threads form that feature alternating up- and down-facing voids at regular distances. These voids accommodate incoming building blocks and establish crossing points through CH-π interactions on further assembly of the threads into a triaxial woven superstructure. The resulting micrometre-scale supramolecular weave proved to be more robust than non-woven self-assemblies of the same building block. The uniform hexagonal pores of the interwoven network were able to host iridium nanoparticles, which may be of interest for practical applications.

  20. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    NASA Astrophysics Data System (ADS)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  1. Climate Modeling with a Million CPUs

    NASA Astrophysics Data System (ADS)

    Tobis, M.; Jackson, C. S.

    2010-12-01

    Authors: Michael Tobis, Ph.D., Research Scientist Associate, University of Texas Institute for Geophysics; Charles S. Jackson, Research Scientist, University of Texas Institute for Geophysics. Meteorological, oceanographic, and climatological applications have been at the forefront of scientific computing since its inception. The trend toward ever larger and more capable computing installations is unabated. However, much of the increase in capacity is accompanied by an increase in parallelism and a concomitant increase in complexity. An increase of at least four additional orders of magnitude in the computational power of scientific platforms is anticipated. It is unclear how individual climate simulations can continue to make effective use of the largest platforms. Conversion of existing community codes to higher resolution, or to more complex phenomenology, or both, presents daunting design and validation challenges. Our alternative approach is to use the expected resources to run very large ensembles of simulations of modest size, rather than to await the emergence of very large simulations. We are already doing this in exploring the parameter space of existing models using the Multiple Very Fast Simulated Annealing algorithm, which was developed for seismic imaging. Our experiments have the dual intentions of tuning the model and identifying ranges of parameter uncertainty. Our approach is less strongly constrained by the dimensionality of the parameter space than are competing methods. Nevertheless, scaling up remains costly. Much could be achieved by increasing the dimensionality of the search and adding complexity to the search algorithms. Such ensemble approaches scale naturally to very large platforms. Extensions of the approach are anticipated. For example, structurally different models can be tuned to comparable effectiveness. This can provide an objective test for which there is no realistic precedent with smaller computations. 
We find ourselves inventing new code to manage our ensembles. Component computations involve tens to hundreds of CPUs and tens to hundreds of hours. The results of these moderately large parallel jobs influence the scheduling of subsequent jobs, and complex algorithms may be easily contemplated for this. The operating system concept of a "thread" re-emerges at a very coarse level, where each thread manages atomic computations of thousands of CPU-hours. That is, rather than multiple threads operating on a processor, at this level, multiple processors operate within a single thread. In collaboration with the Texas Advanced Computing Center, we are developing a software library at the system level, which should facilitate the development of computations involving complex strategies which invoke large numbers of moderately large multi-processor jobs. While this may have applications in other sciences, our key intent is to better characterize the coupled behavior of a very large set of climate model configurations.

  2. Experimental analysis of insertion torques and forces of threaded and press-fit acetabular cups by means of ex vivo and in vivo measurements.

    PubMed

    Vogel, Danny; Rathay, Andreas; Teufel, Stephanie; Ellenrieder, Martin; Zietz, Carmen; Sander, Manuela; Bader, Rainer

    2017-01-01

    In THA a sufficient primary implant stability is the precondition for successful secondary stability. Industrial foams of different densities have been used for primary stability investigations. The aim of this study was to analyse and compare the insertion behaviour of threaded and press-fit cups in vivo and ex vivo using bone substitutes with various densities. Two threaded (Bicon Plus®, Trident® TC) and one press-fit cup (Trident PSL®) were inserted by orthopaedic surgeons (S1, S2) into 10, 20 and 31 pcf blocks, using modified surgical instruments allowing measurements of the insertion forces and torques. Furthermore, the insertion behaviour of two cups was analysed intraoperatively. Torques for the threaded cups increased while bone substitute density increased. Maximum insertion torques were observed for S2 with 102 Nm for the Bicon Plus® in 20 pcf blocks and 77 Nm for the Trident® TC in 31 pcf blocks, which compares to the in vivo measurement (85 Nm). The average insertion forces for the press-fit cup varied from 5.2 to 6.8 kN (S1) and 7.2-11.5 kN (S2) ex vivo. Intraoperatively an average insertion force of 8.0 kN was determined. Implantation behaviour was influenced by acetabular cup design, bone substitute and experience of the surgeon. No specific density of bone substitute could be favoured for ex vivo investigations on the implantation behaviour of acetabular cups. The use of synthetic bone blocks of high density (31 pcf) led to problems regarding cup orientation and seating. Therefore, bone substitutes used should be critically scrutinized in terms of the comparability to the in vivo situation.

  3. Integrating end-to-end threads of control into object-oriented analysis and design

    NASA Technical Reports Server (NTRS)

    Mccandlish, Janet E.; Macdonald, James R.; Graves, Sara J.

    1993-01-01

    Current object-oriented analysis and design methodologies fall short in their use of mechanisms for identifying threads of control for the system being developed. The scenarios which typically describe a system are more global than looking at the individual objects and representing their behavior. Unlike conventional methodologies that use data flow and process-dependency diagrams, object-oriented methodologies do not provide a model for representing these global threads end-to-end. Tracing through threads of control is key to ensuring that a system is complete and timing constraints are addressed. The existence of multiple threads of control in a system necessitates a partitioning of the system into processes. This paper describes the application and representation of end-to-end threads of control to the object-oriented analysis and design process using object-oriented constructs. The issue of representation is viewed as a grouping problem, that is, how to group classes/objects at a higher level of abstraction so that the system may be viewed as a whole with both classes/objects and their associated dynamic behavior. Existing object-oriented development methodology techniques are extended by adding design-level constructs termed logical composite classes and process composite classes. Logical composite classes are design-level classes which group classes/objects both logically and by thread of control information. Process composite classes further refine the logical composite class groupings by using process partitioning criteria to produce optimum concurrent execution results. The goal of these design-level constructs is to ultimately provide the basis for a mechanism that can support the creation of process composite classes in an automated way. Using an automated mechanism makes it easier to partition a system into concurrently executing elements that can be run in parallel on multiple processors.

  4. metAlignID: a high-throughput software tool set for automated detection of trace level contaminants in comprehensive LECO two-dimensional gas chromatography time-of-flight mass spectrometry data.

    PubMed

    Lommen, Arjen; van der Kamp, Henk J; Kools, Harrie J; van der Lee, Martijn K; van der Weg, Guido; Mol, Hans G J

    2012-11-09

    A new alternative data processing tool set, metAlignID, is developed for automated pre-processing and library-based identification and concentration estimation of target compounds after analysis by comprehensive two-dimensional gas chromatography with mass spectrometric detection. The tool set has been developed for and tested on LECO data. The software is developed to run multi-threaded (one thread per processor core) on a standard PC (personal computer) under different operating systems and is as such capable of processing multiple data sets simultaneously. Raw data files are converted into netCDF (network Common Data Form) format using a fast conversion tool. They are then preprocessed using previously developed algorithms originating from metAlign software. Next, the resulting reduced data files are searched against a user-composed library (derived from user or commercial NIST-compatible libraries) (NIST=National Institute of Standards and Technology) and the identified compounds, including an indicative concentration, are reported in Excel format. Data can be processed batch wise. The overall time needed for conversion together with processing and searching of 30 raw data sets for 560 compounds is routinely within an hour. The screening performance is evaluated for detection of pesticides and contaminants in raw data obtained after analysis of soil and plant samples. Results are compared to the existing data-handling routine based on proprietary software (LECO, ChromaTOF). The developed software tool set, which is freely downloadable at www.metalign.nl, greatly accelerates data-analysis and offers more options for fine-tuning automated identification toward specific application needs. The quality of the results obtained is slightly better than the standard processing and also adds a quantitative estimate. 
    The software tool set in combination with two-dimensional gas chromatography coupled to time-of-flight mass spectrometry shows great potential as a highly-automated and fast multi-residue instrumental screening method. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

    PubMed

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2017-01-01

    Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
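
    The runtimes quoted above can be turned into a speedup and a rough parallel efficiency. The short calculation below is only a sketch: it takes the approximate figures from the abstract (about 28 h single-threaded, about 2 h on the cluster) and assumes, which the abstract does not state, that all 2 x 20 = 40 cores were in use.

```python
def speedup(t_baseline_h, t_parallel_h):
    """Ratio of baseline runtime to parallel runtime."""
    return t_baseline_h / t_parallel_h


def parallel_efficiency(t_baseline_h, t_parallel_h, cores):
    """Speedup divided by the number of cores used."""
    return speedup(t_baseline_h, t_parallel_h) / cores


# Approximate figures from the abstract; the core count is an assumption.
s = speedup(28.0, 2.0)
e = parallel_efficiency(28.0, 2.0, cores=2 * 20)
print(f"speedup ~{s:.0f}x, efficiency ~{e:.0%}")
```

    Under these assumptions the speedup is about 14x, or roughly 35% of ideal linear scaling on 40 cores.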

  6. Distributed Emulation in Support of Large Networks

    DTIC Science & Technology

    2016-06-01

    Provider LTE Long Term Evolution MB Megabyte MIPS Microprocessor without Interlocked Pipeline Stages MRT Multi-Threaded Routing Toolkit NPS Naval...environment, modifications to a network, protocol, or model can be executed – and the effects measured – without affecting real-world users or services...produce their results when analyzing performance of Long Term Evolution ( LTE ) gateways [3]. Many research scenarios allow problems to be represented

  7. Constant time worker thread allocation via configuration caching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eichenberger, Alexandre E; O'Brien, John K. P.

    Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread is accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
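
    The caching idea described above can be illustrated with a small Python sketch. It is not the disclosed mechanism: the class `WorkerPool`, its fields, and the cache key `(master_id, n)` are hypothetical names chosen for illustration, and the idle check here is linear in the number of workers rather than constant time.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerPool:
    """Hypothetical runtime state: idle worker ids plus a per-master cache."""
    idle: set = field(default_factory=set)
    cache: dict = field(default_factory=dict)  # (master_id, n) -> worker ids

    def allocate(self, master_id, n):
        key = (master_id, n)
        cached = self.cache.get(key)
        # Fast path: reuse the previous allocation if those workers are idle.
        if cached is not None and self.idle.issuperset(cached):
            workers = cached
        else:
            # Slow path: pick any n idle workers and remember the choice.
            if len(self.idle) < n:
                raise RuntimeError("not enough idle workers")
            workers = tuple(sorted(self.idle)[:n])
            self.cache[key] = workers
        self.idle.difference_update(workers)
        return workers

    def release(self, workers):
        # Return workers to the idle set once the parallel region has finished.
        self.idle.update(workers)
```

    A master thread that repeatedly enters parallel regions of the same size would get the same workers back on each request, as long as they have been released in between.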

  8. High Performance Descriptive Semantic Analysis of Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.

  9. Heterogeneous scalable framework for multiphase flows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morris, Karla Vanessa

    2013-09-01

    Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture where the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The API allows object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects in central processing units (CPU) and graphic processing units (GPU) nodes for applications using C++.

  10. Processing communications events in parallel active messaging interface by awakening thread from wait state

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface ('PAMI') of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
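
    The wait-and-awaken pattern described above resembles a standard condition-variable loop. The Python sketch below is only an analogy using the standard `threading` module: it does not use or model the PAMI API, and the class `Context` and its methods `post` and `advance` are hypothetical names.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a communications context with an event queue."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        # Called by whatever delivers a communications event to this context.
        with self._cond:
            self._events.append(event)
            self._cond.notify()  # awaken a thread waiting in advance()

    def advance(self, handler):
        # If no event is pending, put the calling thread into a wait state;
        # once an event arrives, wake up and process it.
        with self._cond:
            while not self._events:
                self._cond.wait()
            event = self._events.popleft()
        return handler(event)
```

    The `while` loop around `wait()` guards against spurious wake-ups, and the handler runs outside the lock so that new events can be posted while one is being processed.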

  11. Metabolic studies of mammalian cells by 31P-NMR using a continuous perfusion technique.

    PubMed

    Knop, R H; Chen, C W; Mitchell, J B; Russo, A; McPherson, S; Cohen, J S

    1984-07-20

    Levels of ATP and Pi in metabolically active Chinese hamster lung fibroblasts were monitored noninvasively by 31P-NMR over many hours and under a variety of conditions. The cells were embedded in a matrix of agarose gel in the form of fine threads which were continuously perfused in a standard NMR tube. The small diameter of the thread allows rapid diffusion of metabolites and drugs into the cells. The changes in ATP and Pi levels were followed as a function of time in response to perfusion with a glucose-containing medium, with isotonic saline and with a medium containing 2,4-dinitrophenol, an uncoupler of oxidative phosphorylation. This gel-thread perfusion method should enable routine NMR studies of cellular metabolism, and may have other potential biological applications.

  12. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  13. Screw-Thread Standards for Federal Services, 1957. Handbook H28 (1957), Part 3

    DTIC Science & Technology

    1957-09-01

    MOUNTING THREADS PHOTOGRAPHIC EQUIPMENT THREADS ISO METRIC THREADS; MISCELLANEOUS THREADS CLASS 5 INTERFERENCE-FIT THREADS, TRIAL STANDARD WRENCH...Bibliography on measurement of pitch diameter by means of wires 60 Appendix 14. Metric screw-thread standards 61 1. ISO thread profiles...61 2. Standard series for ISO metric threads 62 3. Designations for ISO metric threads 62 Tables Page Table XII. 1.—Basic

  14. Atomistic modeling and HAADF investigations of misfit and threading dislocations in GaSb/GaAs hetero-structures for applications in high electron mobility transistors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruterana, Pierre, E-mail: pierre.ruterana@ensicaen.fr; Wang, Yi; Chen, Jun

    A detailed investigation on the misfit and threading dislocations at GaSb/GaAs interface has been carried out using molecular dynamics simulation and quantitative electron microscopy techniques. The sources and propagation of misfit dislocations have been elucidated. The nature and formation mechanisms of the misfit dislocations as well as the role of Sb on the stability of the Lomer configuration have been explained.

  15. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
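
    To make the terms "accumulator" and "two-phase" concrete, here is a minimal serial Python sketch of row-wise sparse matrix-matrix multiplication with a hash-map accumulator. It is not kkSpGEMM and makes no claim about the authors' implementation: the row format (a list of `{column: value}` dictionaries) and the function names are assumptions for illustration, and the split into a symbolic pass followed by a numeric pass is one common reading of "two-phase".

```python
def spgemm_symbolic(a_rows, b_rows):
    """Phase 1: count the nonzeros of each output row (structure only)."""
    counts = []
    for a_row in a_rows:
        cols = set()
        for k in a_row:             # k: column index of a nonzero in this row of A
            cols.update(b_rows[k])  # columns touched by row k of B
        counts.append(len(cols))
    return counts


def spgemm_numeric(a_rows, b_rows):
    """Phase 2: compute values, one hash-map accumulator per output row."""
    c_rows = []
    for a_row in a_rows:
        acc = {}  # accumulator: output column -> partial sum
        for k, a_val in a_row.items():
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows
```

    Each output row depends only on one row of A and the rows of B it references, so rows can be processed independently by different threads. The choice of accumulator (a hash map here, a dense array elsewhere) is the kind of data-structure decision the abstract refers to.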

  16. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

    Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.

  17. Multithreaded hybrid feature tracking for markerless augmented reality.

    PubMed

    Lee, Taehee; Höllerer, Tobias

    2009-01-01

    We describe a novel markerless camera tracking approach and user interaction methodology for augmented reality (AR) on unprepared tabletop environments. We propose a real-time system architecture that combines two types of feature tracking. Distinctive image features of the scene are detected and tracked frame-to-frame by computing optical flow. In order to achieve real-time performance, multiple operations are processed in a synchronized multi-threaded manner: capturing a video frame, tracking features using optical flow, detecting distinctive invariant features, and rendering an output frame. We also introduce user interaction methodology for establishing a global coordinate system and for placing virtual objects in the AR environment by tracking a user's outstretched hand and estimating a camera pose relative to it. We evaluate the speed and accuracy of our hybrid feature tracking approach, and demonstrate a proof-of-concept application for enabling AR in unprepared tabletop environments, using bare hands for interaction.

  18. XMOS XC-2 Development Board for Mechanical Control and Data Collection

    NASA Technical Reports Server (NTRS)

    Jarnot, Robert F.; Bowden, William J.

    2011-01-01

    The scanning microwave limb sounder (SMLS) will use technological improvements in low-noise mixers to provide precise data on the Earth's atmospheric composition with high spatial resolution. This project focuses on the design and implementation of a real-time control system needed for airborne engineering tests of the SMLS. The system must coordinate the actuation of optical components using four motors with encoder readback, while collecting synchronized telemetric data from a GPS receiver and 3-axis gyrometric system. A graphical user interface for testing the control system was also designed using Python. Although the system could have been implemented with an FPGA (field-programmable gate array)-based setup, a processor development kit manufactured by XMOS was chosen. The XMOS architecture allows parallel execution of multiple tasks on separate threads, making it ideal for this application. It is easily programmed using XC (a subset of C). The necessary communication interfaces were implemented in software, including Ethernet, with significant cost and time reduction compared to an FPGA-based approach. A simple approach to control the chopper, calibration mirror, and gimbal for the airborne SMLS was needed. The XMOS board allows for multiple threads and real-time data acquisition. The XC-2 development kit is an attractive choice for synchronized, real-time, event-driven applications. The XMOS is based on the transputer microprocessor architecture developed for parallel computing, which is being revamped in this new platform. The XMOS device has multiple cores capable of running parallel applications on separate threads. The threads communicate with each other via user-defined channels capable of transmitting data within the device. XMOS provides a C-based development environment using XC, which eliminates the need for custom tool kits associated with FPGA programming. The XC-2 has four cores and necessary hardware for Ethernet I/O.

  19. Constructing Neuronal Network Models in Massively Parallel Environments.

    PubMed

    Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.

  20. Constructing Neuronal Network Models in Massively Parallel Environments

    PubMed Central

    Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808

  1. Dedicated memory structure holding data for detecting available worker thread(s) and informing available worker thread(s) of task(s) to execute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiu, George L.; Eichenberger, Alexandre E.; O'Brien, John K. P.

    The present disclosure relates generally to a dedicated memory structure (that is, hardware device) holding data for detecting available worker thread(s) and informing available worker thread(s) of task(s) to execute.

  2. [Time and use of discussion forums in type 1 diabetes: contribution to patient education].

    PubMed

    Harry, Isabelle; Gagnayre, Rémi

    2013-01-01

    The purpose of this study was to elucidate the concept of temporality in discussions on forums used by individuals concerned by type 1 diabetes: adults and parents of children. The contents of messages were first converted into skills, and their temporality was then analysed, particularly in terms of the duration of active threads. Two types of temporality are involved in the use of forums: prescribed time governed by the therapeutic requirements related to a chronic disease and the decisions to be taken, and open-ended social time available on the Internet and the resulting reflexive processes. Our results show that topics relating to self-care and adaptation skills are often discussed and new threads on the topic are frequently introduced. Considerable diversity in the activity level associated with the various threads was observed, as most threads were only active for short periods. Following this study, our research perspectives concern: (i) the ways in which patients and their families reconcile the temporality dictated by a chronic disease (prescribed time) with the open-ended social time available on the Internet; and (ii) the ways in which this temporality is characteristic of patient learning processes via discussion forums. Future research will focus on the concept of rythmo-apprenance (rhythmic learning) in therapeutic patient education.

  3. MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakayasu, Ernesto S.; Nicora, Carrie D.; Sims, Amy C.

    2016-05-03

    ABSTRACT: Integrative multi-omics analyses can empower more effective investigation and complete understanding of complex biological systems. Despite recent advances in a range of omics analyses, multi-omic measurements of the same sample are still challenging and current methods have not been well evaluated in terms of reproducibility and broad applicability. Here we adapted a solvent-based method, widely applied for extracting lipids and metabolites, to add proteomics to mass spectrometry-based multi-omics measurements. The metabolite, protein, and lipid extraction (MPLEx) protocol proved to be robust and applicable to a diverse set of sample types, including cell cultures, microbial communities, and tissues. To illustrate the utility of this protocol, an integrative multi-omics analysis was performed using a lung epithelial cell line infected with Middle East respiratory syndrome coronavirus, which showed the impact of this virus on the host glycolytic pathway and also suggested a role for lipids during infection. The MPLEx method is a simple, fast, and robust protocol that can be applied for integrative multi-omic measurements from diverse sample types (e.g., environmental, in vitro, and clinical). IMPORTANCE: In systems biology studies, the integration of multiple omics measurements (i.e., genomics, transcriptomics, proteomics, metabolomics, and lipidomics) has been shown to provide a more complete and informative view of biological pathways. Thus, the prospect of extracting different types of molecules (e.g., DNAs, RNAs, proteins, and metabolites) and performing multiple omics measurements on single samples is very attractive, but such studies are challenging due to the fact that the extraction conditions differ according to the molecule type. Here, we adapted an organic solvent-based extraction method that demonstrated broad applicability and robustness, which enabled comprehensive proteomics, metabolomics, and lipidomics analyses from the same sample.

  4. Simulation of LHC events on a million threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2015-12-01

    Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest supercomputer in the world, can run roughly five times the number of parallel processes that the ATLAS experiment typically uses on the Grid. We ported Alpgen, a serial x86 code, to run as a parallel application under MPI on the Blue Gene/Q architecture. By analysis of the Alpgen code, we reduced the memory footprint to allow running 64 threads per node, utilizing the four hardware threads available per core on the PowerPC A2 processor. Event generation and unweighting, typically run as independent serial phases, are coupled together in a single job in this scenario, reducing intermediate writes to the filesystem. By these optimizations, we have successfully run LHC proton-proton physics event generation at the scale of a million threads, filling two-thirds of Mira.

  5. Optimisation of multi-layer rotationally moulded foamed structures

    NASA Astrophysics Data System (ADS)

    Pritchard, A. J.; McCourt, M. P.; Kearns, M. P.; Martin, P. J.; Cunningham, E.

    2018-05-01

    Multi-layer skin-foam and skin-foam-skin sandwich constructions are of increasing interest in the rotational moulding process for two reasons. Firstly, multi-layer constructions can improve the thermal insulation properties of a part. Secondly, foamed polyethylene sandwiched between solid polyethylene skins can increase the mechanical properties of rotationally moulded structural components, in particular increasing flexural properties and impact strength (IS). The processing of multiple layers of polyethylene and polyethylene foam presents unique challenges such as the control of chemical blowing agent decomposition temperature, and the optimisation of cooling rates to prevent destruction of the foam core; therefore, precise temperature control is paramount to success. Long cooling cycle times are associated with the creation of multi-layer foam parts due to their insulative nature, often making the costs of production prohibitive. Devices such as Rotocooler®, a rapid internal mould water spray cooling system, have been shown to have the potential to significantly decrease cooling times in rotational moulding. It is essential to monitor and control such devices to minimise the warpage associated with the rapid cooling of a moulding from only one side. The work presented here demonstrates the use of threaded thermocouples to monitor the polymer melt in multi-layer sandwich constructions, in order to analyse the cooling cycle of multi-layer foamed structures. A series of polyethylene skin-foam test mouldings were produced, and the effect of cooling medium on foam characteristics, mechanical properties, and process cycle time was investigated. Cooling cycle time reductions of 45%, 26%, and 29% were found for increasing (1%, 2%, and 3%) chemical blowing agent (CBA) amount when using internal water cooling technology from ~123°C compared with forced air cooling (FAC). Subsequently, a reduction of IS for the same skin-foam parts was found to be 1%, 4%, and 16% compared with FAC.

  6. LEAKAGE CHARACTERISTICS OF MULTI-CONDUCTOR CABLES AND CONDUIT SEALS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, C.; Becker, S.

    1962-12-12

    Pipe threads in conduit seal-offs can be made air tight by use of a two-part thiokol-epoxy sealant such as "Sika." This material bonds to metal but does not harden; thus, threaded parts can be separated. Gas seals in conduit seal-offs can be made by use of "Chico, Type A" sealant. This material is hard and can withstand high pressure differentials. However, there is a detectable leakage through "Chico, Type A." "Sika" can be used to make a suitable gas-tight seal. However, this material is flexible and will not support long cable lengths. A dual pour method is suggested of first casting "Chico" around the connectors to obtain strength in the seal and then using either "Sika" or "Micro-Preg" to produce a tight seal. Leakage through the cable, between strands of conductor, can be reduced by either soldering the ends or dipping the ends in conductive epoxy paint. (auth)

  7. ROOT 6 and beyond: TObject, C++14 and many cores

    DOE PAGES

    Bellenot, B.; Canal, Ph; Couet, O.; ...

    2015-12-23

    Following the release of version 6, ROOT has entered a new area of development. It will leverage the industrial strength compiler library shipping in ROOT 6 and its support of the C++11/14 standard, to significantly simplify and harden ROOT's interfaces and to clarify and substantially improve ROOT's support for multi-threaded environments. Furthermore, this talk will also recap the most important new features and enhancements in ROOT in general, focusing on those allowed by the improved interpreter and better compiler support, including I/O for smart pointers, easier type-safe access to the content of TTrees and enhanced multi-processor support.

  8. Systematic and Scalable Testing of Concurrent Programs

    DTIC Science & Technology

    2013-12-16

    The evaluation of CHESS [107] checked eight different programs ranging from process management libraries to a distributed execution engine to a research...tool (§3.1) targets systematic testing of scheduling nondeterminism in multi-threaded components of the Omega cluster management system [129], while...tool for systematic testing of multithreaded components of the Omega cluster management system [129]. In particular, §3.1.1 defines a model for

  9. Deployment of 802.15.4 Sensor Networks for C4ISR Operations

    DTIC Science & Technology

    2006-06-01

    43 Figure 20.MSP410CA Dense Grid Monitoring (Crossbow User’s Manual, 2005). ....................................44 Figure 21.(a)MICA2 without...Deployment of Sensor Grid (COASTS OPORD, 2006). ...56 Figure 27.Topology View of Two Nodes and Base Station .......57 Figure 28.Nodes Employing Multi...Random Access Memory TCP/IP Transmission Control Protocol/Internet Protocol TinyOS Tiny Micro Threading Operating System UARTs Universal

  10. Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems

    PubMed Central

    Abu-Nimeh, Faisal T.; Choong, Woon-Seng

    2017-01-01

    Link efficiency, data integrity, and continuity for high-throughput and real-time systems is crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an opensource data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card’s ring buffers, bypassing the operating system kernel’s networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity. PMID:28630948
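
    The synchronized-queue hand-off described in step (c) can be sketched roughly as follows. This is a minimal illustration and not the OpenPET code: the ring buffer is simulated with a plain list, and the names `drain_ring`, `process_packets` and `run_pipeline`, the 4-byte sequence header and the queue size of 256 are all assumptions made for the example. It uses only the Python standard library (`queue`, `threading`), not Netmap.

```python
import queue
import threading


def drain_ring(ring_slots, out_queue):
    """Producer: copy each packet out of the (simulated) ring and hand it off."""
    for packet in ring_slots:
        # Copy the bytes so the ring slot could be reused immediately.
        out_queue.put(bytes(packet))
    out_queue.put(None)  # sentinel: no more packets


def process_packets(in_queue, results):
    """Consumer: take packets in arrival order and record their sequence numbers."""
    while True:
        packet = in_queue.get()
        if packet is None:
            break
        # Hypothetical layout: first 4 bytes hold a big-endian sequence number.
        results.append(int.from_bytes(packet[:4], "big"))


def run_pipeline(n_packets=1000):
    # Simulated ring contents: sequence number followed by a dummy payload.
    ring_slots = [i.to_bytes(4, "big") + b"payload" for i in range(n_packets)]
    handoff = queue.Queue(maxsize=256)  # bounded queue applies back-pressure
    results = []
    producer = threading.Thread(target=drain_ring, args=(ring_slots, handoff))
    consumer = threading.Thread(target=process_packets, args=(handoff, results))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results
```

    With one producer and one consumer, `queue.Queue` preserves arrival order, so a gap in the recorded sequence numbers would indicate a lost packet.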

  11. ALFA: The new ALICE-FAIR software framework

    NASA Astrophysics Data System (ADS)

    Al-Turany, M.; Buncic, P.; Hristov, P.; Kollegger, T.; Kouzinopoulos, C.; Lebedev, A.; Lindenstruth, V.; Manafov, A.; Richter, M.; Rybalchenko, A.; Vande Vyvre, P.; Winckler, N.

    2015-12-01

    The commonalities between the ALICE and FAIR experiments and their computing requirements led to the development of large parts of a common software framework in an experiment independent way. The FairRoot project has already shown the feasibility of such an approach for the FAIR experiments and extending it beyond FAIR to experiments at other facilities [1, 2]. The ALFA framework is a joint development between ALICE Online-Offline (O2) and FairRoot teams. ALFA is designed as a flexible, elastic system, which balances reliability and ease of development with performance using multi-processing and multithreading. A message-based approach has been adopted; such an approach will support the use of the software on different hardware platforms, including heterogeneous systems. Each process in ALFA assumes limited communication and reliance on other processes. Such a design will add horizontal scaling (multiple processes) to vertical scaling provided by multiple threads to meet computing and throughput demands. ALFA does not dictate any application protocols. Potentially, any content-based processor or any source can change the application protocol. The framework supports different serialization standards for data exchange between different hardware and software languages.

  12. Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems.

    PubMed

    Abu-Nimeh, Faisal T; Choong, Woon-Seng

    2017-03-01

    Link efficiency, data integrity, and continuity for high-throughput and real-time systems is crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an opensource data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card's ring buffers, bypassing the operating system kernel's networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity.

  13. Notch sensitivity jeopardizes titanium locking plate fatigue strength.

    PubMed

    Tseng, Wo-Jan; Chao, Ching-Kong; Wang, Chun-Chin; Lin, Jinn

    2016-12-01

    Notch sensitivity may compromise titanium-alloy plate fatigue strength. However, no studies providing head-to-head comparisons of stainless-steel or titanium-alloy locking plates exist. Custom-designed identically structured locking plates were made from stainless steel (F138 and F1314) or titanium alloy. Three screw-hole designs were compared: threaded screw-holes with angle edges (type I); threaded screw-holes with chamfered edges (type II); and non-threaded screw-holes with chamfered edges (type III). The plates' bending stiffness, bending strength, and fatigue life, were investigated. The stress concentration at the screw threads was assessed using finite element analyses (FEA). The titanium plates had higher bending strength than the F1314 and F138 plates (2.95:1.56:1) in static loading tests. For all metals, the type-III plate fatigue life was highest, followed by type-II and type-I. The type-III titanium plates had longer fatigue lives than their F138 counterparts, but the type-I and type-II titanium plates had significantly shorter fatigue lives. All F1314 plate types had longer fatigue lives than the type-III titanium plates. The FEA showed minimal stress difference (0.4%) between types II and III, but the stress for types II and III was lower (11.9% and 12.4%) than that for type I. The screw threads did not cause stress concentration in the locking plates in FEA, but may have jeopardized the fatigue strength, especially in the notch-sensitive titanium plates. Improvement to the locking plate design is necessary. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Three dimensional simulations of viscous folding in diverging microchannels

    NASA Astrophysics Data System (ADS)

    Xu, Bingrui; Chergui, Jalel; Shin, Seungwon; Juric, Damir

    2016-11-01

    Three dimensional simulations on the viscous folding in diverging microchannels reported by Cubaud and Mason are performed using the parallel code BLUE for multi-phase flows. The more viscous liquid L1 is injected into the channel from the center inlet, and the less viscous liquid L2 from two side inlets. Liquid L1 takes the form of a thin filament due to hydrodynamic focusing in the long channel that leads to the diverging region. The thread then becomes unstable to a folding instability, due to the longitudinal compressive stress applied to it by the diverging flow of liquid L2. We performed a parameter study in which the flow rate ratio, the viscosity ratio, the Reynolds number, and the shape of the channel were varied relative to a reference model. In our simulations, the cross section of the thread produced by focusing is elliptical rather than circular. The initial folding axis can be either parallel or perpendicular to the narrow dimension of the chamber. In the former case, the folding slowly transforms via twisting to perpendicular folding, or it may remain parallel. The direction of folding onset is determined by the velocity profile and the elliptical shape of the thread cross section in the channel that feeds the diverging part of the cell.

  15. Event Reconstruction for Many-core Architectures using Java

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graf, Norman A.; /SLAC

    Although Moore's Law remains technically valid, the performance enhancements in computing which traditionally resulted from increased CPU speeds ended years ago. Chip manufacturers have chosen to increase the number of core CPUs per chip instead of increasing clock speed. Unfortunately, these extra CPUs do not automatically result in improvements in simulation or reconstruction times. To take advantage of this extra computing power requires changing how software is written. Event reconstruction is globally serial, in the sense that raw data has to be unpacked first, channels have to be clustered to produce hits before those hits are identified as belonging to a track or shower, tracks have to be found and fit before they are vertexed, etc. However, many of the individual procedures along the reconstruction chain are intrinsically independent and are perfect candidates for optimization using multi-core architecture. Threading is perhaps the simplest approach to parallelizing a program and Java includes a powerful threading facility built into the language. We have developed a fast and flexible reconstruction package (org.lcsim) written in Java that has been used for numerous physics and detector optimization studies. In this paper we present the results of our studies on optimizing the performance of this toolkit using multiple threads on many-core architectures.

  16. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.

    Processing data communications events in a parallel active messaging interface ('PAMI') of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
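
    The wait-and-awaken pattern in this abstract can be illustrated with a condition variable. The sketch below is an analogy in Python and not the PAMI interface: the `Context` class, its `post` and `advance` methods, and the event names are hypothetical and chosen only to mirror the wording of the abstract.

```python
import collections
import threading


class Context:
    """Hypothetical stand-in for a messaging context with a pending-event queue."""

    def __init__(self):
        self._events = collections.deque()
        self._cond = threading.Condition()
        self.handled = []

    def post(self, event):
        """Called when a data communications event occurs for this context."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()  # awaken a thread sleeping in advance()

    def advance(self):
        """Process one event; enter a wait state if none is actionable."""
        with self._cond:
            while not self._events:  # nothing pending: sleep instead of polling
                self._cond.wait()
            event = self._events.popleft()
        self.handled.append(event)  # "processing" is just recording the event here
        return event


def demo():
    ctx = Context()
    worker = threading.Thread(target=lambda: [ctx.advance() for _ in range(3)])
    worker.start()
    for name in ("recv", "send-done", "recv"):
        ctx.post(name)
    worker.join()
    return ctx.handled
```

    The `while` loop around `wait()` re-checks the queue after each wake-up, which guards against spurious wake-ups and against events posted before the thread began waiting.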

  17. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

    PubMed

    Jiang, Hanyu; Ganesan, Narayan

    2016-02-27

    HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x utilizes a heuristic pipeline which consists of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute intensive tasks within the pipeline (viz., MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs, presented here, offers a finer-grained parallelism for MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions with warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme of scarce resources like on-chip memory and caches in order to boost performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC available with CUDA 7.0 is incorporated into the presented framework, which not only helps unroll the innermost loop to yield up to a 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi all with 100% accuracy, which translates to a maximum speedup of 37.5, 23.1 and 11.6-fold for MSV, SSV and P7Viterbi respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.

  18. Fastener apparatus

    NASA Technical Reports Server (NTRS)

    While, Donald M. (Inventor); Matza, Edward C. (Inventor)

    1987-01-01

    A fastening apparatus is adapted to be inserted and removed from one side of a work piece having an opposite side which is substantially inaccessible to a worker. A first, externally threaded member is threadingly engaged with a receiving structure, and a second member is inserted within corresponding seats or grooves for interlocking the two members. In the preferred embodiment diverting seats are provided for forming the second member into locking engagement between the receiving structure and the first member. In one embodiment, seat structures are provided for engaging frangible panels or the like for high temperature applications.

  19. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Secchi, Simone; Tumeo, Antonino; Villa, Oreste

    Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.

  20. Modified locking thread form for fastener

    NASA Technical Reports Server (NTRS)

    Roopnarine, (Inventor); Vranish, John D. (Inventor)

    1998-01-01

    A threaded fastener has a standard part with a standard thread form characterized by thread walls with a standard included angle, and a modified part complementary to the standard part having a modified thread form characterized by thread walls which are symmetrically inclined with a modified included angle that is different from the standard included angle of the standard part's thread walls, such that the threads of one part make pre-loaded edge contact with the thread walls of the other part. The thread form of the modified part can have an included angle that is greater, less, or compound as compared to the included angle of the standard part. The standard part may be a bolt and the modified part a nut, or vice versa. The modified thread form holds securely even under large vibrational forces, it permits bi-directional use of standard mating threads, is impervious to the build up of tolerances and can be manufactured with a wider range of tolerances without loss of functionality, and distributes loading stresses (per thread) in a manner that decreases the possibility of single thread failure.

  1. Activate/Inhibit KGCS Gateway via Master Console EIC Pad-B Display

    NASA Technical Reports Server (NTRS)

    Ferreira, Pedro Henrique

    2014-01-01

    My internship consisted of two major projects for the Launch Control System. The purpose of the first project was to implement the Application Control Language (ACL) to Activate Data Acquisition (ADA) and to Inhibit Data Acquisition (IDA) the Kennedy Ground Control Sub-Systems (KGCS) Gateway, to update existing Pad-B End Item Control (EIC) Display to program the ADA and IDA buttons with new ACL, and to test and release the ACL Display. The second project consisted of unit testing all of the Application Services Framework (ASF) by March 21st. The XmlFileReader was unit tested and reached 100% coverage. The XmlFileReader class is used to grab information from XML files and use them to initialize elements in the other framework elements by using the Xerces C++ XML Parser, which is open source commercial off the shelf software. The ScriptThread was also tested. ScriptThread manages the creation and activation of script threads. A large amount of the time was used in initializing the environment and learning how to set up unit tests and getting familiar with the specific segments of the project that were assigned to us.

  2. Incorporation of a Decorin Biomimetic Enhances the Mechanical Properties of Electrochemically Aligned Collagen Threads

    PubMed Central

    Kishore, Vipuil; Paderi, John E.; Akkus, Anna; Smith, Katie M.; Balachandran, Dave; Beaudoin, Stephen; Panitch, Alyssa; Akkus, Ozan

    2011-01-01

    Orientational anisotropy of collagen molecules is integral for the mechanical strength of collagen-rich tissues. We have previously reported a novel methodology to synthesize highly oriented electrochemically aligned collagen (ELAC) threads with mechanical properties converging upon those of native tendon. Decorin, a small leucine rich proteoglycan (SLRP), binds to fibrillar collagen and has been suggested to enhance the mechanical properties of tendon. Based on the structure of natural decorin, we have previously designed and synthesized a peptidoglycan (DS-SILY) that mimics decorin both structurally and functionally. In this study, we investigated the effect of the incorporation of DS-SILY on the mechanical properties and structural organization of ELAC threads. The results indicated that the addition of DS-SILY at a molar ratio of 30:1 (Collagen:DS-SILY) significantly enhanced the ultimate stress and ultimate strain of the ELAC threads. Furthermore, differential scanning calorimetry revealed that the addition of DS-SILY at a molar ratio of 30:1 resulted in a more thermally stable collagen structure. However, addition of DS-SILY at a higher concentration (10:1 Collagen:DS-SILY) yielded weaker threads with mechanical properties comparable to collagen control threads. Transmission emission microscopy revealed that the addition of DS-SILY at a higher concentration (10:1) resulted in pronounced aggregation of collagen fibrils. More importantly, these aggregates were not aligned along the long axis of the ELAC thereby compromising on the overall tensile properties of the material. We conclude that incorporation of an optimal amount of DS-SILY is a promising approach to synthesize mechanically competent collagen based biomaterials for tendon tissue engineering applications. PMID:21356334

  3. Tuning orb spider glycoprotein glue performance to habitat humidity.

    PubMed

    Opell, Brent D; Jain, Dharamdeep; Dhinojwala, Ali; Blackledge, Todd A

    2018-03-26

    Orb-weaving spiders use adhesive threads to delay the escape of insects from their webs until the spiders can locate and subdue the insects. These viscous threads are spun as paired flagelliform axial fibers coated by a cylinder of solution derived from the aggregate glands. As low molecular mass compounds (LMMCs) in the aggregate solution attract atmospheric moisture, the enlarging cylinder becomes unstable and divides into droplets. Within each droplet an adhesive glycoprotein core condenses. The plasticity and axial line extensibility of the glycoproteins are maintained by hygroscopic LMMCs. These compounds cause droplet volume to track changes in humidity and glycoprotein viscosity to vary approximately 1000-fold over the course of a day. Natural selection has tuned the performance of glycoprotein cores to the humidity of a species' foraging environment by altering the composition of its LMMCs. Thus, species from low-humidity habitats have more hygroscopic threads than those from humid forests. However, at their respective foraging humidities, these species' glycoproteins have remarkably similar viscosities, ensuring optimal droplet adhesion by balancing glycoprotein adhesion and cohesion. Optimal viscosity is also essential for integrating the adhesion force of multiple droplets. As force is transferred to a thread's support line, extending droplets draw it into a parabolic configuration, implementing a suspension bridge mechanism that sums the adhesive force generated over the thread span. Thus, viscous capture threads extend an orb spider's phenotype as a highly integrated complex of large proteins and small molecules that function as a self-assembling, highly tuned, environmentally responsive, adhesive biomaterial. Understanding the synergistic role of chemistry and design in spider adhesives, particularly the ability to stick in wet conditions, provides insight in designing synthetic adhesives for biomedical applications. © 2018. Published by The Company of Biologists Ltd.

  4. Three-dimensional printing spiders: back-and-forth glue application yields silk anchorages with high pull-off resistance under varying loading situations

    PubMed Central

    Herberstein, Marie E.

    2017-01-01

    The anchorage of structures is a crucial element of construction, both for humans and animals. Spiders use adhesive plaques to attach silk threads to substrates. Both biological and artificial adhesive structures usually have an optimal loading angle, and are prone to varying loading situations. Silk anchorages, however, must cope with loading in highly variable directions. Here we show that the detachment forces of thread anchorages of orb-web spiders are highly robust against pulling in different directions. This is gained by a two-step back-and-forth spinning pattern during the rapid production of the adhesive plaque, which shifts the thread insertion point towards the plaque centre and forms a flexible tree root-like network of branching fibres around the loading point. Using a morphometric approach and a tape-and-thread model we show that neither area, nor width of the plaque, but the shift of the loading point towards the plaque centre has the highest effect on pull-off resistance. This is explained by a circular propagation of the delamination crack with a low peeling angle. We further show that silken attachment discs are highly directional and adjusted to provide maximal performance in the upstream dragline. These results show that the way the glue is applied, crucially enhances the toughness of the anchorage without the need of additional material intake. This work is a starting point to study the evolution of tough and universal thread anchorages among spiders, and to develop bioinspired ‘instant’ anchorages of thread- and cable-like structures to a broad bandwidth of substrates. PMID:28228539

  5. Three-dimensional printing spiders: back-and-forth glue application yields silk anchorages with high pull-off resistance under varying loading situations.

    PubMed

    Wolff, Jonas O; Herberstein, Marie E

    2017-02-01

    The anchorage of structures is a crucial element of construction, both for humans and animals. Spiders use adhesive plaques to attach silk threads to substrates. Both biological and artificial adhesive structures usually have an optimal loading angle, and are prone to varying loading situations. Silk anchorages, however, must cope with loading in highly variable directions. Here we show that the detachment forces of thread anchorages of orb-web spiders are highly robust against pulling in different directions. This is gained by a two-step back-and-forth spinning pattern during the rapid production of the adhesive plaque, which shifts the thread insertion point towards the plaque centre and forms a flexible tree root-like network of branching fibres around the loading point. Using a morphometric approach and a tape-and-thread model we show that neither area, nor width of the plaque, but the shift of the loading point towards the plaque centre has the highest effect on pull-off resistance. This is explained by a circular propagation of the delamination crack with a low peeling angle. We further show that silken attachment discs are highly directional and adjusted to provide maximal performance in the upstream dragline. These results show that the way the glue is applied, crucially enhances the toughness of the anchorage without the need of additional material intake. This work is a starting point to study the evolution of tough and universal thread anchorages among spiders, and to develop bioinspired 'instant' anchorages of thread- and cable-like structures to a broad bandwidth of substrates. © 2017 The Author(s).

  6. Cutting thread at flexible endoscopy.

    PubMed

    Gong, F; Swain, P; Kadirkamanathan, S; Hepworth, C; Laufer, J; Shelton, J; Mills, T

    1996-12-01

    New thread-cutting techniques were developed for use at flexible endoscopy. A guillotine was designed to follow and cut thread at the endoscope tip. A new method was developed for guiding suture cutters. Efficacy of Nd: YAG laser cutting of threads was studied. Experimental and clinical experience with thread-cutting methods is presented. A 2.4 mm diameter flexible thread-cutting guillotine was constructed featuring two lateral holes with sharp edges through which sutures to be cut are passed. Standard suture cutters were guided by backloading thread through the cutters extracorporeally. A snare cutter was constructed to retrieve objects sewn to tissue. Efficacy and speed of Nd: YAG laser in cutting twelve different threads were studied. The guillotine cut thread faster (p < 0.05) than standard suture cutters. Backloading thread shortened time taken to cut thread (p < 0.001) compared with free-hand cutting. Nd: YAG laser was ineffective in cutting uncolored threads and slower than mechanical cutters. Results of thread cutting in clinical studies using sewing machine (n = 77 cutting episodes in 21 patients), in-vivo experiments (n = 156), and postsurgical cases (n = 15 over 15 years) are presented. New thread-cutting methods are described and their efficacy demonstrated in experimental and clinical studies.

  7. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. Here in this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.

  8. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE PAGES

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; ...

    2017-07-03

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. Here in this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
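
    As a rough illustration of partitioning work among threads by data locality, the sketch below gives each worker one contiguous block of indices, so each thread reads a single contiguous slice of the input arrays. It is not the algorithm from this paper, which the abstract does not detail; contiguous block partitioning is swapped in as a simple stand-in, a Python thread pool replaces OpenMP, and the names `block_ranges`, `partial_dot` and `blocked_dot` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def partial_dot(x, y, lo, hi):
    """Dot product restricted to the contiguous index block [lo, hi)."""
    return sum(x[i] * y[i] for i in range(lo, hi))


def blocked_dot(x, y, n_workers=4):
    """Sum per-block partial results computed by a pool of threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(partial_dot, x, y, lo, hi)
            for lo, hi in block_ranges(len(x), n_workers)
        ]
        return sum(f.result() for f in futures)
```

    In CPython the global interpreter lock prevents this pure-Python loop from running faster with more threads, so the sketch shows only how the index space is divided, not the speedups reported in the abstract.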

  9. Capillary interconnect device

    DOEpatents

    Renzi, Ronald F.

    2007-12-25

    A manifold for connecting external capillaries to the inlet and/or outlet ports of a microfluidic device for high pressure applications is provided. The fluid connector for coupling at least one fluid conduit to a corresponding port of a substrate that includes: (i) a manifold comprising one or more channels extending therethrough wherein each channel is at least partially threaded, (ii) one or more threaded ferrules each defining a bore extending therethrough with each ferrule supporting a fluid conduit wherein each ferrule is threaded into a channel of the manifold, (iii) a substrate having one or more ports on its upper surface wherein the substrate is positioned below the manifold so that the one or more ports is aligned with the one or more channels of the manifold, and (iv) means for applying an axial compressive force to the substrate to couple the one or more ports of the substrate to a corresponding proximal end of a fluid conduit.

  10. Edge compression manifold apparatus

    DOEpatents

    Renzi, Ronald F.

    2004-12-21

    A manifold for connecting external capillaries to the inlet and/or outlet ports of a microfluidic device for high pressure applications is provided. The fluid connector for coupling at least one fluid conduit to a corresponding port of a substrate that includes: (i) a manifold comprising one or more channels extending therethrough wherein each channel is at least partially threaded, (ii) one or more threaded ferrules each defining a bore extending therethrough with each ferrule supporting a fluid conduit wherein each ferrule is threaded into a channel of the manifold, (iii) a substrate having one or more ports on its upper surface wherein the substrate is positioned below the manifold so that the one or more ports is aligned with the one or more channels of the manifold, and (iv) device to apply an axial compressive force to the substrate to couple the one or more ports of the substrate to a corresponding proximal end of a fluid conduit.

  11. Edge compression manifold apparatus

    DOEpatents

    Renzi, Ronald F. [Tracy, CA]

    2007-02-27

    A manifold for connecting external capillaries to the inlet and/or outlet ports of a microfluidic device for high pressure applications is provided. The fluid connector for coupling at least one fluid conduit to a corresponding port of a substrate that includes: (i) a manifold comprising one or more channels extending therethrough wherein each channel is at least partially threaded, (ii) one or more threaded ferrules each defining a bore extending therethrough with each ferrule supporting a fluid conduit wherein each ferrule is threaded into a channel of the manifold, (iii) a substrate having one or more ports on its upper surface wherein the substrate is positioned below the manifold so that the one or more ports is aligned with the one or more channels of the manifold, and (iv) device to apply an axial compressive force to the substrate to couple the one or more ports of the substrate to a corresponding proximal end of a fluid conduit.

  12. Two Dimensional (2D) P-Aramid Dry Multi-Layered Woven Fabrics Deformational Behaviour for Technical Applications

    NASA Astrophysics Data System (ADS)

    Abtew, M. A.; Loghin, C.; Cristian, I.; Boussu, F.; Bruniaux, P.; Chen, Y.; Wang, L.

    2018-06-01

    For many technical applications, from composites to body armour, material mouldability and mechanical properties are very important. In the present study, two dimensional (2D) woven fabrics made of para-aramid high performance fibres in a multi-layer dry structure were used for investigating different forming characteristics. The different layers were arranged with 0°/90° orientation for a deep drawing formability test to analyse the effect of the number of layers and the blank-holder pressure (BHP) during the test. A specific preforming device with a low-speed forming process and a predefined hemispherical punch was used. Using fine photographic analysis, some important 2D multi-layer fabric forming characteristics, i.e., material drawing-in, surface shear angle etc., from the imposed deformation have been observed, measured and analysed for better understanding and comparison. The results revealed that the mouldability behaviour of the multi-layered dry textile fabric preforms is directional, and closely dependent on blank-holding pressure and number of layers. This indicates that both parameters should be carefully considered during material deformation to avoid wrinkling and to maintain other mechanical properties in the final application.

  13. Influence of bisphosphonates on alveolar bone loss around osseointegrated implants.

    PubMed

    Zahid, Talal M; Wang, Bing-Yan; Cohen, Robert E

    2011-06-01

    The relationship between bisphosphonates (BP) and dental implant failure has not been fully elucidated. The purpose of this retrospective radiographic study was to examine whether patients who take BP are at greater risk of implant failure than patients not using those agents. Treatment records of 362 consecutively treated patients receiving endosseous dental implants were reviewed. The patient population consisted of 227 women and 135 men with a mean age of 56 years (range: 17-87 years), treated in the University at Buffalo Postgraduate Clinic from 1997-2008. Demographic information collected included age, gender, smoking status, as well as systemic conditions and medication use. Implant characteristics reviewed included system, date of placement, date of follow-up radiographs, surgical complications, number of exposed threads, and implant failure. The relationship between BP and implant failure was analyzed using generalized estimating equation (GEE) analysis. Twenty-six patients using BP received a total of 51 dental implants. Three implants failed, yielding success rates of 94.11% and 88.46% for the implant-based and subject-based analyses, respectively. Using the GEE statistical method we found a statistically significant (P = .001; OR = 3.25) association between the use of BP and implant thread exposure. None of the other variables studied were statistically associated with implant failure or thread exposure. In conclusion, patients taking BP may be at higher risk for implant thread exposure.

  14. Tool Removes Coil-Spring Thread Inserts

    NASA Technical Reports Server (NTRS)

    Collins, Gerald J., Jr.; Swenson, Gary J.; Mcclellan, J. Scott

    1991-01-01

    Tool removes coil-spring thread inserts from threaded holes. Threads into hole, pries insert loose, grips insert, then pulls insert to thread it out of hole. Effects essentially reverse of insertion process to ease removal and avoid further damage to threaded inner surface of hole.

  15. High-Performance, Multi-Node File Copies and Checksums for Clustered File Systems

    NASA Technical Reports Server (NTRS)

    Kolano, Paul Z.; Ciotti, Robert B.

    2012-01-01

    Modern parallel file systems achieve high performance using a variety of techniques, such as striping files across multiple disks to increase aggregate I/O bandwidth and spreading disks across multiple servers to increase aggregate interconnect bandwidth. To achieve peak performance from such systems, it is typically necessary to utilize multiple concurrent readers/writers from multiple systems to overcome various single-system limitations, such as number of processors and network bandwidth. The standard cp and md5sum tools of GNU coreutils found on every modern Unix/Linux system, however, utilize a single execution thread on a single CPU core of a single system, and hence cannot take full advantage of the increased performance of clustered file systems. Mcp and msum are drop-in replacements for the standard cp and md5sum programs that utilize multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to ensure that nodes are kept as busy as possible. Read/write parallelism allows individual operations of a single copy to be overlapped using asynchronous I/O. Multi-node cooperation allows different nodes to take part in the same copy/checksum. Split-file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. Mcp and msum provide significant performance improvements over standard cp and md5sum using multiple types of parallelism and other optimizations. The total speed-ups from all improvements are significant. Mcp improves cp performance over 27x, msum improves md5sum performance almost 19x, and the combination of mcp and msum improves verified copies via cp and md5sum by almost 22x. These improvements come in the form of drop-in replacements for cp and md5sum, so are easily used and are available for download as open source software at http://mutil.sourceforge.net.
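
    The hash-tree idea in the abstract above can be illustrated with a minimal sketch. It is not the mcp/msum implementation: the function names, the chunk size, and the pairwise tree layout are assumptions made for illustration. Fixed-size chunks are hashed independently by a thread pool, and the chunk digests are then combined pairwise into one root digest. The root generally differs from a plain MD5 of the whole input; the two only coincide when the input fits in a single chunk.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def leaf_digests(data: bytes, chunk_size: int, workers: int) -> list:
    """Hash each fixed-size chunk independently (the parallel step)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # an empty input still yields one leaf
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the result is deterministic
        return list(pool.map(lambda c: hashlib.md5(c).digest(), chunks))


def tree_checksum(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> str:
    """Combine leaf digests pairwise until a single root digest remains."""
    level = leaf_digests(data, chunk_size, workers)
    while len(level) > 1:
        level = [
            hashlib.md5(b"".join(level[i:i + 2])).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

    Because only the leaf hashing touches the bulk data, that step can be spread over threads or nodes, while the combining step works on a short list of digests. The root depends on the chunk size, so both sides of a verified copy would need to agree on it.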

  16. Analysis of an Online Match Discussion Board: Improving the Otolaryngology—Head and Neck Surgery Match

    PubMed Central

    Kozin, Elliott D.; Sethi, Rosh; Lehmann, Ashton; Remenschneider, Aaron K.; Golub, Justin S.; Reyes, Samuel A.; Emerick, Kevin; Lee, Daniel J.; Gray, Stacey T.

    2015-01-01

    Introduction: “The Match” has become the accepted selection process for graduate medical education. Otomatch.com has provided an online forum for Otolaryngology-Head and Neck Surgery (OHNS) Match-related questions for over a decade. Herein, we aim to 1) delineate the types of posts on Otomatch to better understand the perspective of medical students applying for residency and 2) provide recommendations to potentially improve the Match process. Methods: Discussion forum posts on Otomatch between December 2001 and April 2014 were reviewed. The title of each thread and total number of views were recorded for quantitative analysis. Each thread was organized into one of six major categories and one of eighteen subcategories, based on chronology within the application cycle and topic. National Resident Matching Program (NRMP) data were utilized for comparison. Results: We identified 1,921 threads corresponding to over 2 million page views. Over 40% of threads related to questions about specific programs, and 27% were discussions about interviews. Views, a surrogate measure for popularity, reflected different trends. The majority of individuals viewed posts on interviews (42%), program specific questions (20%) and how to rank programs (11%). Increase in viewership tracked with a rise in applicant numbers based on NRMP data. Conclusions: Our study provides an in-depth analysis of a popular discussion forum for medical students interested in the OHNS Match. The most viewed posts are about interview dates and questions regarding specific programs. We provide suggestions to address unmet needs for medical students and potentially improve the Match process. PMID:25550223

  17. Thread gauge for measuring thread pitch diameters

    DOEpatents

    Brewster, A.L.

    1985-11-19

    A thread gauge which attaches to a vernier caliper to measure the thread pitch diameter of both externally threaded and internally threaded parts is disclosed. A pair of anvils are externally threaded with threads having the same pitch as those of the threaded part. Each anvil is mounted on a stem having a ball on which the anvil can rotate to properly mate with the parts to which the anvils are applied. The stems are detachably secured to the caliper blades by attachment collars having keyhole openings for receiving the stems and caliper blades. A set screw is used to secure each collar on its caliper blade. 2 figs.

  18. Thread gauge for measuring thread pitch diameters

    DOEpatents

    Brewster, Albert L.

    1985-01-01

    A thread gauge which attaches to a vernier caliper to measure the thread pitch diameter of both externally threaded and internally threaded parts. A pair of anvils are externally threaded with threads having the same pitch as those of the threaded part. Each anvil is mounted on a stem having a ball on which the anvil can rotate to properly mate with the parts to which the anvils are applied. The stems are detachably secured to the caliper blades by attachment collars having keyhole openings for receiving the stems and caliper blades. A set screw is used to secure each collar on its caliper blade.

  19. MetAlign 3.0: performance enhancement by efficient use of advances in computer hardware.

    PubMed

    Lommen, Arjen; Kools, Harrie J

    2012-08-01

    A new, multi-threaded version of the GC-MS and LC-MS data processing software, metAlign, has been developed which is able to utilize multiple cores on one PC. This new version was tested using three different multi-core PCs with different operating systems. The performance of noise reduction, baseline correction and peak-picking was 8-19 fold faster compared to the previous version on a single core machine from 2008. The alignment was 5-10 fold faster. Factors influencing the performance enhancement are discussed. Our observations show that performance scales with the increase in processor core numbers we currently see in consumer PC hardware development.

  20. Solving Large Problems Quickly: Progress in 2001-2003

    NASA Technical Reports Server (NTRS)

    Mowry, Todd C.; Colohan, Christopher B.; Brown, Angela Demke; Steffan, J. Gregory; Zhai, Antonia

    2004-01-01

    This document describes the progress we have made and the lessons we have learned in 2001 through 2003 under the NASA grant entitled "Solving Important Problems Faster". The long-term goal of this research is to accelerate large, irregular scientific applications which have enormous data sets and which are difficult to parallelize. To accomplish this goal, we are exploring two complementary techniques: (i) using compiler-inserted prefetching to automatically hide the I/O latency of accessing these large data sets from disk; and (ii) using thread-level data speculation to enable the optimistic parallelization of applications despite uncertainty as to whether data dependences exist between the resulting threads which would normally make them unsafe to execute in parallel. Overall, we made significant progress in 2001 through 2003, and the project has gone well.

  1. High Affinity Macrocycle Threading by a Near-Infrared Croconaine Dye with Flanking Polymer Chains

    PubMed Central

    Liu, Wenqi; Peck, Evan M.; Smith, Bradley D.

    2016-01-01

    Croconaine dyes have narrow and intense absorption bands at ~800 nm, very weak fluorescence, and high photostabilities, which combine to make them very attractive chromophores for absorption-based imaging or laser heating technologies. The physical supramolecular properties of croconaine dyes have rarely been investigated, especially in water. This study focuses on a molecular threading process that encapsulates a croconaine dye inside a tetralactam macrocycle in organic or aqueous solvent. Macrocycle association and rate constant data are reported for a series of croconaine structures with different substituents attached to the ends of the dye. The association constants were highest in water (Ka ~10⁹ M⁻¹), and the threading rate constants (kon) increased in the solvent order H2O > MeOH > CHCl3. Systematic variation of croconaine substituents located just outside the croconaine/macrocycle complexation interface hardly changed Ka but had a strong influence on kon. A croconaine dye with N-propyl groups at each end of the structure exhibited a desirable mixture of macrocycle threading properties; that is, there was rapid and quantitative croconaine/macrocycle complexation at relatively high concentrations in water, and no dissociation of the pre-assembled complex when it was diluted into a solution of fetal bovine serum, even after laser induced photothermal heating of the solution. The combination of favorable near-infrared absorption properties and tunable mechanical stability makes threaded croconaine/macrocycle complexes very attractive as molecular probes or as supramolecular composites for various applications in absorption-based imaging or photothermal therapy. PMID:26807599

  2. Ultrasonic extensometer measures bolt preload

    NASA Technical Reports Server (NTRS)

    Daniels, C. M., Jr.

    1978-01-01

    Extensometer using ultrasonic pulse reflections to measure elongations in tightened bolts and studs is much more accurate than conventional torque wrenches in application of specified preload to bolts and other threaded fasteners.

  3. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

    DOE PAGES

    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    2014-07-22

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.

  4. 78 FR 76815 - Steel Threaded Rod From India: Preliminary Affirmative Countervailing Duty Determination and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-19

    ... DEPARTMENT OF COMMERCE International Trade Administration [C-533-856] Steel Threaded Rod From... exporters of steel threaded rod from India. The period of investigation (``POI'') is January 1, 2012... this investigation is steel threaded rod. Steel threaded rod is certain threaded rod, bar, or studs, of...

  5. Surface Analysis of the Laser Cleaned Metal Threads

    NASA Astrophysics Data System (ADS)

    Sokhan, M.; Hartog, F.; McPhail, D.

    Laser cleaning of tarnished silver threads was carried out using Nd:YAG laser radiation at IR (1064 nm) and visible (532 nm) wavelengths. Preliminary tests were made on a piece of silk with silver embroidery that had both clean and tarnished areas. FIBS and SIMS analyses were used to assess the condition of the surface before and after laser irradiation. It was found that irradiation at fluences below 0.4 J/cm² and above 1.0 J/cm² aggravates the process of tarnishing and leads to a yellowing effect. The results of the preliminary tests were used to find the optimum regime for the laser cleaning of a real museum artefact: "Women Riding Jacket", dated to the beginning of the 18th century.

  6. In-situ fluorimetry: A powerful non-invasive diagnostic technique for natural dyes used in artefacts. Part II. Identification of orcein and indigo in Renaissance tapestries

    NASA Astrophysics Data System (ADS)

    Clementi, C.; Miliani, C.; Romani, A.; Santamaria, U.; Morresi, F.; Mlynarska, K.; Favaro, G.

    2009-01-01

    In this paper, three Renaissance tapestries depicting scenes painted by Raffaello Sanzio, conserved at the Vatican Museum, were investigated using in-situ UV-Visible fluorimetric measurements. The results show that this technique is suitable for the detection of natural organic colorants used for dyeing the threads woven in these tapestries. The emission signals detected on red-purple colours were assigned to the colorant orcein and those on different nuances of blue and green colours to indigo by comparison with data from reference laboratory samples. The assignments were supported by chromatographic experiments carried out on threads taken from the back side of the tapestry in the same points analysed by spectrofluorimetry.

  7. In-situ fluorimetry: a powerful non-invasive diagnostic technique for natural dyes used in artefacts. Part II. Identification of orcein and indigo in Renaissance tapestries.

    PubMed

    Clementi, C; Miliani, C; Romani, A; Santamaria, U; Morresi, F; Mlynarska, K; Favaro, G

    2009-01-01

    In this paper, three Renaissance tapestries depicting scenes painted by Raffaello Sanzio, conserved at the Vatican Museum, were investigated using in-situ UV-Visible fluorimetric measurements. The results show that this technique is suitable for the detection of natural organic colorants used for dyeing the threads woven in these tapestries. The emission signals detected on red-purple colours were assigned to the colorant orcein and those on different nuances of blue and green colours to indigo by comparison with data from reference laboratory samples. The assignments were supported by chromatographic experiments carried out on threads taken from the back side of the tapestry in the same points analysed by spectrofluorimetry.

  8. Towards a sustainable world through human factors and ergonomics: it is all about values.

    PubMed

    Lange-Morales, Karen; Thatcher, Andrew; García-Acosta, Gabriel

    2014-01-01

    In this paper, we analyse two approaches that attempt to address how a human factors and ergonomics (HFE) perspective can contribute to the sustainability of the human race. We outline the principles, purposes and fields of application of ergoecology and green ergonomics, and thereafter deal with their context of emergence, and the overlaps in purpose, and principles. Shared values are deduced and related to socio-technical principles for systems' design. Social responsibility and environmental/ecospheric responsibility are the leading threads of ergoecology and green ergonomics, giving rise to the values of: respect for human rights, respect for the Earth, respect for ethical decision-making, appreciation of complexity, respect for transparency and openness, and respect for diversity. We discuss the consequences of considering these values in HFE theory and practice.

  9. Hardware based redundant multi-threading inside a GPU for improved reliability

    DOEpatents

    Sridharan, Vilas; Gurumurthi, Sudhanva

    2015-05-05

    A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.

  10. Stream Splitting in Support of Intrusion Detection

    DTIC Science & Technology

    2003-06-01

    increased. Every computer on the Internet has no need to see the traffic of every other computer on the Internet. Indeed if this was so, nothing would get ...distinguishes the stream splitter from other network analysis tools. B. HIGH LEVEL DESIGN To get the desired level of performance, a multi-threaded...of greater concern than added accuracy of a Bayesian model. This is a case where close is good enough . b. PassiveSensors Though similar to active

  11. An adaptive transmission protocol for managing dynamic shared states in collaborative surgical simulation.

    PubMed

    Qin, J; Choi, K S; Ho, Simon S M; Heng, P A

    2008-01-01

    A force prediction algorithm is proposed to facilitate virtual-reality (VR) based collaborative surgical simulation by reducing the effect of network latencies. State regeneration is used to correct the estimated prediction. This algorithm is incorporated into an adaptive transmission protocol that includes auxiliary features such as view synchronization and coupling control to ensure system consistency. We implemented this protocol using a multi-threaded technique on a cluster-based network architecture.

  12. High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nagasaka, Y; Matsuoka, S; Azad, A

    Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We firstly identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms are showing significant speedups from libraries in the majority of the cases while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up in-depth evaluation results and make a recipe to give the best SpGEMM algorithm for target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
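
    As a rough illustration of the hash-table approach named in the abstract above, the sketch below computes each output row with a hash-table accumulator (a Python dict). It is a generic row-wise formulation, not the authors' KNL-optimized code; the function name, the list-of-(column, value) row format, and the `sort_output` flag are assumptions chosen for illustration. Skipping the final sort corresponds to the case the abstract highlights, where nonzeros need not be ordered within each output row.

```python
def spgemm_hash(a_rows, b_rows, sort_output=False):
    """Multiply sparse matrices A and B given as lists of rows.

    Each row is a list of (column_index, value) pairs (hypothetical format).
    Returns C = A * B in the same format.
    """
    c_rows = []
    for a_row in a_rows:  # rows are independent, so they could be split across threads
        acc = {}  # hash-table accumulator: output column -> partial sum
        for k, a_val in a_row:
            # scale row k of B by A[i, k] and merge it into the accumulator
            for j, b_val in b_rows[k]:
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        row = list(acc.items())
        if sort_output:
            row.sort()  # optional: order nonzeros by column index
        c_rows.append(row)
    return c_rows


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] in sparse row form
A = [[(0, 1.0), (1, 2.0)], [(1, 3.0)]]
B = [[(0, 4.0)], [(0, 5.0), (1, 6.0)]]
C = spgemm_hash(A, B, sort_output=True)
```

    In this sketch the per-row dict plays the role of the hash table; a tuned implementation would size and reuse the table per thread, which is outside the scope of this example.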

  13. MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses

    PubMed Central

    Nakayasu, Ernesto S.; Nicora, Carrie D.; Sims, Amy C.; Burnum-Johnson, Kristin E.; Kim, Young-Mo; Kyle, Jennifer E.; Matzke, Melissa M.; Shukla, Anil K.; Chu, Rosalie K.; Schepmoes, Athena A.; Jacobs, Jon M.; Baric, Ralph S.; Webb-Robertson, Bobbie-Jo; Smith, Richard D.

    2016-01-01

    ABSTRACT Integrative multi-omics analyses can empower more effective investigation and complete understanding of complex biological systems. Despite recent advances in a range of omics analyses, multi-omic measurements of the same sample are still challenging and current methods have not been well evaluated in terms of reproducibility and broad applicability. Here we adapted a solvent-based method, widely applied for extracting lipids and metabolites, to add proteomics to mass spectrometry-based multi-omics measurements. The metabolite, protein, and lipid extraction (MPLEx) protocol proved to be robust and applicable to a diverse set of sample types, including cell cultures, microbial communities, and tissues. To illustrate the utility of this protocol, an integrative multi-omics analysis was performed using a lung epithelial cell line infected with Middle East respiratory syndrome coronavirus, which showed the impact of this virus on the host glycolytic pathway and also suggested a role for lipids during infection. The MPLEx method is a simple, fast, and robust protocol that can be applied for integrative multi-omic measurements from diverse sample types (e.g., environmental, in vitro, and clinical). IMPORTANCE In systems biology studies, the integration of multiple omics measurements (i.e., genomics, transcriptomics, proteomics, metabolomics, and lipidomics) has been shown to provide a more complete and informative view of biological pathways. Thus, the prospect of extracting different types of molecules (e.g., DNAs, RNAs, proteins, and metabolites) and performing multiple omics measurements on single samples is very attractive, but such studies are challenging due to the fact that the extraction conditions differ according to the molecule type. Here, we adapted an organic solvent-based extraction method that demonstrated broad applicability and robustness, which enabled comprehensive proteomics, metabolomics, and lipidomics analyses from the same sample. 
    PMID:27822525

  14. HEXPANDO Expanding Head for Fastener-Retention Hexagonal Wrench

    NASA Technical Reports Server (NTRS)

    Bishop, John

    2011-01-01

    The HEXPANDO is an expanding-head hexagonal wrench designed to retain fasteners and keep them from being dislodged from the tool. The tool is intended to remove or install socket-head cap screws (SHCSs) in remote, hard-to-reach locations or in circumstances when a dropped fastener could cause damage to delicate or sensitive hardware. It is not intended for application of torque. This tool is made of two assembled portions. The first portion of the tool comprises tubing, or a hollow shaft, at a length that gives the user adequate reach to the intended location. At one end of the tubing is the expanding hexagonal head fitting with six radial slits cut into it (one at each of the points of the hexagonal shape), and a small hole drilled axially through the center and the end opposite the hex is internally and externally threaded. This fitting is threaded into the shaft (via external threads) and staked or bonded so that it will not loosen. At the other end of the tubing is a knurled collar with a through hole into which the tubing is threaded. This knob is secured in place by a stop nut. The second assembled portion of the tool comprises a length of all thread or solid rod that is slightly longer than the steel tubing. One end has a slightly larger knurled collar affixed while the other end is tapered/pointed and threaded. When the two portions are assembled, the all thread/rod portion feeds through the tubing and is threaded into the expanding hex head fitting. The tapered point allows it to be driven into the through hole of the hex fitting. While holding the smaller collar on the shaft, the user turns the larger collar, and as the threads feed into the fitting, the hex head expands and grips the SHCS, thus providing a safe way to install and remove fasteners. The clamping force retaining the SHCS varies depending on how far the tapered end is inserted into the tool head. 
Initial tests of the prototype tool, designed for a 5 mm or #10 SHCS, have resulted in up to 8 lb (≈35.6 N) of pull force to dislodge the SHCS from the tool. The tool is designed with a lead-in angle from the diameter of the tubing to a diameter the same as the fastener head, to prevent the fastener head from catching on any obstructions encountered that could dislodge the fastener during retrieval.

  15. Multi-View Multi-Instance Learning Based on Joint Sparse Representation and Multi-View Dictionary Learning.

    PubMed

    Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve

    2017-12-01

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (M²IL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse ε-graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the M²IL. Experiments and analyses in many practical applications prove the effectiveness of the M²IL.

  16. Two years' outcome of thread lifting with absorbable barbed PDO threads: Innovative score for objective and subjective assessment.

    PubMed

    Ali, Yasser Helmy

    2018-02-01

    Thread-lifting rejuvenation procedures have evolved again, with the development of absorbable threads. Although they have gained popularity among plastic surgeons and dermatologists, very few articles about absorbable threads have been published in the literature. This study aims to evaluate the two-year outcome of thread lifting using absorbable barbed threads for facial rejuvenation. This was a prospective comparative study with both objective and subjective assessment and follow-up for 24 months. Thread lifting for face rejuvenation has significant long-lasting effects that include skin lifting of 3-10 mm and a high degree of patient satisfaction, with a low complication rate of about 4.8%. Augmented results are obtained when thread lifting is combined with other lifting and rejuvenation modalities. Significant facial rejuvenation is achieved by thread lifting, and highly augmented results are observed when it is combined with Botox, fillers, and/or platelet rich plasma (PRP) rejuvenation.

  17. Thread gauge for tapered threads

    DOEpatents

    Brewster, Albert L.

    1994-01-11

    The thread gauge permits the user to determine the pitch diameter of tapered threads at the intersection of the pitch cone and the end face of the object being measured. A pair of opposed anvils having lines of threads which match the configuration and taper of the threads on the part being measured are brought into meshing engagement with the threads on opposite sides of the part. The anvils are located linearly into their proper positions by stop fingers on the anvils that are brought into abutting engagement with the end face of the part. This places predetermined reference points of the pitch cone of the thread anvils in registration with corresponding points on the end face of the part being measured, resulting in an accurate determination of the pitch diameter at that location. The thread anvils can be arranged for measuring either internal or external threads.

  18. Thread gauge for tapered threads

    DOEpatents

    Brewster, A.L.

    1994-01-11

    The thread gauge permits the user to determine the pitch diameter of tapered threads at the intersection of the pitch cone and the end face of the object being measured. A pair of opposed anvils having lines of threads which match the configuration and taper of the threads on the part being measured are brought into meshing engagement with the threads on opposite sides of the part. The anvils are located linearly into their proper positions by stop fingers on the anvils that are brought into abutting engagement with the end face of the part. This places predetermined reference points of the pitch cone of the thread anvils in registration with corresponding points on the end face of the part being measured, resulting in an accurate determination of the pitch diameter at that location. The thread anvils can be arranged for measuring either internal or external threads. 13 figures.

  19. In-depth proteomic analysis of the byssus from marine mussel Mytilus coruscus.

    PubMed

    Qin, Chuan-Li; Pan, Qi-Dong; Qi, Qi; Fan, Mei-Hua; Sun, Jing-Jing; Li, Nan-Nan; Liao, Zhi

    2016-07-20

    Mussels attach to various submerged surfaces by using the byssus, which contains different proteins and is a promising source of water-resistant bio-adhesives for potential use in biotechnological and medical applications. The protein composition of the byssus has not yet been fully understood although at least eleven byssal proteins were characterized previously. In order to increase genomic resources and identify new byssal proteins from mussel Mytilus coruscus, high-throughput Illumina sequencing was undertaken on the foot, and 79,997,776 paired-end reads were generated, yielding a library containing 88,825 unigenes. The M. coruscus byssus was divided into three parts, the proximal thread, the distal thread, and the plaque. Byssal proteins from each part of the byssus were analyzed by shotgun-LTQ analysis. The MS/MS spectra were searched against the foot unigenes dataset and 48 byssal proteins were identified from the M. coruscus byssus. From the whole set, 17, 5, and 11 proteins were exclusive to the proximal thread, the distal thread, and the plaque, respectively. These data can be used as a resource for further studies on the roles of byssal proteins in the deposition of different byssus parts (thread vs. plaque) or in the different mechanical properties (tenacity vs. adhesion). Byssal proteins are the major component that controls different aspects of the byssal formation process and thus a source of bioactive molecules that would offer interesting perspectives in biomaterials and bio-adhesive fields. In this paper, we characterized the protein set from different parts of Mytilus coruscus byssus by a combination of transcriptome and proteome techniques. A whole set of 48 byssal proteins was described here, including collagen-like, C1q domain-containing, protease inhibitor-like, tyrosinase-like, SOD, and other proteins. Thread (the distal portion and the proximal portion) and plaque showed distinct protein composition. Of the whole byssal protein set, 11 are exclusive to the plaque, 17 are exclusive to the proximal thread, and 5 are exclusive to the distal thread. Only four proteins are shared by all three parts of the byssus. The new byssal proteins reported here represent a significant expansion of the knowledge base of Mytilus byssal proteins, and are important for further exploring the mechanism of adhesion in mussels. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. 46 CFR 56.15-1 - Pipe joining fittings.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... subpart 50.25 of this subchapter are acceptable for use in piping systems. (b) Threaded, flanged, socket-welding, buttwelding, and socket-brazing pipe joining fittings, made in accordance with the applicable...

  1. 46 CFR 56.15-1 - Pipe joining fittings.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... subpart 50.25 of this subchapter are acceptable for use in piping systems. (b) Threaded, flanged, socket-welding, buttwelding, and socket-brazing pipe joining fittings, made in accordance with the applicable...

  2. 46 CFR 56.15-1 - Pipe joining fittings.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... subpart 50.25 of this subchapter are acceptable for use in piping systems. (b) Threaded, flanged, socket-welding, buttwelding, and socket-brazing pipe joining fittings, made in accordance with the applicable...

  3. 46 CFR 56.15-1 - Pipe joining fittings.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... subpart 50.25 of this subchapter are acceptable for use in piping systems. (b) Threaded, flanged, socket-welding, buttwelding, and socket-brazing pipe joining fittings, made in accordance with the applicable...

  4. OpenMP parallelization of a gridded SWAT (SWATG)

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also presents such problems. Long run times limit applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km² watershed with 1 CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
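
    The quoted figures (up to nine times faster with 15 threads) can be put in context with a short calculation. The sketch below is not from the paper: it assumes the 9x figure is a speedup relative to a single-threaded run and that Amdahl's law is an adequate model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / threads


def amdahl_parallel_fraction(speedup: float, threads: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Figures quoted in the abstract: up to 9x faster with 15 threads.
efficiency = parallel_efficiency(9.0, 15)
fraction = amdahl_parallel_fraction(9.0, 15)
print(f"efficiency = {efficiency:.2f}")                # efficiency = 0.60
print(f"implied parallel fraction = {fraction:.3f}")   # implied parallel fraction = 0.952
```

    Under these assumptions, roughly 95% of the run time would have to be parallelizable to reach a ninefold speedup on 15 threads.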

  5. A Survey of New Trends in Symbolic Execution for Software Testing and Analysis

    NASA Technical Reports Server (NTRS)

    Pasareanu, Corina S.; Visser, Willem

    2009-01-01

    Symbolic execution is a well-known program analysis technique which represents values of program inputs with symbolic values instead of concrete (initialized) data and executes the program by manipulating program expressions involving the symbolic values. Symbolic execution was proposed over three decades ago, but it has recently found renewed interest in the research community, due in part to progress in decision procedures, the availability of powerful computers and new algorithmic developments. We provide a survey of some of the new research trends in symbolic execution, with particular emphasis on applications to test generation and program analysis. We first describe an approach that handles complex programming constructs such as input data structures, arrays, as well as multi-threading. We follow with a discussion of abstraction techniques that can be used to limit the (possibly infinite) number of symbolic configurations that need to be analyzed for the symbolic execution of looping programs. Furthermore, we describe recent hybrid techniques that combine concrete and symbolic execution to overcome some of the inherent limitations of symbolic execution, such as handling native code or availability of decision procedures for the application domain. Finally, we give a short survey of interesting new applications, such as predictive testing, invariant inference, program repair, analysis of parallel numerical programs and differential symbolic execution.
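
    The link between path conditions and test generation can be illustrated with a toy sketch. It is not taken from the survey: the branch-tree encoding, the names `PROGRAM`, `paths` and `solve`, and the brute-force search standing in for a decision procedure are all assumptions made for illustration.

```python
# Program under analysis, encoded as nested (condition, then, else) branches:
#   def classify(x):
#       if x > 10:
#           return "big-even" if x % 2 == 0 else "big-odd"
#       return "small"
PROGRAM = (
    ("x > 10", lambda x: x > 10),
    (("x % 2 == 0", lambda x: x % 2 == 0), "big-even", "big-odd"),
    "small",
)


def paths(node, condition=()):
    """Yield (path_condition, outcome) for every path through the branch tree."""
    if isinstance(node, str):
        yield condition, node
        return
    (text, pred), then_node, else_node = node
    yield from paths(then_node, condition + ((text, pred, True),))
    yield from paths(else_node, condition + ((text, pred, False),))


def solve(condition, domain=range(-50, 51)):
    """Brute-force stand-in for a decision procedure: find one satisfying input."""
    for x in domain:
        if all(pred(x) == wanted for _, pred, wanted in condition):
            return x
    return None  # no input in the searched domain satisfies this path


# One concrete test input per feasible path.
tests = {outcome: solve(cond) for cond, outcome in paths(PROGRAM)}
print(tests)  # {'big-even': 12, 'big-odd': 11, 'small': -50}
```

    A real symbolic executor keeps the conditions as symbolic expressions and hands them to a constraint solver; the exhaustive search here only works because the input domain is tiny.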

  6. Thread angle dependency on flame spread shape over kenaf/polyester combined fabric

    NASA Astrophysics Data System (ADS)

    Azahari Razali, Mohd; Sapit, Azwan; Nizam Mohammed, Akmal; Nor Anuar Mohamad, Md; Nordin, Normayati; Sadikin, Azmahani; Faisal Hushim, Mohd; Jaat, Norrizam; Khalid, Amir

    2017-09-01

    Understanding flame spread behavior is crucial to Fire Safety Engineering. Natural fiber exhibits different flame spread behavior from synthetic fiber, and this difference may influence flame spread over combined fabric. Previous research examined flame spread behavior over kenaf/polyester fabric and found that the flame spread shape depends on the thread angle. However, that research did not explain the phenomenon in detail. This study gives a detailed explanation. Results show that the flame spread shape depends on the position of the synthetic thread. For thread angle θ = 0°, the polyester thread breaks as the flame approaches it, and the kenaf thread tends to move in the breaking direction. This behavior produces a ‘V’-shaped flame. However, for thread angle θ = 90°, the polyester thread melts while the kenaf thread decomposes and burns. At this angle, the distance between kenaf threads remains constant as the flame approaches.

  7. Generalized Symbolic Execution for Model Checking and Testing

    NASA Technical Reports Server (NTRS)

    Khurshid, Sarfraz; Pasareanu, Corina; Visser, Willem; Kofmeyer, David (Technical Monitor)

    2003-01-01

    Modern software systems, which often are concurrent and manipulate complex data structures, must be extremely reliable. We present a novel framework based on symbolic execution, for automated checking of such systems. We provide a two-fold generalization of traditional symbolic execution based approaches: one, we define a program instrumentation, which enables standard model checkers to perform symbolic execution; two, we give a novel symbolic execution algorithm that handles dynamically allocated structures (e.g., lists and trees), method preconditions (e.g., acyclicity of lists), data (e.g., integers and strings) and concurrency. The program instrumentation enables a model checker to automatically explore program heap configurations (using a systematic treatment of aliasing) and manipulate logical formulae on program data values (using a decision procedure). We illustrate two applications of our framework: checking correctness of multi-threaded programs that take inputs from unbounded domains with complex structure and generation of non-isomorphic test inputs that satisfy a testing criterion. Our implementation for Java uses the Java PathFinder model checker.

  8. Application of SNMP on CATV

    NASA Astrophysics Data System (ADS)

    Huang, Hong-bin; Liu, Wei-ping; Chen, Shun-er; Zheng, Liming

    2005-02-01

    A new type of CATV network management system, built on a universal MCU and supporting SNMP, is proposed in this paper. From the point of view of both hardware and software, the function and method of every module inside the system, including physical-layer communication, protocol processing and data processing, are analyzed. In our design, the management system takes the IP MAN as its data transmission channel, and every controlled object in the management structure has an SNMP agent. The SNMP agent developed has four function modules: a physical-layer communication module, a protocol process module, an internal data process module and a MIB management module. In the paper, the structure and function of every module are designed and demonstrated, the related hardware circuit and software flow are tested, and the experimental results are reported. Furthermore, by introducing an RTOS into the software, the universal MCU program can conduct multi-threaded management of tasks such as fast Ethernet controller driving, TCP/IP processing and serial port signal monitoring, which greatly improves CPU efficiency.

  9. Modified naphthalene diimide as a suitable tetraplex DNA ligand: application to cancer diagnosis and anti-cancer drug

    NASA Astrophysics Data System (ADS)

    Takenaka, Shigeori

    2017-07-01

    It is known that naphthalene diimide carrying two substituents binds to DNA duplex by threading intercalation. Naphthalene diimide carrying ferrocene moieties, ferrocenylnaphthalene diimide (FND), formed a stable complex with DNA duplex, and electrochemical gene detection was achieved with the current signal generated from FND bound to the DNA duplex formed between the target DNA and a DNA probe-immobilized electrode. FND could not bind to a mismatched site and its surrounding region in a DNA duplex, and thus FND was applied to the precise detection of single nucleotide polymorphisms (SNPs), using the improved discrimination between fully matched and mismatched DNA hybrids and a multi-electrode chip. Some FND derivatives bound to telomere DNA tetraplex more strongly than to DNA duplex and were applied to cancer diagnosis by measuring telomere DNA elongated by telomerase, a suitable marker of cancer. Furthermore, cyclic naphthalene diimides showed an extremely high preference for DNA tetraplex over DNA duplex. Such molecules will open the way to effective anti-cancer drugs based on telomerase-specific inhibition.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Harold C.; Ibanez, Daniel Alejandro

    This report documents the ASC/ATDM Kokkos deliverable "Production Portable Dynamic Task DAG Capability." This capability enables applications to create and execute a dynamic task DAG: a collection of heterogeneous computational tasks with a directed acyclic graph (DAG) of "execute after" dependencies, where tasks and their dependencies are dynamically created and destroyed as tasks execute. The Kokkos task scheduler executes the dynamic task DAG on the target execution resource, e.g. a multicore CPU, a manycore CPU such as Intel's Knights Landing (KNL), or an NVIDIA GPU. Several major technical challenges had to be addressed during development of Kokkos' Task DAG capability: (1) portability to a GPU with its simplified hardware and micro-runtime, (2) thread-scalable memory allocation and deallocation from a bounded pool of memory, (3) thread-scalable scheduler for dynamic task DAG, (4) usability by applications.
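
    The idea of a dynamic task DAG with "execute after" dependencies can be sketched in a few lines. This is a plain Python illustration on a thread pool, not the Kokkos API: the `TaskDag` class and its `spawn` and `wait` methods are hypothetical names, and the sketch assumes tasks never raise and every named dependency is eventually spawned.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TaskDag:
    """Toy scheduler: a task runs only after all tasks it depends on finish."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._done = set()      # names of finished tasks
        self._waiting = {}      # name -> (function, unmet dependency names)
        self._pending = 0       # tasks spawned but not yet finished
        self._idle = threading.Event()

    def spawn(self, name, fn, after=()):
        """Register a task; may also be called from inside a running task."""
        with self._lock:
            self._pending += 1
            self._idle.clear()
            unmet = set(after) - self._done
            if unmet:
                self._waiting[name] = (fn, unmet)
                return
        self._pool.submit(self._run, name, fn)

    def _run(self, name, fn):
        fn(self)                # the task body may spawn further tasks
        ready = []
        with self._lock:
            self._done.add(name)
            for other, (other_fn, unmet) in list(self._waiting.items()):
                unmet.discard(name)
                if not unmet:   # last dependency satisfied: release the task
                    ready.append((other, other_fn))
                    del self._waiting[other]
            self._pending -= 1
            if self._pending == 0:
                self._idle.set()
        for other, other_fn in ready:
            self._pool.submit(self._run, other, other_fn)

    def wait(self):
        """Block until every spawned task has finished (call after spawning roots)."""
        self._idle.wait()
        self._pool.shutdown()


order = []
order_lock = threading.Lock()


def record(label):
    def task(dag):
        with order_lock:
            order.append(label)
    return task


def load(dag):
    record("load")(dag)
    # Created while the DAG is already executing, with its own dependency.
    dag.spawn("report", record("report"), after=("compute",))


dag = TaskDag()
dag.spawn("load", load)
dag.spawn("compute", record("compute"), after=("load",))
dag.wait()
print(order)  # ['load', 'compute', 'report']
```

    The sketch uses a single lock and an unbounded Python dictionary, so it does not address the thread-scalable allocation and scheduling challenges the report lists; it only shows the dependency semantics.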

  11. Application of uniform design to improve dental implant system.

    PubMed

    Cheng, Yung-Chang; Lin, Deng-Huei; Jiang, Cho-Pei

    2015-01-01

    This paper introduces the application of uniform experimental design to improve dental implant systems subjected to dynamic loads. The dynamic micromotion of the Zimmer dental implant system is calculated and illustrated by explicit dynamic finite element analysis. Endogenous and exogenous factors influence the success rate of dental implant systems. Endogenous factors include: bone density, cortical bone thickness and osseointegration. Exogenous factors include: thread pitch, thread depth, diameter of implant neck and body size. A dental implant system with a crest module was selected to simulate micromotion distribution and stress behavior under dynamic loads using conventional and proposed methods. Finally, the design which caused minimum micromotion was chosen as the optimal design model. The micromotion of the improved model is 36.42 μm, an improvement of 15.34% compared to the original model.

  12. The effect of thread pattern upon implant osseointegration.

    PubMed

    Abuhussein, Heba; Pagni, Giorgio; Rebaudi, Alberto; Wang, Hom-Lay

    2010-02-01

    Implant design features such as macro- and micro-design may influence overall implant success. Limited information is currently available. Therefore, the purpose of this paper is to examine how factors such as thread pitch, thread geometry, helix angle, thread depth and width, as well as the implant crestal module, may affect implant stability. A literature search was conducted using MEDLINE to identify studies, from simulated laboratory models, animal, to human, related to this topic using the keywords of implant thread, implant macrodesign, thread pitch, thread geometry, helix angle, thread depth, thread width and implant crestal module. The results showed how thread geometry affects the distribution of stress forces around the implant. A decreased thread pitch may positively influence implant stability. Excess helix angles, in spite of a faster insertion, may jeopardize the ability of implants to sustain axial load. Deeper threads seem to have an important effect on the stabilization in poorer bone quality situations. The addition of threads or microthreads up to the crestal module of an implant might provide a potential positive contribution on bone-to-implant contact as well as on the preservation of marginal bone; nonetheless this remains to be determined. Appraising the current literature on this subject and combining existing data to verify the presence of any association between the selected characteristics may be critical in the achievement of overall implant success.

  13. Method for molding threads in graphite panels

    DOEpatents

    Short, W.W.; Spencer, C.

    1994-11-29

    A graphite panel with a hole having a damaged thread is repaired by drilling the hole to remove all of the thread and making a new hole of larger diameter. A bolt with a lubricated thread is placed in the new hole and the hole is packed with graphite cement to fill the hole and the thread on the bolt. The graphite cement is cured, and the bolt is unscrewed therefrom to leave a thread in the cement which is at least as strong as that of the original thread. 8 figures.

  14. Human stem cell decorated nanocellulose threads for biomedical applications.

    PubMed

    Mertaniemi, Henrikki; Escobedo-Lucea, Carmen; Sanz-Garcia, Andres; Gandía, Carolina; Mäkitie, Antti; Partanen, Jouni; Ikkala, Olli; Yliperttula, Marjo

    2016-03-01

    Upon surgery, local inflammatory reactions and postoperative infections cause complications, morbidity, and mortality. Delivery of human adipose mesenchymal stem cells (hASC) into the wounds is an efficient and safe means to reduce inflammation and promote wound healing. However, administration of stem cells by injection often results in low cell retention, and the cells deposit in other organs, reducing the efficiency of the therapy. Thus, it is essential to improve cell delivery to the target area using carriers to which the cells have a high affinity. Moreover, the application of hASC in surgery has typically relied on animal-origin components, which may induce immune reactions or even transmit infections due to pathogens. To solve these issues, we first show that native cellulose nanofibers (nanofibrillated cellulose, NFC) extracted from plants allow preparation of glutaraldehyde cross-linked threads (NFC-X) with high mechanical strength even under the wet cell culture or surgery conditions, characteristically challenging for cellulosic materials. Secondly, using a xenogeneic free protocol for isolation and maintenance of hASC, we demonstrate that cells adhere, migrate and proliferate on the NFC-X, even without surface modifiers. Cross-linked threads were not found to induce toxicity on the cells and, importantly, hASC attached on NFC-X maintained their undifferentiated state and preserved their bioactivity. After intradermal suturing with the hASC decorated NFC-X threads in an ex vivo experiment, cells remained attached to the multifilament sutures without displaying morphological changes or reducing their metabolic activity. Finally, as NFC-X optionally allows facile surface tailoring if needed, we anticipate that stem-cell-decorated NFC-X opens a versatile generic platform as a surgical bionanomaterial for fighting postoperative inflammation and chronic wound healing problems. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Insertion tube methods and apparatus

    DOEpatents

    Casper, William L.; Clark, Don T.; Grover, Blair K.; Mathewson, Rodney O.; Seymour, Craig A.

    2007-02-20

    A drill string comprises a first drill string member having a male end; and a second drill string member having a female end configured to be joined to the male end of the first drill string member, the male end having a threaded portion including generally square threads, the male end having a non-threaded extension portion coaxial with the threaded portion, and the male end further having a bearing surface, the female end having a female threaded portion having corresponding female threads, the female end having a non-threaded extension portion coaxial with the female threaded portion, and the female end having a bearing surface. Installation methods, including methods of installing instrumented probes are also provided.

  16. Subsurface drill string

    DOEpatents

    Casper, William L [Rigby, ID; Clark, Don T [Idaho Falls, ID; Grover, Blair K [Idaho Falls, ID; Mathewson, Rodney O [Idaho Falls, ID; Seymour, Craig A [Idaho Falls, ID

    2008-10-07

    A drill string comprises a first drill string member having a male end; and a second drill string member having a female end configured to be joined to the male end of the first drill string member, the male end having a threaded portion including generally square threads, the male end having a non-threaded extension portion coaxial with the threaded portion, and the male end further having a bearing surface, the female end having a female threaded portion having corresponding female threads, the female end having a non-threaded extension portion coaxial with the female threaded portion, and the female end having a bearing surface. Installation methods, including methods of installing instrumented probes are also provided.

  17. Thread-Lift Sutures: Still in the Lift? A Systematic Review of the Literature.

    PubMed

    Gülbitti, Haydar Aslan; Colebunders, Britt; Pirayesh, Ali; Bertossi, Dario; van der Lei, Berend

    2018-03-01

    In 2006, Villa et al. published a review article concerning the use of thread-lift sutures and concluded that the technique was still in its infancy but had great potential to become a useful and effective procedure for nonsurgical lifting of sagged facial tissues. As 11 years have passed, the authors now performed again a systematic review to determine the real scientific current state of the art on the use of thread-lift sutures. A systematic review was performed according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines using the PubMed database and using the Medical Subject Headings search term "Rhytidoplasty." "Rhytidoplasty" and the following entry terms were included by this Medical Subject Headings term: "facelift," "facelifts," "face Lift," "Face Lifts," "Lift," "Face," "Lifts," "Platysmotomy," "Platysmotomies," "Rhytidectomy," "Rhytidectomies," "Platysmaplasty," and "Platysmaplasties." The Medical Subject Headings term "Rhytidoplasty" was combined with the following search terms: "Barbed suture," "Thread lift," "APTOS," "Suture suspension," "Percutaneous," and "Silhouette suture." RefWorks was used to filter duplicates. Three of the authors (H.A.G., B.C., and B.L.) performed the search independently. The initial search with all search terms resulted in 188 articles. After filtering the duplicates and the articles about open procedures, a total of 41 articles remained. Of these, the review articles, case reports, and letters to the editor were subsequently excluded, as were reports dealing with nonbarbed sutures, such as Vicryl and Prolene with Gore-Tex. This resulted in a total of 12 articles, seven additional articles since the five articles reviewed by Villa et al. The authors' review demonstrated that, within the past decade, little or no substantial evidence has been added to the peer-reviewed literature to support or sustain the promising statement about thread-lift sutures as made by Villa et al. in 2006 in terms of efficacy or safety. All included literature in the authors' review, except two studies, demonstrated at best a very limited durability of the lifting effect. The two positive studies were sponsored by the companies that manufacture the thread-lift sutures.

  18. System, methods and apparatus for program optimization for multi-threaded processor architectures

    DOEpatents

    Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E

    2015-01-06

    Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

  19. Runtime Performance and Virtual Network Control Alternatives in VM-Based High-Fidelity Network Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S; Henz, Brian J

    2012-01-01

    In prior work (Yoginath and Perumalla, 2011; Yoginath, Perumalla and Henz, 2012), the motivation, challenges and issues were articulated in favor of virtual time ordering of Virtual Machines (VMs) in network simulations hosted on multi-core machines. Two major components in the overall virtualization challenge are (1) virtual timeline establishment and scheduling of VMs, and (2) virtualization of inter-VM communication. Here, we extend prior work by presenting scaling results for the first component, with experiment results on up to 128 VMs scheduled in virtual time order on a single 12-core host. We also explore the solution space of design alternatives for the second component, and present performance results from a multi-threaded, multi-queue implementation of inter-VM network control for synchronized execution with VM scheduling, incorporated in our NetWarp simulation system.

  20. Distinct spinning patterns gain differentiated loading tolerance of silk thread anchorages in spiders with different ecology.

    PubMed

    Wolff, Jonas O; van der Meijden, Arie; Herberstein, Marie E

    2017-07-26

    Building behaviour in animals extends biological functions beyond bodies. Many studies have emphasized the role of behavioural programmes, physiology and extrinsic factors for the structure and function of buildings. Structure attachments associated with animal constructions offer yet unrealized research opportunities. Spiders build a variety of one- to three-dimensional structures from silk fibres. The evolution of economic web shapes as a key for ecological success in spiders has been related to the emergence of high performance silks and thread coating glues. However, the role of thread anchorages has been widely neglected in those models. Here, we show that orb-web (Araneidae) and hunting spiders (Sparassidae) use different silk application patterns that determine the structure and robustness of the joint in silk thread anchorages. Silk anchorages of orb-web spiders show a greater robustness against different loading situations, whereas the silk anchorages of hunting spiders have their highest pull-off resistance when loaded parallel to the substrate along the direction of dragline spinning. This suggests that the behavioural 'printing' of silk into attachment discs along with spinneret morphology was a prerequisite for the evolution of extended silk use in a three-dimensional space. This highlights the ecological role of attachments in the evolution of animal architectures. © 2017 The Author(s).

  1. Online Mapping and Perception Algorithms for Multi-robot Teams Operating in Urban Environments

    DTIC Science & Technology

    2015-01-01

    each method on a 2.53 GHz Intel i5 laptop. All our algorithms are hand-optimized, implemented in Java and single threaded. To determine which algorithm...approach would be to label all the pixels in the image with an x, y, z point. However, the angular resolution of the camera is finer than that of the...edge criterion. That is, each edge is either present or absent. In [42], edge existence is further screened by a fixed threshold for angular

  2. Performance of VPIC on Sequoia

    NASA Astrophysics Data System (ADS)

    Nystrom, William

    2014-10-01

    Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.

  3. Towards an agent-oriented programming language based on Scala

    NASA Astrophysics Data System (ADS)

    Mitrović, Dejan; Ivanović, Mirjana; Budimac, Zoran

    2012-09-01

    Scala and its multi-threaded model based on actors represent an excellent framework for developing purely reactive agents. This paper presents early research on extending Scala with declarative programming constructs, which would result in a new agent-oriented programming language suitable for developing more advanced, BDI agent architectures. The main advantage of the new language over many other existing solutions for programming BDI agents is a natural and straightforward integration of imperative and declarative programming constructs, fitted under a single development framework.

  4. Improved Screw-Thread Lock

    NASA Technical Reports Server (NTRS)

    Macmartin, Malcolm

    1995-01-01

    Improved screw-thread lock engaged after screw tightened in nut or other mating threaded part. Device does not release contaminating material during tightening of screw. Includes pellet of soft material encased in screw and retained by pin. Hammer blow on pin extrudes pellet into slot, engaging threads in threaded hole or in nut.

  5. Method for molding threads in graphite panels

    DOEpatents

    Short, William W.; Spencer, Cecil

    1994-01-01

    A graphite panel (10) with a hole (11) having a damaged thread (12) is repaired by drilling the hole (11) to remove all of the thread and make a new hole (13) of larger diameter. A bolt (14) with a lubricated thread (17) is placed in the new hole (13) and the hole (13) is packed with graphite cement (16) to fill the hole and the thread on the bolt. The graphite cement (16) is cured, and the bolt is unscrewed therefrom to leave a thread (20) in the cement (16) which is at least as strong as that of the original thread (12).

  6. Virtual suturing simulation based on commodity physics engine for medical learning.

    PubMed

    Choi, Kup-Sze; Chan, Sze-Ho; Pang, Wai-Man

    2012-06-01

    Development of virtual-reality medical applications is usually a complicated and labour intensive task. This paper explores the feasibility of using commodity physics engine to develop a suturing simulator prototype for manual skills training in the fields of nursing and medicine, so as to enjoy the benefits of rapid development and hardware-accelerated computation. In the prototype, spring-connected boxes of finite dimension are used to simulate soft tissues, whereas needle and thread are modelled with chained segments. Spherical joints are used to simulate suture's flexibility and to facilitate thread cutting. An algorithm is developed to simulate needle insertion and thread advancement through the tissue. Two-handed manipulations and force feedback are enabled with two haptic devices. Experiments on the closure of a wound show that the prototype is able to simulate suturing procedures at interactive rates. The simulator is also used to study a curvature-adaptive suture modelling technique. Issues and limitations of the proposed approach and future development are discussed.

  7. CNTs threaded (001) exposed TiO2 with high activity in photocatalytic NO oxidation.

    PubMed

    Xiao, Shuning; Zhu, Wei; Liu, Peijue; Liu, Fanfan; Dai, Wenrui; Zhang, Dieqing; Chen, Wei; Li, Hexing

    2016-02-07

    A microwave-ionothermal strategy was developed for in situ synthesis of CNTs threaded TiO2 single crystal with a tunable percentage of surface exposed (001) active facets. The CNTs were used as microwave antennas to create local "super hot" dots to induce Ti(3+) adsorption and hydrolysis, thereby leading to a good assembly of (001) facets exposed single crystalline TiO2 threaded by the CNTs in the presence of Hmim[BF4] ionic liquid. Due to the high percentage of the active (001) facets of single crystal TiO2 and the direct electron transfer property of the CNTs, the as-prepared CNTs-TiO2 composite showed a photocatalytic NO removal ratio of up to 76.8% under UV irradiation. In addition, with self-doped Ti(3+), the CNTs-TiO2 composite also exhibited an enhanced activity under irradiation with either solar lights or visible lights, showing good potential in practical applications for environmental remediation.

  8. Quick application/release nut with engagement indicator

    NASA Technical Reports Server (NTRS)

    Wright, Jay M. (Inventor)

    1992-01-01

    A composite nut is shown which permits a fastener to be inserted or removed from either side with an indicator of fastener engagement. The nut has a plurality of segments, preferably at least three segments, which are internally threaded, spring loaded apart by an internal spring, and has detents on opposite sides which force the nut segments into operative engagements with a threaded member when pushed in and release the segments for quick insertion or removal of the nut when moved out. When the nut is installed, end pressure on one of the detents presses the nut segments into operative engagement with a threaded member where continued rotation locks the structure together with the detents depressed to indicate positive locking engagement of the nut. On removal, counterclockwise rotation of the nut relieves the endwise pressure on the detents, permitting internal springs to force the detents outward and allowing the nut segments to move outward and separate to permit quick removal of the fastener.

  9. External Tank (ET) Bipod Fitting Bolted Attachment Locking Insert Performance

    NASA Technical Reports Server (NTRS)

    Larsen, Curtis E.; Wilson, Tim R.; Elliott, Kenny B.; Raju, Ivatury S.; McManamen, John

    2008-01-01

    Following STS-107, the External Tank (ET) Project implemented corrective actions and configuration changes at the ET bipod fitting. Among the corrective actions, the existing bolt lock wire which provided resistance to potential bolt rotation was removed. The lock wire removal was because of concerns with creating voids during foam application and potential for lock wire to become debris. The bolts had been previously lubricated to facilitate assembly but, because of elimination of the lock wire, the ET Project wanted to enable the locking feature of the insert. Thus, the lubrication was removed from bolt threads and instead applied to the washer under the bolt head. Lubrication is necessary to maximize joint pre-load while remaining within the bolt torque specification. The locking feature is implemented by thread crimping in at four places in the insert. As the bolt is torqued into the insert the bolt threads its way past the crimped parts of the insert. This provides the locking of the bolt, as torque is required to loosen the joint after clamping.

  10. Archer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh

    2017-01-12

    Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs while incurring very low runtime and memory overhead. Static analyses identify data-race-free OpenMP regions and exclude them from runtime analysis, which is performed by ThreadSanitizer, included in LLVM/Clang.

  11. Method for Estimating Thread Strength Reduction of Damaged Parent Holes with Inserts

    NASA Technical Reports Server (NTRS)

    Johnson, David L.; Stratton, Troy C.

    2005-01-01

    During normal assembly and disassembly of bolted-joint components, thread damage and/or deformation may occur. If threads are overloaded, thread damage/deformation can also be anticipated. Typical inspection techniques (e.g. using GO-NO GO gages) may not provide adequate visibility of the extent of thread damage. More detailed inspection techniques have provided actual pitch-diameter profiles of damaged-hardware holes. A method to predict the reduction in thread shear-out capacity of damaged threaded holes has been developed. This method was based on testing and analytical modeling. Test samples were machined to simulate damaged holes in the hardware of interest. Test samples containing pristine parent-holes were also manufactured from the same bar-stock material to provide baseline results for comparison purposes. After the particular parent-hole thread profile was machined into each sample a helical insert was installed into the threaded hole. These samples were tested in a specially designed fixture to determine the maximum load required to shear out the parent threads. It was determined from the pristine-hole samples that, for the specific material tested, each individual thread could resist an average load of 3980 pounds. The shear-out loads of the holes having modified pitch diameters were compared to the ultimate loads of the specimens with pristine holes. An equivalent number of missing helical coil threads was then determined based on the ratio of shear-out loads for each thread configuration. These data were compared with the results from a finite element model (FEM). The model gave insights into the ability of the thread loads to redistribute for both pristine and simulated damage configurations. In this case, it was determined that the overall potential reduction in thread load-carrying capability in the hardware of interest was equal to having up to three fewer threads in the hole that bolt threads could engage. 
    One-half of this potential reduction was due to local pitch-diameter variations and the other half was due to overall pitch-diameter enlargement beyond Class 2 fit. This result was important in that the thread shear capacity for this particular hardware design was the limiting structural capability. The details of the method development, including the supporting testing, data reduction and analytical model results comparison will be discussed hereafter.
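
    The abstract describes converting a ratio of shear-out loads into an equivalent number of missing threads. A minimal sketch of one way such a conversion could be written follows. The linear scaling rule, the function name, the engaged-thread count and the damaged-hole load are all assumptions made for illustration; only the 3980-pound average load per thread comes from the abstract, and the authors' actual data reduction may differ.

```python
# Average load one thread resisted in the pristine-hole tests (from the abstract).
AVG_LOAD_PER_THREAD_LB = 3980.0


def equivalent_missing_threads(pristine_load_lb, damaged_load_lb, engaged_threads):
    """Estimate the equivalent number of missing threads.

    Assumption (not from the source): shear-out capacity scales linearly
    with the number of engaged threads, so a load ratio r corresponds to
    engaged_threads * (1 - r) missing threads.
    """
    ratio = damaged_load_lb / pristine_load_lb
    return engaged_threads * (1.0 - ratio)


# Hypothetical inputs chosen only to exercise the function.
engaged = 10                                  # hypothetical engaged-thread count
pristine = engaged * AVG_LOAD_PER_THREAD_LB   # pristine-hole shear-out load
damaged = 0.7 * pristine                      # hypothetical damaged-hole load

missing = equivalent_missing_threads(pristine, damaged, engaged)
print(f"equivalent missing threads: {missing:.2f}")
```

    Under this assumed linear rule, a damaged hole that carries 70% of the pristine load behaves like a hole with about three of its ten threads missing.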

  12. CMS Readiness for Multi-Core Workload Scheduling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  13. CMS readiness for multi-core workload scheduling

    NASA Astrophysics Data System (ADS)

    Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.

    2017-10-01

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  14. The research and development of the non-contact detection of the tubing internal thread with a line structured light

    NASA Astrophysics Data System (ADS)

    Hu, Yuanyuan; Xu, Yingying; Hao, Qun; Hu, Yao

    2013-12-01

    The tubing internal thread plays an irreplaceable role in the petroleum equipment. The unqualified tubing can directly lead to leakage, slippage and bring huge losses for oil industry. For the purpose of improving efficiency and precision of tubing internal thread detection, we develop a new non-contact tubing internal thread measurement system based on the laser triangulation principle. Firstly, considering that the tubing thread had a small diameter and relatively smooth surface, we built a set of optical system with a line structured light to irradiate the internal thread surface and obtain an image which contains the internal thread profile information through photoelectric sensor. Secondly, image processing techniques were used to do the edge detection of the internal thread from the obtained image. One key method was the sub-pixel technique which greatly improved the detection accuracy under the same hardware conditions. Finally, we restored the real internal thread contour information on the basis of laser triangulation method and calculated tubing thread parameters such as the pitch, taper and tooth type angle. In this system, the profile of several thread teeth can be obtained at the same time. Compared with other existing scanning methods using point light and stepper motor, this system greatly improves the detection efficiency. Experiment results indicate that this system can achieve the high precision and non-contact measurement of the tubing internal thread.

  15. Measurement of Sound Speed in Thread

    NASA Astrophysics Data System (ADS)

    Saito, Shigemi; Shibata, Yasuhiro; Ichiki, Akira; Miyazaki, Akiho

    2006-05-01

    By employing thin wires, human hairs and threads, the measurement of sound speed in a thread whose diameter is smaller than 0.2 mm has been attempted. Preparing two cylindrical ceramic transducers with a 300 kHz resonance frequency, a perforated glass bead to be knotted by a sample thread is bonded to the center of the end surface of each transducer. After connecting these transducers with a sample thread, a receiving transducer is attached at a ceiling so as to hang another transmitting transducer with the thread. A glass bead is bonded to another end surface of the transmitting transducer so that tension, varied with a hanged plumb, can be applied to the sample thread. The time delay of the received signal relative to the transmitting pulse is measured while gradually shortening the thread. Sound speed is determined by the proportionality of time delay with thread length. Although the measured values for metallic wires are somewhat different from the values derived from the density and Young’s modulus cited in references, they are reproducible. The sound speed for human hairs of over twenty samples, which varies between 2000 and 2500 m/s, seems to depend on hair quality. Sound speed in a cotton thread is found to approach a constant value under large tension. An advanced measurement system available for uncut threads is also presented, where semi cylindrical transducers pinch the thread.
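
    The abstract states that sound speed is determined from the proportionality of time delay with thread length. A minimal sketch of that idea as a least-squares line fit is given below: the slope of delay against length is the reciprocal of the sound speed, and the intercept absorbs any fixed delay in the transducers and beads. The function name and the synthetic data (a 2200 m/s speed and a 5 microsecond fixed delay) are hypothetical and are not measurements from the paper.

```python
def fit_sound_speed(lengths_m, delays_s):
    """Fit delay = intercept + length / speed by ordinary least squares.

    Returns (speed in m/s, fixed delay in s).
    """
    n = len(lengths_m)
    mean_len = sum(lengths_m) / n
    mean_delay = sum(delays_s) / n
    sxx = sum((x - mean_len) ** 2 for x in lengths_m)
    sxy = sum((x - mean_len) * (t - mean_delay)
              for x, t in zip(lengths_m, delays_s))
    slope = sxy / sxx                        # seconds per metre
    intercept = mean_delay - slope * mean_len
    return 1.0 / slope, intercept


# Synthetic (hypothetical) data: the thread is shortened step by step.
lengths = [0.50, 0.40, 0.30, 0.20, 0.10]     # thread length in metres
assumed_speed = 2200.0                       # hypothetical, m/s
assumed_fixed_delay = 5e-6                   # hypothetical, seconds
delays = [assumed_fixed_delay + x / assumed_speed for x in lengths]

speed, offset = fit_sound_speed(lengths, delays)
print(f"fitted speed: {speed:.1f} m/s, fixed delay: {offset * 1e6:.2f} us")
```

    Fitting a slope rather than dividing one length by one delay means the unknown fixed delay does not bias the estimate, which may be one reason for shortening the thread gradually.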

  16. 78 FR 79670 - Steel Threaded Rod From Thailand: Preliminary Determination of Sales at Less Than Fair Value and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-31

    ... DEPARTMENT OF COMMERCE International Trade Administration [A-549-831] Steel Threaded Rod From... ``Department'') preliminarily determines that steel threaded rod from Thailand is being, or is likely to be... Investigation The merchandise covered by this investigation is steel threaded rod. Steel threaded rod is certain...

  17. 49 CFR 178.46 - Specification 3AL seamless aluminum cylinders.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... circular. (5) All openings must be threaded. Threads must comply with the following: (i) Each thread must be clean cut, even, without checks, and to gauge. (ii) Taper threads, when used, must conform to one of the following: (A) American Standard Pipe Thread (NPT) type, conforming to the requirements of NBS...

  18. 49 CFR 178.46 - Specification 3AL seamless aluminum cylinders.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... circular. (5) All openings must be threaded. Threads must comply with the following: (i) Each thread must be clean cut, even, without checks, and to gauge. (ii) Taper threads, when used, must conform to one of the following: (A) American Standard Pipe Thread (NPT) type, conforming to the requirements of NBS...

  19. 49 CFR 178.46 - Specification 3AL seamless aluminum cylinders.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... circular. (5) All openings must be threaded. Threads must comply with the following: (i) Each thread must be clean cut, even, without checks, and to gauge. (ii) Taper threads, when used, must conform to one of the following: (A) American Standard Pipe Thread (NPT) type, conforming to the requirements of NBS...

  20. 49 CFR 178.46 - Specification 3AL seamless aluminum cylinders.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... circular. (5) All openings must be threaded. Threads must comply with the following: (i) Each thread must be clean cut, even, without checks, and to gauge. (ii) Taper threads, when used, must conform to one of the following: (A) American Standard Pipe Thread (NPT) type, conforming to the requirements of NBS...

  1. Application and Mechanics Analysis of Multi-Function Construction Platforms in Prefabricated-Concrete Construction

    NASA Astrophysics Data System (ADS)

    Wang, Meihua; Li, Rongshuai; Zhang, Wenze

    2017-11-01

    Multi-function construction platforms (MCPs) are an “old construction technology, new application” form of building facade construction equipment; they significantly reduce labour intensity, improve labour productivity, ensure construction safety and shorten the duration of construction. In this study, a functional analysis of multi-function construction platforms is carried out for the construction of the assembly building. Using the general finite element software ANSYS, the static behaviour and dynamic characteristics of the MCP structure are analysed; a simplified finite element model is constructed, and the selection of the element and the treatment and solution of the boundary conditions are discussed. The maximum deformation value, the maximum stress value and the structural dynamic characteristic model are obtained. The dangerous parts of the platform structure are also analysed. Multiple types of MCPs under engineering construction conditions are calculated, so as to put forward suggestions for rationalizing the engineering application of MCPs.

  2. Cribellate thread production in spiders: Complex processing of nano-fibres into a functional capture thread.

    PubMed

    Joel, Anna-Christin; Kappel, Peter; Adamova, Hana; Baumgartner, Werner; Scholz, Ingo

    2015-11-01

    Spider silk production has been studied intensively in the last years. However, capture threads of cribellate spiders employ an until now often unnoticed alternative of thread production. This thread in general is highly interesting, as it not only involves a controlled arrangement of three types of threads with one being nano-scale fibres (cribellate fibres), but also a special comb-like structure on the metatarsus of the fourth leg (calamistrum) for its production. We found the cribellate fibres organized as a mat, enclosing two parallel larger fibres (axial fibres) and forming the typical puffy structure of cribellate threads. Mat and axial fibres are punctiform connected to each other between two puffs, presumably by the action of the median spinnerets. However, this connection alone does not lead to the typical puffy shape of a cribellate thread. Removing the calamistrum, we found a functional capture thread still being produced, but the puffy shape of the thread was lost. Therefore, the calamistrum is not necessary for the extraction or combination of fibres, but for further processing of the nano-scale cribellate fibres. Using data from Uloborus plumipes we were able to develop a model of the cribellate thread production, probably universally valid for cribellate spiders. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design

    NASA Astrophysics Data System (ADS)

    Debes, Eric; Kaine, Greg

    2002-11-01

    In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 times faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using Intel® VTune™ performance analyzer. The experimental measurements are compared to theoretical results.
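
    The reported figure of 3.2 on a 4-way SMP machine can be put in context with two standard quantities: parallel efficiency (speedup divided by processor count) and the parallel fraction implied by Amdahl's law. The sketch below is illustrative only. Amdahl's law is a simple model introduced here, not the analysis used in the paper, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def amdahl_parallel_fraction(speedup, processors):
    """Parallel fraction p implied by Amdahl's law.

    Amdahl's law: speedup = 1 / ((1 - p) + p / processors).
    Solving for p gives p = (1 - 1/speedup) / (1 - 1/processors).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# Figures quoted in the abstract: 3.2 on a 4-way SMP machine.
measured_speedup = 3.2
n_processors = 4

efficiency = parallel_efficiency(measured_speedup, n_processors)
fraction = amdahl_parallel_fraction(measured_speedup, n_processors)
print(f"efficiency: {efficiency:.2f}, implied parallel fraction: {fraction:.3f}")
```

    Under this model the encoder reaches 80% of ideal scaling, which would correspond to roughly 92% of the single-processor run time being parallelizable.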

  4. Face Gear Drive with Spur Involute Pinion: Geometry, Generation by a Worm, Stress Analysis

    NASA Technical Reports Server (NTRS)

    Litvin, Faydor L.; Fuentes, Alfonso; Zanzi, Claudio; Pontiggia, Matteo; Handschuh, Robert F. (Technical Monitor)

    2002-01-01

    A face gear drive with a spur involute pinion is considered. The generation of the face gear is based on application of a grinding or cutting worm whereas the conventional method of generation is based on application of an involute shaper. An analytical approach is proposed for the determination of: (1) the worm thread surface; (2) avoidance of singularities of the worm thread surface, (air) dressing of the worm; and (3) determination of stresses of the face-gear drive. A computer program for simulation of meshing and contact of the pinion and face-gear has been developed. Correction of machine-tool settings is proposed for reduction of the shift of the bearing contact caused by misalignment. An automatic development of the model of five contacting teeth has been proposed for stress analysis. Numerical examples for illustration of the developed theory are provided.

  5. PICH and BLM limit histone association with anaphase centromeric DNA threads and promote their resolution

    PubMed Central

    Ke, Yuwen; Huh, Jae-Wan; Warrington, Ross; Li, Bing; Wu, Nan; Leng, Mei; Zhang, Junmei; Ball, Haydn L; Li, Bing; Yu, Hongtao

    2011-01-01

    Centromeres nucleate the formation of kinetochores and are vital for chromosome segregation during mitosis. The SNF2 family helicase PICH (Plk1-interacting checkpoint helicase) and the BLM (the Bloom's syndrome protein) helicase decorate ultrafine histone-negative DNA threads that link the segregating sister centromeres during anaphase. The functions of PICH and BLM at these threads are not understood, however. Here, we show that PICH binds to BLM and enables BLM localization to anaphase centromeric threads. PICH- or BLM-RNAi cells fail to resolve these threads in anaphase. The fragmented threads form centromeric-chromatin-containing micronuclei in daughter cells. Anaphase threads in PICH- and BLM-RNAi cells contain histones and centromere markers. Recombinant purified PICH has nucleosome remodelling activities in vitro. We propose that PICH and BLM unravel centromeric chromatin and keep anaphase DNA threads mostly free of nucleosomes, thus allowing these threads to span long distances between rapidly segregating centromeres without breakage and providing a spatiotemporal window for their resolution. PMID:21743438

  6. Effect of Thread and Rotating Speed on Material Flow Behavior and Mechanical Properties of Friction Stir Lap Welding Joints

    NASA Astrophysics Data System (ADS)

    Ji, Shude; Li, Zhengwei; Zhou, Zhenlu; Wu, Baosheng

    2017-10-01

    This study focused on the effects of thread on hook and cold lap formation, lap shear property and impact toughness of alclad 2024-T4 friction stir lap welding (FSLW) joints. In addition to the traditional threaded pin tool (TR-tool), three new tools with different thread locations and orientations were designed. Results showed that thread significantly affected hook, cold lap morphologies and lap shear properties. The tool with a tip-threaded pin (T-tool) fabricated a joint with flat hook and cold lap, which resulted in shear fracture mode. The tool with a bottom-threaded pin (B-tool) eliminated the hook. The tool with a reverse-threaded pin (R-tool) widened the stir zone. When using configuration A, the joints fabricated by the three new tools showed higher failure loads than the joint fabricated by the TR-tool. The joint made with the T-tool showed the best impact toughness. This study demonstrated the significance of thread during FSLW and provided a reference to optimize tool geometry.

  7. Implementation of 5-layer thermal diffusion scheme in weather research and forecasting model with Intel Many Integrated Cores

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme all together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The features of the Intel Many Integrated Core (MIC) architecture, namely efficient parallelization and vectorization, allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the results of the computing performance on this scheme with Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.1x. Accordingly, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.

  8. Low-voltage cross-sectional EBIC for characterisation of GaN-based light emitting devices.

    PubMed

    Moldovan, Grigore; Kazemian, Payam; Edwards, Paul R; Ong, Vincent K S; Kurniawan, Oka; Humphreys, Colin J

    2007-01-01

    Electron beam induced current (EBIC) characterisation can provide detailed information on the influence of crystalline defects on the diffusion and recombination of minority carriers in semiconductors. New developments are required for GaN light emitting devices, which need a cross-sectional approach to provide access to their complex multi-layered structures. A sample preparation approach based on low-voltage Ar ion milling is proposed here and shown to produce a flat cross-section with very limited surface recombination, which enables low-voltage high resolution EBIC characterisation. Dark defects are observed in EBIC images and correlation with cathodoluminescence images identifies them as threading dislocations. Emphasis is placed on one-dimensional quantification which is used to show that junction delineation with very good spatial resolution can be achieved, revealing significant roughening of this GaN p-n junction. Furthermore, longer minority carrier diffusion lengths along the c-axis are found at dislocation sites, in both p-GaN and the multi-quantum well (MQW) region. This is attributed to gettering of point defects at threading dislocations in p-GaN and higher escape rate from quantum wells at dislocation sites in the MQW region, respectively. These developments show considerable promise for the use of low-voltage cross-sectional EBIC in the characterisation of point and extended defects in GaN-based devices and it is suggested that this technique will be particularly useful for degradation analysis.

  9. Structural Turnbuckle Bears Compressive or Tensile Loads

    NASA Technical Reports Server (NTRS)

    Bateman, W. A.; Lang, C. H.

    1985-01-01

    Column length adjuster based on turnbuckle principle. Device consists of internally and externally threaded bushing, threaded housing and threaded rod. Housing attached to one part and threaded rod attached to other part of structure. Turning double threaded bushing contracts or extends rod in relation to housing. Once adjusted, bushing secured with jamnuts. Device used for axially loaded members requiring length adjustment during installation.

  10. Do dual-thread orthodontic mini-implants improve bone/tissue mechanical retention?

    PubMed

    Lin, Yang-Sung; Chang, Yau-Zen; Yu, Jian-Hong; Lin, Chun-Li

    2014-12-01

    The aim of this study was to understand whether the pitch relationship between micro and macro thread designs with a parametrical relationship in a dual-thread mini-implant can improve primary stability. Three types of mini-implants, consisting of single-thread (ST) (0.75 mm pitch over the whole length), dual-thread A (DTA) with double-start 0.375 mm pitch, and dual-thread B (DTB) with single-start 0.2 mm pitch in the upper 2-mm micro thread region, were used for insertion and pull-out testing. Histomorphometric analysis was performed on these specimens to evaluate peri-implant bone defects using a non-contact vision measuring system. The maximum insertion torque (Tmax) of type DTA was significantly the smallest, but no significant difference in the corresponding values was found between ST and DTB. The largest pull-out strength (Fmax), found in the DTA mini-implant, was significantly greater than that of the ST mini-implant regardless of implant insertion orientation. The mini-implants engaged the cortical bone well in the ST and DTA types. A dual-thread mini-implant with the correct micro thread pitch (in a parametrical relationship with the macro thread pitch) in the cortical bone region can improve primary stability and enhance mechanical retention.

  11. Three-dimensional optimization and sensitivity analysis of dental implant thread parameters using finite element analysis.

    PubMed

    Geramizadeh, Maryam; Katoozian, Hamidreza; Amid, Reza; Kadkhodazadeh, Mahdi

    2018-04-01

    This study aimed to optimize the thread depth and pitch of a recently designed dental implant to provide uniform stress distribution by means of a response surface optimization method available in finite element (FE) software. The sensitivity of simulation to different mechanical parameters was also evaluated. A three-dimensional model of a tapered dental implant with micro-threads in the upper area and V-shaped threads in the rest of the body was modeled and analyzed using finite element analysis (FEA). An axial load of 100 N was applied to the top of the implants. The model was optimized for thread depth and pitch to determine the optimal stress distribution. In this analysis, micro-threads had 0.25 to 0.3 mm depth and 0.27 to 0.33 mm pitch, and V-shaped threads had 0.405 to 0.495 mm depth and 0.66 to 0.8 mm pitch. The optimized depth and pitch were 0.307 and 0.286 mm for micro-threads and 0.405 and 0.808 mm for V-shaped threads, respectively. In this design, the most effective parameters on stress distribution were the depth and pitch of the micro-threads based on sensitivity analysis results. Based on the results of this study, the optimal implant design has micro-threads with 0.307 and 0.286 mm depth and pitch, respectively, in the upper area and V-shaped threads with 0.405 and 0.808 mm depth and pitch in the rest of the body. These results indicate that micro-thread parameters have a greater effect on stress and strain values.

  12. SEM and fractography analysis of screw thread loosening in dental implants.

    PubMed

    Scarano, A; Quaranta, M; Traini, T; Piattelli, M; Piattelli, A

    2007-01-01

    Biological and technical failures of implants have already been reported. Mechanical factors are certainly of importance in implant failures, even if their exact nature has not yet been established. The abutment screw fracture or loosening represents a rare, but quite unpleasant failure. The aim of the present research is an analysis and structural examination of screw thread or abutment loosening compared with screw threads or abutment without loosening. The loosening of screw threads was compared to screw thread without loosening of three different implant systems; Branemark (Nobel Biocare, Gothenburg, Sweden), T.B.R. implant systems (Benax, Ancona, Italy) and Restore (Lifecore Biomedical, Chaska, Minnesota, USA). In this study broken screws were excluded. A total of 16 screw thread loosenings were observed (Group I) (4 Branemark, 4 T.B.R and 5 Restore), 10 screw threads without loosening were removed (Group II), and 6 screw threads as received by the manufacturer (unused) (Group III) were used as control (2 Branemark, 2 T.B.R and 2 Restore). The loosened abutment screws were retrieved and analyzed under SEM. Many alterations and deformations were present in concavities and convexities of screw threads in group I. No macroscopic alterations or deformations were observed in groups II and III. A statistical difference of the presence of microcracks were observed between screw threads with an abutment loosening and screw threads without an abutment loosening.

  13. Biomechanical investigation of a novel ratcheting arthrodesis nail.

    PubMed

    McCormick, Jeremy J; Li, Xinning; Weiss, Douglas R; Billiar, Kristen L; Wixted, John J

    2010-10-14

    Knee or tibiotalocalcaneal arthrodesis is a salvage procedure, often with unacceptable rates of nonunion. Basic science of fracture healing suggests that compression across a fusion site may decrease nonunion. A novel ratcheting arthrodesis nail designed to improve dynamic compression is mechanically tested in comparison to existing nails. A novel ratcheting nail was designed and mechanically tested in comparison to a solid nail and a threaded nail using sawbones models (Pacific Research Laboratories, Inc.). Intramedullary nails (IM) were implanted with a load cell (Futek LTH 500) between fusion surfaces. Constructs were then placed into a servo-hydraulic test frame (Model 858 Mini-bionix, MTS Systems) for application of 3 mm and 6 mm dynamic axial displacement (n = 3/group). Load to failure was also measured. Mean percent of initial load after 3-mm and 6-mm displacement was 190.4% and 186.0% for the solid nail, 80.7% and 63.0% for the threaded nail, and 286.4% and 829.0% for the ratcheting nail, respectively. Stress-shielding (as percentage of maximum load per test) after 3-mm and 6-mm displacement averaged 34.8% and 28.7% (solid nail), 40.3% and 40.9% (threaded nail), and 18.5% and 11.5% (ratcheting nail), respectively. In the 6-mm trials, statistically significant increase in initial load and decrease in stress-shielding for the ratcheting vs. solid nail (p = 0.029, p = 0.001) and vs. threaded nail (p = 0.012, p = 0.002) was observed. Load to failure for the ratcheting nail; 599.0 lbs, threaded nail; 508.8 lbs, and solid nail; 688.1 lbs. With significantly increase of compressive load while decreasing stress-shielding at 6-mm of dynamic displacement, the ratcheting mechanism in IM nails may clinically improve rates of fusion.

  14. Biomechanical investigation of a novel ratcheting arthrodesis nail

    PubMed Central

    2010-01-01

    Background Knee or tibiotalocalcaneal arthrodesis is a salvage procedure, often with unacceptable rates of nonunion. Basic science of fracture healing suggests that compression across a fusion site may decrease nonunion. A novel ratcheting arthrodesis nail designed to improve dynamic compression is mechanically tested in comparison to existing nails. Methods A novel ratcheting nail was designed and mechanically tested in comparison to a solid nail and a threaded nail using sawbones models (Pacific Research Laboratories, Inc.). Intramedullary nails (IM) were implanted with a load cell (Futek LTH 500) between fusion surfaces. Constructs were then placed into a servo-hydraulic test frame (Model 858 Mini-bionix, MTS Systems) for application of 3 mm and 6 mm dynamic axial displacement (n = 3/group). Load to failure was also measured. Results Mean percent of initial load after 3-mm and 6-mm displacement was 190.4% and 186.0% for the solid nail, 80.7% and 63.0% for the threaded nail, and 286.4% and 829.0% for the ratcheting nail, respectively. Stress-shielding (as percentage of maximum load per test) after 3-mm and 6-mm displacement averaged 34.8% and 28.7% (solid nail), 40.3% and 40.9% (threaded nail), and 18.5% and 11.5% (ratcheting nail), respectively. In the 6-mm trials, statistically significant increase in initial load and decrease in stress-shielding for the ratcheting vs. solid nail (p = 0.029, p = 0.001) and vs. threaded nail (p = 0.012, p = 0.002) was observed. Load to failure for the ratcheting nail; 599.0 lbs, threaded nail; 508.8 lbs, and solid nail; 688.1 lbs. Conclusion With significantly increase of compressive load while decreasing stress-shielding at 6-mm of dynamic displacement, the ratcheting mechanism in IM nails may clinically improve rates of fusion. PMID:20942976

  15. Initial Kernel Timing Using a Simple PIM Performance Model

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Block, Gary L.; Springer, Paul L.; Sterling, Thomas; Brockman, Jay B.; Callahan, David

    2005-01-01

    This presentation will describe some initial results of paper-and-pencil studies of 4 or 5 application kernels applied to a processor-in-memory (PIM) system roughly similar to the Cascade Lightweight Processor (LWP). The application kernels are linked list traversal, sum of leaf nodes on a tree, bitonic sort, vector sum, and Gaussian elimination. The intent of this work is to guide and validate work on the Cascade project in the areas of compilers, simulators, and languages. We will first discuss the generic PIM structure. Then, we will explain the concepts needed to program a parallel PIM system (locality, threads, parcels). Next, we will present a simple PIM performance model that will be used in the remainder of the presentation. For each kernel, we will then present a set of codes, including codes for a single PIM node, and codes for multiple PIM nodes that move data to threads and move threads to data. These codes are written at a fairly low level, between assembly and C, but much closer to C than to assembly. For each code, we will present some hand-drafted timing forecasts, based on the simple PIM performance model. Finally, we will conclude by discussing what we have learned from this work, including what programming styles seem to work best, from the point-of-view of both expressiveness and performance.
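
    As a rough illustration of two of the kernels named in this record, the sketch below shows a sum over the leaf nodes of a tree and a bitonic sort. It is not the presentation's code: it is written in Python rather than at the low level the record describes, it runs on a single node, and it does not use the PIM performance model. The function names and the nested-tuple tree representation are assumptions made for this example.

```python
def sum_of_leaves(tree):
    """Sum the leaves of a tree given as nested tuples (hypothetical layout).

    An int is a leaf; a tuple is an internal node holding its children.
    """
    total = 0
    stack = [tree]  # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            stack.extend(node)  # visit every child later
        else:
            total += node
    return total


def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

    Every compare-and-swap within one pass of the inner loop touches a disjoint pair of elements, which is the property that makes bitonic sort a common candidate for parallel hardware.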

  16. A Moiré Pattern-Based Thread Counter

    ERIC Educational Resources Information Center

    Reich, Gary

    2017-01-01

    Thread count is a term used in the textile industry as a measure of how closely woven a fabric is. It is usually defined as the sum of the number of warp threads per inch (or cm) and the number of weft threads per inch. (It is sometimes confusingly described as the number of threads per square inch.) In recent years it has also become a subject of…

  17. Does Simultaneous Liposuction Adversely Affect the Outcome of Thread Lifts? A Preliminary Result.

    PubMed

    Lee, Yong Woo; Park, Tae Hwan

    2018-04-11

    Along with advances in thread lift techniques and materials, ancillary procedures such as fat grafting, liposuction, or filler injections have been performed simultaneously. Some surgeons think that these ancillary procedures might affect the aesthetic outcomes of thread lifting possibly due to inadvertent injury to threads or loosening of soft tissue via passing the cannula in the surgical plane of the thread lifts. The purpose of the current study is to determine the effect of such ancillary procedures on the outcome of thread lifts in the human and cadaveric setting. We used human abdominal tissue after abdominoplasty and cadaveric faces. In the abdominal tissue, liposuction parallel to the parallel axis was performed in one area for 5 min. We counted 30 passes when liposuction was performed in one direction. This was repeated as we changed the direction of passages. The plane of thread lifts (dermal vs subcutaneous) and angle between liposuction and thread lifts (parallel vs perpendicular) were differentiated in this abdominal tissue study group. Then, we performed parallel or perpendicular thread lifts using a small slit incision. Using a tensiometer, the maximum holding strength was measured when pulling the thread out of the skin as much as possible. We also used faces of cadavers to prove whether the finding in human abdominal tissue is really valid with corresponding techniques. Our pilot study using abdominal tissue showed that liposuction after thread lifts adversely affects it regardless of the vector of thread lifts. In the cadaveric study, however, liposuction prior to thread lifting does not significantly affect the holding strength of thread lifts. Liposuction or fat grafting in the appropriate layer would not be a hurdle to safely performing simultaneous thread lifts if the target lift tissue is intra-SMAS or just above the SMAS layer.

  18. Performance and Scalability of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  19. On the utility of threads for data parallel programming

    NASA Technical Reports Server (NTRS)

    Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

    1995-01-01

    Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
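
    The comparison this record describes, overlapping computation with communication either through a helper thread or through an asynchronous primitive, can be sketched as follows. This is a toy Python model under stated assumptions, not the paper's runtime: communication is simulated by a sleep of `LATENCY` seconds, the payload value 42 is made up, and all function names are hypothetical. `asyncio` stands in here for a communication layer with non-blocking receives.

```python
import asyncio
import threading
import time

LATENCY = 0.05  # simulated communication latency in seconds (assumed value)


def compute(n):
    """Local work that does not depend on the incoming message."""
    return sum(i * i for i in range(n))


def blocking_receive():
    """Stand-in for a blocking receive: wait out the latency, return a payload."""
    time.sleep(LATENCY)
    return 42


def overlap_with_thread(n):
    """Threaded model: a helper thread waits on communication while we compute."""
    box = {}
    helper = threading.Thread(target=lambda: box.update(msg=blocking_receive()))
    helper.start()
    local = compute(n)  # runs while the helper is blocked in the receive
    helper.join()
    return local + box["msg"]


async def nonblocking_receive():
    """Stand-in for an asynchronous receive that completes after the latency."""
    await asyncio.sleep(LATENCY)
    return 42


async def _overlap_async(n):
    pending = asyncio.create_task(nonblocking_receive())
    await asyncio.sleep(0)  # yield once so the receive is actually posted
    local = compute(n)      # single thread: compute while the receive is pending
    return local + await pending


def overlap_with_async_primitive(n):
    """Non-threaded model: post the receive, compute, then wait for completion."""
    return asyncio.run(_overlap_async(n))
```

    Both functions return the same value; in this toy model the elapsed time of each is roughly the larger of the latency and the compute time rather than their sum.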

  20. Lack of ubiquitin immunoreactivities at both ends of neuropil threads. Possible bidirectional growth of neuropil threads.

    PubMed

    Iwatsubo, T; Hasegawa, M; Esaki, Y; Ihara, Y

    1992-02-01

    Immunocytochemically, neuropil threads (curly fibers) were investigated in the Alzheimer's disease brain using a confocal laser scanning fluorescence microscope by double labeling with tau/ubiquitin antibodies. Ubiquitin immunoreactivities were found to be lacking at one or both ends in more than 40% of tau-positive threads. Immunoelectron microscopy showed that bundles of paired helical filaments, which constitute neuropil threads, were positive for ubiquitin around their midportions, but often negative at their ends. Since it is reasonable to postulate that tau deposition as paired helical filaments precedes ubiquitination, the aforementioned observation suggests that the ends of the threads are newly formed portions, and thus the threads are often growing bidirectionally in small neuronal processes.

  1. Gold thread implantation promotes hair growth in human and mice

    PubMed Central

    Kim, Jong-Hwan; Cho, Eun-Young; Kwon, Euna; Kim, Woo-Ho; Park, Jin-Sung; Lee, Yong-Soon

    2017-01-01

    Thread-embedding therapy has been widely applied for cosmetic purposes such as wrinkle reduction and skin tightening. In particular, gold thread was reported to support connective tissue regeneration, but its role in hair biology remains largely unknown due to lack of investigation. When we implanted gold thread and Happy Lift™ in a human patient for facial lifting, we unexpectedly found an increase in hair regrowth despite no use of hair growth medications. When embedded into the depilated dorsal skin of mice, gold thread or polyglycolic acid (PGA) thread, similarly to 5% minoxidil, significantly increased the number of hair follicles on day 14 after implantation. Hair re-growth promotion in the gold thread-implanted mice was significantly higher than that in the PGA thread group on day 11 after depilation. In particular, the skin tissue of gold thread-implanted mice showed stronger PCNA staining and higher collagen density compared with control mice. These results indicate that gold thread implantation can be an effective way to promote hair re-growth, although further confirmatory study is needed for more information on therapeutic mechanisms and long-term safety. PMID:29399026

  2. FAST copper for broadband access

    NASA Astrophysics Data System (ADS)

    Chiang, Mung; Huang, Jianwei; Cendrillon, Raphael; Tan, Chee Wei; Xu, Dahai

    2006-10-01

    FAST Copper is a multi-year, U.S. NSF funded project that started in 2004, and is jointly pursued by the research groups of Mung Chiang at Princeton University, John Cioffi at Stanford University, and Alexader Fraser at Fraser Research Lab, and in collaboration with several industrial partners including AT&T. The goal of the FAST Copper Project is to provide ubiquitous, 100 Mbps, fiber/DSL broadband access to everyone in the U.S. with a phone line. This goal will be achieved through two threads of research: dynamic and joint optimization of resources in Frequency, Amplitude, Space, and Time (thus the name 'FAST') to overcome the attenuation and crosstalk bottlenecks, and the integration of communication, networking, computation, modeling, and distributed information management and control for the multi-user twisted pair network.

  3. Multi-core and GPU accelerated simulation of a radial star target imaged with equivalent t-number circular and Gaussian pupils

    NASA Astrophysics Data System (ADS)

    Greynolds, Alan W.

    2013-09-01

    Results from the GelOE optical engineering software are presented for the through-focus, monochromatic coherent and polychromatic incoherent imaging of a radial "star" target for equivalent t-number circular and Gaussian pupils. The FFT-based simulations are carried out using OpenMP threading on a multi-core desktop computer, with and without the aid of a many-core NVIDIA GPU accessing its cuFFT library. It is found that a custom FFT optimized for the 12-core host has similar performance to a simply implemented 256-core GPU FFT. A more sophisticated version of the latter but tuned to reduce overhead on a 448-core GPU is 20 to 28 times faster than a basic FFT implementation running on one CPU core.

  4. FNCS: A Framework for Power System and Communication Networks Co-Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ciraci, Selim; Daily, Jeffrey A.; Fuller, Jason C.

    2014-04-13

    This paper describes the Fenix framework that uses a federated approach for integrating power grid and communication network simulators. Compared to existing approaches, Fenix allows co-simulation of both transmission and distribution level power grid simulators with the communication network simulator. To reduce the performance overhead of time synchronization, Fenix utilizes optimistic synchronization strategies that make speculative decisions about when the simulators are going to exchange messages. GridLAB-D (a distribution simulator), PowerFlow (a transmission simulator), and ns-3 (a telecommunication simulator) are integrated with the framework and are used to illustrate the enhanced performance provided by speculative multi-threading on a smart grid application. Our speculative multi-threading approach achieved on average 20% improvement over the existing synchronization methods.

  5. Numerical Analysis of Intra-Cavity and Power-Stream Flow Interaction in Multiple Gas-Turbine Disk-Cavities

    NASA Technical Reports Server (NTRS)

    Athavale, M. M.; Przekwas, A. J.; Hendricks, R. C.; Steinetz, B. M.

    1995-01-01

    A numerical analysis methodology and solutions of the interaction between the power stream and multiply-connected multi-cavity sealed secondary flow fields are presented. Flow solutions for a multi-cavity experimental rig were computed and compared with experimental data of Daniels and Johnson. The flow solutions illustrate the complex coupling between the main-path and the cavity flows as well as outline the flow thread that exists throughout the subplatform multiple cavities and seals. The analysis also shows that the de-coupled solutions on single cavities is inadequate. The present results show trends similar to the T-700 engine data that suggests the changes in the CDP seal altered the flow fields throughout the engine and affected the engine performance.

  6. Big Data Geo-Analytical Tool Development for Spatial Analysis Uncertainty Visualization and Quantification Needs

    NASA Astrophysics Data System (ADS)

    Rose, K.; Bauer, J. R.; Baker, D. V.

    2015-12-01

    As big data computing capabilities are increasingly paired with spatial analytical tools and approaches, there is a need to ensure uncertainty associated with the datasets used in these analyses is adequately incorporated and portrayed in results. Often the products of spatial analyses, big data and otherwise, are developed using discontinuous, sparse, and often point-driven data to represent continuous phenomena. Results from these analyses are generally presented without clear explanations of the uncertainty associated with the interpolated values. The Variable Grid Method (VGM) offers users a flexible approach designed for application to a variety of analyses where there is a need to study, evaluate, and analyze spatial trends and patterns while maintaining connection to and communicating the uncertainty in the underlying spatial datasets. The VGM outputs a simultaneous visualization representative of the spatial data analyses and quantification of underlying uncertainties, which can be calculated using data related to sample density, sample variance, interpolation error, and uncertainty calculated from multiple simulations. In this presentation we will show how we are utilizing Hadoop to store and perform spatial analysis through the development of custom Spark and MapReduce applications that incorporate ESRI Hadoop libraries. The team will present custom 'Big Data' geospatial applications that run on the Hadoop cluster and integrate with ESRI ArcMap with the team's probabilistic VGM approach. The VGM-Hadoop tool has been specially built as a multi-step MapReduce application running on the Hadoop cluster for the purpose of data reduction. This reduction is accomplished by generating multi-resolution, non-overlapping, attributed topology that is then further processed using ESRI's geostatistical analyst to convey a probabilistic model of a chosen study region. 
Finally, we will share our approach for implementation of data reduction and topology generation via custom multi-step Hadoop applications, performance benchmarking comparisons, and Hadoop-centric opportunities for greater parallelization of geospatial operations. The presentation includes examples of the approach being applied to a range of subsurface, geospatial studies (e.g. induced seismicity risk).

  7. Scheduler for multiprocessor system switch with selective pairing

    DOEpatents

    Gara, Alan; Gschwind, Michael Karl; Salapura, Valentina

    2015-01-06

    System, method and computer program product for scheduling threads in a multiprocessing system with selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The method configures the selective pairing facility to use checking to provide one highly reliable thread for high reliability, and allocates threads indicating a need for hardware checking to the corresponding processor cores. The method configures the selective pairing facility to provide multiple independent cores, and allocates threads indicating inherent resilience to the corresponding processor cores.

  8. The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography.

    PubMed

    Zhang, Bo; Yang, Xiang; Yang, Fei; Yang, Xin; Qin, Chenghu; Han, Dong; Ma, Xibo; Liu, Kai; Tian, Jie

    2010-09-13

    In molecular imaging (MI), especially the optical molecular imaging, bioluminescence tomography (BLT) emerges as an effective imaging modality for small animal imaging. The finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. The processing speed of the FEMs and the AFE framework still needs to be improved, although the multi-thread CPU technology and the multi CPU technology have already been applied. In this paper, we for the first time introduce a new kind of acceleration technology to accelerate the AFE framework for BLT, using the graphics processing unit (GPU). Besides the processing speed, the GPU technology can get a balance between the cost and performance. The CUBLAS and CULA are two main important and powerful libraries for programming on NVIDIA GPUs. With the help of CUBLAS and CULA, it is easy to code on NVIDIA GPU and there is no need to worry about the details about the hardware environment of a specific GPU. The numerical experiments are designed to show the necessity, effect and application of the proposed CUBLAS and CULA based GPU acceleration. From the results of the experiments, we can reach the conclusion that the proposed CUBLAS and CULA based GPU acceleration method can improve the processing speed of the AFE framework very much while getting a balance between cost and performance.

  9. Parallel Computer System for 3D Visualization Stereo on GPU

    NASA Astrophysics Data System (ADS)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on a Graphics Processing Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing-ray intersections with scene objects. The system allows a significant increase in productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.

  10. Adaptive and mobile ground sensor array.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, Michael Warren; O'Rourke, William T.; Zenner, Jennifer

    The goal of this LDRD was to demonstrate the use of robotic vehicles for deploying and autonomously reconfiguring seismic and acoustic sensor arrays with high (centimeter) accuracy to obtain enhancement of our capability to locate and characterize remote targets. The capability to accurately place sensors and then retrieve and reconfigure them allows sensors to be placed in phased arrays in an initial monitoring configuration and then to be reconfigured in an array tuned to the specific frequencies and directions of the selected target. This report reviews the findings and accomplishments achieved during this three-year project. This project successfully demonstrated autonomous deployment and retrieval of a payload package with an accuracy of a few centimeters using differential global positioning system (GPS) signals. It developed an autonomous, multisensor, temporally aligned, radio-frequency communication and signal processing capability, and an array optimization algorithm, which was implemented on a digital signal processor (DSP). Additionally, the project converted the existing single-threaded, monolithic robotic vehicle control code into a multi-threaded, modular control architecture that enhances the reuse of control code in future projects.

  11. Threaded biliary inside stents are a safe and effective therapeutic option in cases of malignant hilar obstruction.

    PubMed

    Inatomi, Osamu; Bamba, Shigeki; Shioya, Makoto; Mochizuki, Yosuke; Ban, Hiromitsu; Tsujikawa, Tomoyuki; Saito, Yasuharu; Andoh, Akira; Fujiyama, Yoshihide

    2013-02-14

    Although endoscopic biliary stents have been accepted as part of palliative therapy for cases of malignant hilar obstruction, the optimal endoscopic management regime remains controversial. In this study, we evaluated the safety and efficacy of placing a threaded stent above the sphincter of Oddi (threaded inside plastic stents, threaded PS) and compared the results with those of other stent types. Patients with malignant hilar obstruction, including those requiring biliary drainage for stent occlusion, were selected. Patients received either one of the following endoscopic indwelling stents: threaded PS, conventional plastic stents (conventional PS), or metallic stents (MS). Duration of stent patency and the incident of complication were compared in these patients. Forty-two patients underwent placement of endoscopic indwelling stents (threaded PS = 12, conventional PS = 17, MS = 13). The median duration of threaded PS patency was significantly longer than that of conventional PS patency (142 vs. 32 days; P = 0.04, logrank test). The median duration of threaded PS and MS patency was not significantly different (142 vs. 150 days, P = 0.83). Stent migration did not occur in any group. Among patients who underwent threaded PS placement as a salvage therapy after MS obstruction due to tumor ingrowth, the median duration of MS patency was significantly shorter than that of threaded PS patency (123 vs. 240 days). Threaded PS are safe and effective in cases of malignant hilar obstruction; moreover, it is a suitable therapeutic option not only for initial drainage but also for salvage therapy.

  12. Microdissection of black widow spider silk-producing glands.

    PubMed

    Jeffery, Felicia; La Mattina, Coby; Tuton-Blasingame, Tiffany; Hsia, Yang; Gnesa, Eric; Zhao, Liang; Franz, Andreas; Vierra, Craig

    2011-01-11

    Modern spiders spin high-performance silk fibers with a broad range of biological functions, including locomotion, prey capture and protection of developing offspring. Spiders accomplish these tasks by spinning several distinct fiber types that have diverse mechanical properties. Such specialization of fiber types has occurred through the evolution of different silk-producing glands, which function as small biofactories. These biofactories manufacture and store large quantities of silk proteins for fiber production. Through a complex series of biochemical events, these silk proteins are converted from a liquid into a solid material upon extrusion. Mechanical studies have demonstrated that spider silks are stronger than high-tensile steel. Analyses to understand the relationship between the structure and function of spider silk threads have revealed that spider silk consists largely of proteins, or fibroins, that have block repeats within their protein sequences. Common molecular signatures that contribute to the incredible tensile strength and extensibility of spider silks are being unraveled through the analyses of translated silk cDNAs. Given the extraordinary material properties of spider silks, research labs across the globe are racing to understand and mimic the spinning process to produce synthetic silk fibers for commercial, military and industrial applications. One of the main challenges to spinning artificial spider silk in the research lab involves a complete understanding of the biochemical processes that occur during extrusion of the fibers from the silk-producing glands. 
Here we present a method for the isolation of the seven different silk-producing glands from the cobweaving black widow spider, which includes the major and minor ampullate glands [manufactures dragline and scaffolding silk], tubuliform [synthesizes egg case silk], flagelliform [unknown function in cob-weavers], aggregate [makes glue silk], aciniform [synthesizes prey wrapping and egg case threads] and pyriform [produces attachment disc silk]. This approach is based upon anesthetizing the spider with carbon dioxide gas, subsequent separation of the cephalothorax from the abdomen, and microdissection of the abdomen to obtain the silk-producing glands. Following the separation of the different silk-producing glands, these tissues can be used to retrieve different macromolecules for distinct biochemical analyses, including quantitative real-time PCR, northern- and western blotting, mass spectrometry (MS or MS/MS) analyses to identify new silk protein sequences, search for proteins that participate in the silk assembly pathway, or use the intact tissue for cell culture or histological experiments.

  13. Lack of ubiquitin immunoreactivities at both ends of neuropil threads. Possible bidirectional growth of neuropil threads.

    PubMed Central

    Iwatsubo, T.; Hasegawa, M.; Esaki, Y.; Ihara, Y.

    1992-01-01

    Immunocytochemically, neuropil threads (curly fibers) were investigated in the Alzheimer's disease brain using a confocal laser scanning fluorescence microscope by double labeling with tau/ubiquitin antibodies. Ubiquitin immunoreactivities were found to be lacking at one or both ends in more than 40% of tau-positive threads. Immunoelectron microscopy showed that bundles of paired helical filaments, which constitute neuropil threads, were positive for ubiquitin around their midportions, but often negative at their ends. Since it is reasonable to postulate that tau deposition as paired helical filaments precedes ubiquitination, the aforementioned observation suggests that the ends of the threads are newly formed portions, and thus the threads are often growing bidirectionally in small neuronal processes. PMID:1310831

  14. Fatigue acceptance test limit criterion for larger diameter rolled thread fasteners

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kephart, A.R.

    1997-05-01

    This document describes a fatigue lifetime acceptance test criterion by which studs having rolled threads, larger than 1.0 inches in diameter, can be assured to meet minimum quality attributes associated with a controlled rolling process. This criterion is derived from a stress dependent, room temperature air fatigue database for test studs having 0.625 inch diameter threads of Alloys X-750 HTH and direct aged 625. Anticipated fatigue lives of larger threads are based on thread root elastic stress concentration factors which increase with increasing thread diameters. Over the thread size range of interest, a 30% increase in notch stress is equivalent to a factor of five (5X) reduction in fatigue life. The resulting diameter dependent fatigue acceptance criterion is normalized to the aerospace rolled thread acceptance standards for a 1.0 inch diameter, 0.125 inch pitch, Unified National thread with a controlled Root radius (UNR). Testing was conducted at a stress of 50% of the minimum specified material ultimate strength, 80 Ksi, and at a stress ratio (R) of 0.10. Limited test data for fastener diameters of 1.00 to 2.25 inches are compared to the acceptance criterion. Sensitivity of fatigue life of threads to test nut geometry variables was also shown to be dependent on notch stress conditions. Bearing surface concavity of the compression nuts and thread flank contact mismatch conditions can significantly affect the fastener fatigue life. Without improved controls these conditions could potentially provide misleading acceptance data. Alternate test nut geometry features are described and implemented in the rolled thread stud specification, MIL-DTL-24789(SH), to mitigate the potential effects on fatigue acceptance data.

  15. Porting and refurbishment of the WSS TNG control software

    NASA Astrophysics Data System (ADS)

    Caproni, Alessandro; Zacchei, Andrea; Vuerli, Claudio; Pucillo, Mauro

    2004-09-01

    The Workstation Software System (WSS) is the high-level control software of the Italian Galileo Galilei Telescope on La Palma, Canary Islands, developed at the beginning of the 1990s for HP-UX workstations. WSS may be seen as a middle-layer software system that manages the communications between the real-time systems (VME), different workstations and high-level applications, providing a uniform distributed environment. The project to port the control software from the HP workstations to a Linux environment started at the end of 2001. It aims to refurbish the control software by introducing some of the new software technologies and languages freely available on the Linux operating system. The project was realized by gradually substituting each HP workstation with a Linux PC, with the goal of avoiding major changes in the original software running under HP-UX. Three main phases characterized the project: creation of a simulated control room with several Linux PCs running WSS (to check all the functionality); insertion in the simulated control room of some HPs (to check the mixed environment); substitution of the HP workstations in the real control room. From a software point of view, the project introduces some new technologies, like multi-threading, and the possibility to develop high-level WSS applications with almost every programming language that implements Berkeley sockets. A library to develop Java applications has also been created and tested.

  16. Hybrid Computational Architecture for Multi-Scale Modeling of Materials and Devices

    DTIC Science & Technology

    2016-01-03

    Equivalent: Total Number: Sub Contractors (DD882) Names of Faculty Supported Names of Under Graduate students supported Names of Personnel receiving masters...GHz, 20 cores (40 with hyper-threading (HT)) Single node performance

    Node     # of cores    Total CPU time  User CPU time  System CPU time  Elapsed time
    ...
    INTEL20  40 (with HT)  534.785         529.984        4.800            541.179
             20            468.873         466.119        2.754            476.878
             10            671.798         669.653        2.145            680.510
             8             772.269         770.256        2.013
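The elapsed times quoted in the excerpt above are enough for a rough scaling check. The sketch below is only illustrative arithmetic on those three figures. It assumes that the rows without a node label also belong to the INTEL20 node and that all runs used the same workload, neither of which the excerpt confirms. The names `elapsed` and `speedup` are introduced here for illustration.

```python
# Elapsed times as quoted in the excerpt, keyed by core count.
# Assumption: every row refers to the same node and the same workload.
elapsed = {"40 (HT)": 541.179, "20": 476.878, "10": 680.510}


def speedup(t_ref, t_new):
    """Ratio of a reference elapsed time to a new elapsed time (>1 means faster)."""
    return t_ref / t_new


# Going from 10 to 20 cores.
s_10_to_20 = speedup(elapsed["10"], elapsed["20"])
# Parallel efficiency of that doubling (1.0 would be ideal scaling).
efficiency = s_10_to_20 / 2
# Going from 20 physical cores to 40 hyper-threads.
s_ht = speedup(elapsed["20"], elapsed["40 (HT)"])

print(f"10 -> 20 cores: speedup {s_10_to_20:.3f}, efficiency {efficiency:.3f}")
print(f"20 cores -> 40 hyper-threads: speedup {s_ht:.3f}")
```

Under these assumptions, doubling the cores from 10 to 20 gives well under a twofold gain, and the hyper-threaded run has a longer elapsed time than the 20-core run.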

  17. Wedges for ultrasonic inspection

    DOEpatents

    Gavin, Donald A.

    1982-01-01

    An ultrasonic transducer device is provided which is used in ultrasonic inspection of the material surrounding a threaded hole and which comprises a wedge of plastic or the like including a curved threaded surface adapted to be screwed into the threaded hole and a generally planar surface on which a conventional ultrasonic transducer is mounted. The plastic wedge can be rotated within the threaded hole to inspect for flaws in the material surrounding the threaded hole.

  18. Apparatus for accurately preloading auger attachment means for frangible protective material

    NASA Technical Reports Server (NTRS)

    Wood, K. E.

    1983-01-01

    Apparatus for preloading a spring loaded threaded member is described. The apparatus is formed of three telescoping tubes. The innermost tube has means to prevent rotation of the threaded member. The middle tube is threadedly engaged with the threaded member and by axial movement applies a preload thereto. The outer tube engages a nut which may be rotated to retain the threaded member in axial position to maintain the preload.

  19. Effect of thread shape on screw stress concentration by photoelastic measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dragoni, E.

    1994-11-01

    The screw stress concentration for six nut-bolt connections embodying three different thread profiles and two nut shapes is measured photoelastically. Buttress (nearly zero flank angle), trapezoidal (15-deg flank angle), and triangular (30-deg flank angle) thread forms are examined in combination with standard and lip-type nuts. The effect of the thread profile on the screw stress concentration appears to be dependent upon the kind of nut considered. If the fastening incorporates a standard nut, the buttress thread is stronger than the triangular one, which, in turn, behaves better than the trapezoidal contour. The improvement is roughly a 20% reduction in the stress concentration factor from the trapezoidal to the buttress thread. In the case of the lip nut, conversely, this tendency is somewhat reversed, with the trapezoidal thread performing slightly (but not decidedly) better than the other two shapes. Finally, averaged over all three thread forms, the lip nut exhibits a stress concentration factor which is about 50% lower than that of the standard nut.

  20. Quick connect fastener

    NASA Technical Reports Server (NTRS)

    Weddendorf, Bruce (Inventor)

    1994-01-01

    A quick connect fastener and method of use is presented wherein the quick connect fastener is suitable for replacing available bolts and screws, the quick connect fastener being capable of installation by simply pushing a threaded portion of the connector into a member receptacle hole, the inventive apparatus being comprised of an externally threaded fastener having a threaded portion slidably mounted upon a stud or bolt shaft, wherein the externally threaded fastener portion is expandable by a preloaded spring member. The fastener, upon contact with the member receptacle hole, has the capacity of presenting cylindrical threads of a reduced diameter for insertion purposes and once inserted into the receiving threads of the receptacle member hole, are expandable for engagement of the receptacle hole threads forming a quick connect of the fastener and the member to be fastened, the quick connect fastener can be further secured by rotation after insertion, even to the point of locking engagement, the quick connect fastener being disengagable only by reverse rotation of the mated thread engagement.

  1. Form and function of cnidarian spirocysts. III. Ultrastructure of the thread and the function of spirocysts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mariscal, R.N.; McLean, R.B.; Hand, C.

    1977-01-01

    Unlike most nematocysts, undischarged spirocyst threads bear hollow tubules rather than spines. The undischarged tubules are interconnected in hexagonal arrays and appear to be arranged in bundles along the length of the thread. Although the wall of the thread is folded in length and width, the tubules are not. Upon discharge and contact with sea water, the tubules solubilize and adhere to various substrates and prey. Traction between such objects and the everting thread causes the tubules to spin out into a web or meshwork of fine microfibrillae. Lack of contact of the everting thread with objects results in the tubules forming small droplets of partially solubilized material, some of which appear to be arranged in a helical pattern around the thread. The web or meshwork formed by the solubilized tubules in contact with various substrates probably serves to increase significantly the surface area and adhesive properties of the everted spirocyst thread.

  2. Model Checking Real Time Java Using Java PathFinder

    NASA Technical Reports Server (NTRS)

    Lindstrom, Gary; Mehlitz, Peter C.; Visser, Willem

    2005-01-01

    The Real Time Specification for Java (RTSJ) is an augmentation of Java for real time applications of various degrees of hardness. The central features of RTSJ are real time threads; user defined schedulers; asynchronous events, handlers, and control transfers; a priority inheritance based default scheduler; non-heap memory areas such as immortal and scoped, and non-heap real time threads whose execution is not impeded by garbage collection. The Robust Software Systems group at NASA Ames Research Center has JAVA PATHFINDER (JPF) under development, a Java model checker. JPF at its core is a state exploring JVM which can examine alternative paths in a Java program (e.g., via backtracking) by trying all nondeterministic choices, including thread scheduling order. This paper describes our implementation of an RTSJ profile (subset) in JPF, including requirements, design decisions, and current implementation status. Two examples are analyzed: jobs on a multiprogramming operating system, and a complex resource contention example involving autonomous vehicles crossing an intersection. The utility of JPF in finding logic and timing errors is illustrated, and the remaining challenges in supporting all of RTSJ are assessed.
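The idea of "trying all nondeterministic choices, including thread scheduling order" can be sketched on a toy scale. The snippet below is not JPF and does not use its API. It is a minimal Python sketch, with hypothetical names (`interleavings`, `run`, `explore`), that enumerates every interleaving of two two-step threads incrementing a shared counter and groups the schedules by final value.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that preserves each sequence's own order."""
    n, m = len(a), len(b)
    for positions in combinations(range(n + m), n):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n + m)]


def run(schedule):
    """Execute one schedule of (thread, op) steps against a shared counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared        # copy the shared value into a thread-local
        else:
            shared = local[thread] + 1    # "write": store the incremented local copy
    return shared


def explore():
    """Group all schedules of two unsynchronised increments by their final value."""
    t1 = [("T1", "read"), ("T1", "write")]
    t2 = [("T2", "read"), ("T2", "write")]
    results = {}
    for schedule in interleavings(t1, t2):
        results.setdefault(run(schedule), []).append(schedule)
    return results


outcomes = explore()
for value, schedules in sorted(outcomes.items()):
    print(f"final value {value}: {len(schedules)} schedule(s)")
```

Exhaustive enumeration exposes the lost-update schedules that a single test run would usually miss. A real model checker additionally stores visited states and backtracks rather than re-running every schedule from the start.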

  3. Thigmotaxis Mediates Trail Odour Disruption.

    PubMed

    Stringer, Lloyd D; Corn, Joshua E; Sik Roh, Hyun; Jiménez-Pérez, Alfredo; Manning, Lee-Anne M; Harper, Aimee R; Suckling, David M

    2017-05-10

    Disruption of foraging using oversupply of ant trail pheromones is a novel pest management application under investigation. It presents an opportunity to investigate the interaction of sensory modalities by removal of one of the modes. Superficially similar to sex pheromone-based mating disruption in moths, ant trail pheromone disruption lacks an equivalent mechanistic understanding of how the ants respond to an oversupply of their trail pheromone. Since significant compromise of one sensory modality essential for trail following (chemotaxis) has been demonstrated, we hypothesised that other sensory modalities such as thigmotaxis could act to reduce the impact on olfactory disruption of foraging behaviour. To test this, we provided a physical stimulus of thread to aid trailing by Argentine ants otherwise under disruptive pheromone concentrations. Trail following success was higher using a physical cue. While trail integrity reduced under continuous over-supply of trail pheromone delivered directly on the thread, provision of a physical cue in the form of thread slightly improved trail following and mediated trail disruption from high concentrations upwind. Our results indicate that ants are able to use physical structures to reduce but not eliminate the effects of trail pheromone disruption.

  4. Multi-Scale Approach to Understanding Source-Sink Dynamics of Amphibians

    DTIC Science & Technology

    2015-12-01

    spotted salamander, A. maculatum) at Fort Leonard Wood (FLW), Missouri. We used a multi-faceted approach in which we combined ecological, genetic...spotted salamander, A. maculatum) at Fort Leonard Wood, Missouri through a combination of intensive ecological field studies, genetic analyses, and...spatial demographic networks to identify optimal locations for wetland construction and restoration. Ecological Applications. Walls, S. C., Ball, L. C

  5. Thread bonds in molecules

    NASA Astrophysics Data System (ADS)

    Ivlev, B.

    2017-07-01

    Unusual chemical bonds are proposed. Each bond is characterized by the thread of a small radius, 10^-11 cm, extended between two nuclei in a molecule. An analogue of a potential well, of the depth of MeV scale, is formed within the thread. This occurs due to the local reduction of zero point electromagnetic energy. This is similar to formation of the Casimir well. The electron-photon interaction only is not sufficient for formation of thread state. The mechanism of electron mass generation is involved in the close vicinity, 10^-16 cm, of the thread. Thread bonds are stable and cannot be created or destructed in chemical or optical processes.

  6. Long-term effect of the insoluble thread-lifting technique.

    PubMed

    Fukaya, Mototsugu

    2017-01-01

    Although the thread-lifting technique for sagging faces has become more common and popular, medical literature evaluating its effects is scarce. Studies on its long-term prognosis are particularly uncommon. One hundred individuals who had previously undergone insoluble thread-lifting were retrospectively investigated. Photos in frontal and oblique views from the first and last visits were evaluated by six female individuals by guessing the patients' ages. The mean guessed age was defined as the apparent age, and the difference between the real and apparent ages was defined as the youth value. The difference between the youth values before and after the thread-lift was defined as the rejuvenation effect and analyzed in relation to the time since the operation, the number of threads used and the number of thread-lift operations performed. The rejuvenation effect decreased over the first year after the operation, but showed an increasing trend thereafter. The rejuvenation effect increased with the number of threads used and the number of thread-lift operations performed. The insoluble thread-lifting technique appears to be associated with both early and late effects. The rejuvenation effect appeared to decrease during the first year, but increased thereafter. A multicenter trial is necessary to confirm these findings.
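The chain of definitions in this abstract (apparent age, youth value, rejuvenation effect) can be written out as a few lines of arithmetic. The sketch below assumes the sign convention youth value = real age minus apparent age, which the abstract does not state explicitly. The ages and guesses are invented purely for illustration and are not data from the study.

```python
from statistics import mean


def apparent_age(guesses):
    """Mean of the evaluators' guessed ages."""
    return mean(guesses)


def youth_value(real_age, guesses):
    """Real age minus apparent age (assumed sign: positive means looking younger)."""
    return real_age - apparent_age(guesses)


def rejuvenation_effect(real_before, guesses_before, real_after, guesses_after):
    """Change in youth value between the first and the last visit."""
    return youth_value(real_after, guesses_after) - youth_value(real_before, guesses_before)


# Hypothetical patient: six evaluators guess the age from photos at each visit.
effect = rejuvenation_effect(
    real_before=50, guesses_before=[52, 50, 51, 49, 53, 51],  # apparent age 51
    real_after=52, guesses_after=[48, 50, 49, 47, 51, 49],    # apparent age 49
)
print(effect)
```

Because the youth value is measured relative to the real age at each visit, the patient's ageing between visits is already accounted for in the effect.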

  7. The development of data acquisition and processing application system for RF ion source

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaodan; Wang, Xiaoying; Hu, Chundong; Jiang, Caichao; Xie, Yahong; Zhao, Yuanzhe

    2017-07-01

    As the key ion source component of nuclear fusion auxiliary heating devices, the radio frequency (RF) ion source is developed and applied gradually to offer a source plasma with the advantages of ease of control and high reliability. In addition, it easily achieves long-pulse steady-state operation. During the process of the development and testing of the RF ion source, a lot of original experimental data will be generated. Therefore, it is necessary to develop a stable and reliable computer data acquisition and processing application system for realizing the functions of data acquisition, storage, access, and real-time monitoring. In this paper, the development of a data acquisition and processing application system for the RF ion source is presented. The hardware platform is based on the PXI system and the software is programmed on the LabVIEW development environment. The key technologies that are used for the implementation of this software programming mainly include the long-pulse data acquisition technology, multi-threading processing technology, transmission control communication protocol, and the Lempel-Ziv-Oberhumer data compression algorithm. Now, this design has been tested and applied on the RF ion source. The test results show that it can work reliably and steadily. With the help of this design, the stable plasma discharge data of the RF ion source are collected, stored, accessed, and monitored in real-time. It is shown that it has a very practical application significance for the RF experiments.

  8. Application Analysis and Decision with Dynamic Analysis

    DTIC Science & Technology

    2014-12-01

    pushes the application file and the JSON file containing the metadata from the database. When the 2 files are in place, the consumer thread starts...human analysts and stores it in a database. It would then use some of these data to generate a risk score for the application. However, static analysis...and store them in the primary A2D database for future analysis. 15. SUBJECT TERMS Android, dynamic analysis

  9. Thread Migration in the Presence of Pointers

    NASA Technical Reports Server (NTRS)

    Cronk, David; Haines, Matthew; Mehrotra, Piyush

    1996-01-01

    Dynamic migration of lightweight threads supports both data locality and load balancing. However, migrating threads that contain pointers referencing data in both the stack and heap remains an open problem. In this paper we describe a technique by which threads with pointers referencing both stack and non-shared heap data can be migrated such that the pointers remain valid after migration. As a result, threads containing pointers can now be migrated between processors in a homogeneous distributed memory environment.

  10. High precision optomechanical assembly using threads as mechanical reference

    NASA Astrophysics Data System (ADS)

    Lamontagne, Frédéric; Desnoyers, Nichola; Bergeron, Guy; Cantin, Mario

    2016-09-01

    A convenient method to assemble optomechanical components is to use threaded interface. For example, lenses are often secured inside barrels using threaded rings. In other cases, multiple optical sub-assemblies such as lens barrels can be threaded to each other. Threads have the advantage to provide a simple assembly method, to be easy to manufacture, and to offer a compact mechanical design. On the other hand, threads are not considered to provide accurate centering between parts because of the assembly clearance between the inner and outer threads. For that reason, threads are often used in conjunction with precision cylindrical surfaces to limit the radial clearance between the parts to be centered. Therefore, tight manufacturing tolerances are needed on these pilot diameters, which affect the cost of the optical assembly. This paper presents a new optomechanical approach that uses threads as mechanical reference. This innovative method relies on geometric principles to auto-center parts to each other with a very low centering error that is usually less than 5 μm. The method allows to auto-center an optical group in a main barrel, to perform an axial adjustment of an optical group inside a main barrel, and to perform stacking of multiple barrels. In conjunction with the lens auto-centering method that also used threads as a mechanical reference, this novel solution opens new possibilities to realize a variety of different high precision optomechanical assemblies at lower cost.

  11. The application analysis of the multi-angle polarization technique for ocean color remote sensing

    NASA Astrophysics Data System (ADS)

    Zhang, Yongchao; Zhu, Jun; Yin, Huan; Zhang, Keli

    2017-02-01

    The multi-angle polarization technique, which uses the intensity of polarized radiation as the observed quantity, is a new remote sensing means for earth observation. This method provides not only multi-angle light intensity data but also multi-angle information on polarized radiation, so it may solve problems that traditional remote sensing methods cannot. The multi-angle polarization technique has become one of the hot topics in international quantitative remote sensing research. In this paper, we first introduce the principles of the multi-angle polarization technique, then summarize and analyse the state of basic research and engineering applications in four areas: 1) the polarization-based method for removing sun glitter, 2) ocean color remote sensing based on polarization, 3) oil spill detection using the polarization technique, and 4) ocean aerosol monitoring based on polarization. Finally, based on the previous work, we briefly present the problems and prospects of the multi-angle polarization technique used in China's ocean color remote sensing.

  12. Digital fabrication of multi-material biomedical objects.

    PubMed

    Cheung, H H; Choi, S H

    2009-12-01

    This paper describes a multi-material virtual prototyping (MMVP) system for modelling and digital fabrication of discrete and functionally graded multi-material objects for biomedical applications. The MMVP system consists of a DMMVP module, an FGMVP module and a virtual reality (VR) simulation module. The DMMVP module is used to model discrete multi-material (DMM) objects, while the FGMVP module is for functionally graded multi-material (FGM) objects. The VR simulation module integrates these two modules to perform digital fabrication of multi-material objects, which can be subsequently visualized and analysed in a virtual environment to optimize MMLM processes for fabrication of product prototypes. Using the MMVP system, two biomedical objects, including a DMM human spine and an FGM intervertebral disc spacer are modelled and digitally fabricated for visualization and analysis in a VR environment. These studies show that the MMVP system is a practical tool for modelling, visualization, and subsequent fabrication of biomedical objects of discrete and functionally graded multi-materials for biomedical applications. The system may be adapted to control MMLM machines with appropriate hardware for physical fabrication of biomedical objects.

  13. Biomechanical evaluation of a novel Limb Prosthesis Osseointegrated Fixation System designed to combine the advantages of interference-fit and threaded solutions.

    PubMed

    Prochor, Piotr; Piszczatowski, Szczepan; Sajewicz, Eugeniusz

    2016-01-01

    The study was aimed at biomechanical evaluation of a novel Limb Prosthesis Osseointegrated Fixation System (LPOFS) designed to combine the advantages of interference-fit and threaded solutions. Three cases, the LPOFS (designed), the OPRA (threaded) and the ITAP (interference-fit) implants were studied. Von-Mises stresses in bone patterns and maximal values generated while axial loading on an implant placed in bone and the force reaction values in contact elements while extracting an implant were analysed. Primary and fully osteointegrated connections were considered. The results obtained for primary connection indicate more effective anchoring of the OPRA, however the LPOFS provides more appropriate stress distribution (lower stress-shielding, no overloading) in bone. In the case of fully osteointegrated connection the LPOFSs kept the most favourable stress distribution in cortical bone which is the most important long-term feature of the implant usage and bone remodelling. Moreover, in fully bound connection its anchoring elements resist extracting attempts more than the ITAP and the OPRA. The results obtained allow us to conclude that in the case of features under study the LPOFS is a more functional solution to direct skeletal attachment of limb prosthesis than the referential implants during short and long-term use.

  14. Threaded biliary inside stents are a safe and effective therapeutic option in cases of malignant hilar obstruction

    PubMed Central

    2013-01-01

    Background: Although endoscopic biliary stents have been accepted as part of palliative therapy for cases of malignant hilar obstruction, the optimal endoscopic management regime remains controversial. In this study, we evaluated the safety and efficacy of placing a threaded stent above the sphincter of Oddi (threaded inside plastic stents, threaded PS) and compared the results with those of other stent types. Methods: Patients with malignant hilar obstruction, including those requiring biliary drainage for stent occlusion, were selected. Patients received one of the following endoscopic indwelling stents: threaded PS, conventional plastic stents (conventional PS), or metallic stents (MS). Duration of stent patency and the incidence of complications were compared in these patients. Results: Forty-two patients underwent placement of endoscopic indwelling stents (threaded PS = 12, conventional PS = 17, MS = 13). The median duration of threaded PS patency was significantly longer than that of conventional PS patency (142 vs. 32 days; P = 0.04, logrank test). The median duration of threaded PS and MS patency was not significantly different (142 vs. 150 days, P = 0.83). Stent migration did not occur in any group. Among patients who underwent threaded PS placement as a salvage therapy after MS obstruction due to tumor ingrowth, the median duration of MS patency was significantly shorter than that of threaded PS patency (123 vs. 240 days). Conclusions: Threaded PS are safe and effective in cases of malignant hilar obstruction; moreover, they are a suitable therapeutic option not only for initial drainage but also for salvage therapy. PMID:23410217

  15. Parallel Implementation of 3-D Iterative Reconstruction With Intra-Thread Update for the jPET-D4

    NASA Astrophysics Data System (ADS)

    Lam, Chih Fung; Yamaya, Taiga; Obi, Takashi; Yoshida, Eiji; Inadama, Naoko; Shibuya, Kengo; Nishikido, Fumihiko; Murayama, Hideo

    2009-02-01

    One way to speed-up iterative image reconstruction is by parallel computing with a computer cluster. However, as the number of computing threads increases, parallel efficiency decreases due to network transfer delay. In this paper, we proposed a method to reduce data transfer between computing threads by introducing an intra-thread update. The update factor is collected from each slave thread and a global image is updated as usual in the first K sub-iteration. In the rest of the sub-iterations, the global image is only updated at an interval which is controlled by a parameter L. In between that interval, the intra-thread update is carried out whereby an image update is performed in each slave thread locally. We investigated combinations of K and L parameters based on parallel implementation of RAMLA for the jPET-D4 scanner. Our evaluation used four workstations with a total of 16 slave threads. Each slave thread calculated a different set of LORs which are divided according to ring difference numbers. We assessed image quality of the proposed method with a hotspot simulation phantom. The figure of merit was the full-width-half-maximum of hotspots and the background normalized standard deviation. At an optimum K and L setting, we did not find significant change in the output images. We also applied the proposed method to a Hoffman phantom experiment and found the difference due to intra-thread update was negligible. With the intra-thread update, computation time could be reduced by about 23%.
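The K/L schedule described above can be sketched as a small decision function. This is only a reading of the abstract, not the authors' implementation. The function names are hypothetical, and the exact phase of the interval (a global update on every L-th sub-iteration after the first K) is an assumption.

```python
def update_mode(sub_iter, K, L):
    """Return the update type for a 0-based sub-iteration index.

    Assumed schedule: the first K sub-iterations update the global image;
    afterwards only every L-th sub-iteration does, and the ones in between
    update each slave thread's local image only.
    """
    if L < 1:
        raise ValueError("L must be at least 1")
    if sub_iter < K:
        return "global"
    return "global" if (sub_iter - K + 1) % L == 0 else "intra-thread"


def count_global_updates(total, K, L):
    """Number of sub-iterations that need a global (networked) image update."""
    return sum(update_mode(i, K, L) == "global" for i in range(total))


# Hypothetical run: 16 sub-iterations, K = 4, L = 4.
schedule = [update_mode(i, K=4, L=4) for i in range(16)]
print(schedule)
print(count_global_updates(16, K=4, L=4), "of 16 sub-iterations synchronise globally")
```

Each intra-thread step avoids collecting update factors over the network, which is the source of the reduced transfer delay. How aggressively K and L can be chosen without degrading the image is what the paper evaluates empirically.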

  16. Evolution of System Architectures: Where Do We Need to Fail Next?

    NASA Astrophysics Data System (ADS)

    Bermudez, Luis; Alameh, Nadine; Percivall, George

    2013-04-01

    Innovation requires testing and failing. Thomas Edison was right when he said "I have not failed. I've just found 10,000 ways that won't work". For innovation and improvement of standards to happen, service Architectures have to be tested and tested. Within the Open Geospatial Consortium (OGC), testing of service architectures has occurred for the last 15 years. This talk will present an evolution of these service architectures and a possible future path. OGC is a global forum for the collaboration of developers and users of spatial data products and services, and for the advancement and development of international standards for geospatial interoperability. The OGC Interoperability Program is a series of hands-on, fast paced, engineering initiatives to accelerate the development and acceptance of OGC standards. Each initiative is organized in threads that provide focus under a particular theme. The first testbed, OGC Web Services phase 1, completed in 2003 had four threads: Common Architecture, Web Mapping, Sensor Web and Web Imagery Enablement. The Common Architecture was a cross-thread theme, to ensure that the Web Mapping and Sensor Web experiments built on a base common architecture. The architecture was based on the three main SOA components: Broker, Requestor and Provider. It proposed a general service model defining service interactions and dependencies; categorization of service types; registries to allow discovery and access of services; data models and encodings; and common services (WMS, WFS, WCS). For the latter, there was a clear distinction on the different services: Data Services (e.g. WMS), Application services (e.g. Coordinate transformation) and server-side client applications (e.g. image exploitation). 
The latest testbed, OGC Web Service phase 9, completed in 2012 had 5 threads: Aviation, Cross-Community Interoperability (CCI), Security and Services Interoperability (SSI), OWS Innovations and Compliance & Interoperability Testing & Evaluation (CITE). Compared to the first testbed, OWS-9 did not have a separate common architecture thread. Instead the emphasis was on brokering information models, securing them and making data available efficiently on mobile devices. The outcome is an architecture based on usability and non-intrusiveness while leveraging mediation of information models from different communities. This talk will use lessons learned from the evolution from OGC Testbed phase 1 to phase 9 to better understand how global and complex infrastructures evolve to support many communities including the Earth System Science Community.

  17. It's a sentence, not a word: insights from a keyword analysis in cancer communication.

    PubMed

    Taylor, Kimberly; Thorne, Sally; Oliffe, John L

    2015-01-01

    Keyword analysis has been championed as a methodological option for expanding the insights that can be extracted from qualitative datasets using various properties available in qualitative software. Intrigued by the pioneering applications of Clive Seale and his colleagues in this regard, we conducted keyword analyses for word frequency and "keyness" on a qualitative database of interview transcripts from a study on cancer communication. We then subjected the results from these operations to an in-depth contextual inquiry by resituating word instances within their original speech contexts, finding that most of what had initially appeared as group variations broke down under close analysis. In this article, we illustrate the various threads of analysis, and explain how they unraveled under closer scrutiny. On the basis of this tentative exercise, we conclude that a healthy skepticism for the benefits of keyword analysis within a qualitative investigative process seems warranted. © The Author(s) 2014.

  18. 78 FR 12718 - Certain Steel Threaded Rod From the People's Republic of China: Affirmative Final Determination...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-25

    ... DEPARTMENT OF COMMERCE International Trade Administration [A-570-932] Certain Steel Threaded Rod... Preliminary Determination of the circumvention inquiry concerning the antidumping duty order on certain steel threaded rod (``steel threaded rod'') from the People's Republic of China (``PRC''). The period of...

  19. TomoPhantom, a software package to generate 2D-4D analytical phantoms for CT image reconstruction algorithm benchmarks

    NASA Astrophysics Data System (ADS)

    Kazantsev, Daniil; Pickalov, Valery; Nagella, Srikanth; Pasca, Edoardo; Withers, Philip J.

    2018-01-01

    In the field of computerized tomographic imaging, many novel reconstruction techniques are routinely tested using simplistic numerical phantoms, e.g. the well-known Shepp-Logan phantom. These phantoms cannot sufficiently cover the broad spectrum of applications in CT imaging where, for instance, smooth or piecewise-smooth 3D objects are common. TomoPhantom provides quick access to an external library of modular analytical 2D/3D phantoms with temporal extensions. In TomoPhantom, quite complex phantoms can be built using additive combinations of geometrical objects, such as Gaussians, parabolas, cones, ellipses, rectangles and volumetric extensions of them. Newly designed phantoms are better suited for benchmarking and testing of different image processing techniques. Specifically, tomographic reconstruction algorithms which employ 2D and 3D scanning geometries can be rigorously analyzed using the software. TomoPhantom also provides a capability of obtaining analytical tomographic projections, which further extends the applicability of the software towards more realistic testing that is free from the "inverse crime". All core modules of the package are written in C with OpenMP, and wrappers for Python and MATLAB are provided to enable easy access. Due to the C-based multi-threaded implementation, volumetric phantoms of high spatial resolution can be obtained with computational efficiency.

  20. III/V nano ridge structures for optical applications on patterned 300 mm silicon substrate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kunert, B.; Guo, W.; Mols, Y.

    We report on an integration approach of III/V nano ridges on patterned silicon (Si) wafers by metal organic vapor phase epitaxy (MOVPE). Trenches of different widths (≤500 nm) were processed in a silicon oxide (SiO₂) layer on top of a 300 mm (001) Si substrate. The MOVPE growth conditions were chosen in a way to guarantee an efficient defect trapping within narrow trenches and to form a box shaped ridge with increased III/V volume when growing out of the trench. Compressively strained InGaAs/GaAs multi-quantum wells with 19% indium were deposited on top of the fully relaxed GaAs ridges as an active material for optical applications. Transmission electron microscopy investigation shows that very flat quantum well (QW) interfaces were realized. A clear defect trapping inside the trenches is observed whereas the ridge material is free of threading dislocations with only a very low density of planar defects. Pronounced QW photoluminescence (PL) is detected from different ridge sizes at room temperature. The potential of these III/V nano ridges for laser integration on Si substrates is emphasized by the achieved ridge volume which could enable wave guidance and by the high crystal quality in line with the distinct PL.

  1. Three-dimensional imaging of threading dislocations in GaN crystals using two-photon excitation photoluminescence

    NASA Astrophysics Data System (ADS)

    Tanikawa, Tomoyuki; Ohnishi, Kazuki; Kanoh, Masaya; Mukai, Takashi; Matsuoka, Takashi

    2018-03-01

    The three-dimensional imaging of threading dislocations in GaN films was demonstrated using two-photon excitation photoluminescence. The threading dislocations were shown as dark lines. The spatial resolutions near the surface were about 0.32 and 3.2 µm for the in-plane and depth directions, respectively. The threading dislocations with a density less than 10⁸ cm⁻² were resolved, although the aberration induced by the refractive index mismatch was observed. The decrease in threading dislocation density was clearly observed by increasing the GaN film thickness. This can be considered a novel method for characterizing threading dislocations in GaN films without any destructive preparations.

  2. A software bus for thread objects

    NASA Technical Reports Server (NTRS)

    Callahan, John R.; Li, Dehuai

    1995-01-01

    The authors have implemented a software bus for lightweight threads in an object-oriented programming environment that allows for rapid reconfiguration and reuse of thread objects in discrete-event simulation experiments. While previous research in object-oriented, parallel programming environments has focused on direct communication between threads, our lightweight software bus, called the MiniBus, provides a means to isolate threads from their contexts of execution by restricting communications between threads to message-passing via their local ports only. The software bus maintains a topology of connections between these ports. It routes, queues, and delivers messages according to this topology. This approach allows for rapid reconfiguration and reuse of thread objects in other systems without making changes to the specifications or source code. A layered approach that provides the needed transparency to developers is presented. Examples of using the MiniBus are given, and the value of bus architectures in building and conducting simulations of discrete-event systems is discussed.
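
    The port-and-topology idea described in this abstract can be illustrated with a minimal sketch. It is not the MiniBus implementation: the names `Bus`, `Port`, `connect`, `route` and the example port names are hypothetical, and Python's standard `threading` and `queue` modules stand in for the lightweight threads of the original system. Each thread object only sends and receives on its own ports, while the bus holds the connection topology and delivers queued messages.

```python
import queue
import threading


class Port:
    """A local port: the only channel a thread object uses to communicate."""

    def __init__(self, name, bus):
        self.name = name
        self._bus = bus
        self.inbox = queue.Queue()

    def send(self, message):
        # The sender does not know who is listening; the bus decides.
        self._bus.route(self.name, message)

    def receive(self, timeout=5.0):
        return self.inbox.get(timeout=timeout)


class Bus:
    """Hypothetical software bus: keeps the topology and routes messages."""

    def __init__(self):
        self._ports = {}
        self._topology = {}  # source port name -> list of destination names
        self._lock = threading.Lock()

    def port(self, name):
        with self._lock:
            new_port = Port(name, self)
            self._ports[name] = new_port
            self._topology.setdefault(name, [])
            return new_port

    def connect(self, src, dst):
        with self._lock:
            self._topology[src].append(dst)

    def route(self, src, message):
        with self._lock:
            targets = [self._ports[d] for d in self._topology.get(src, [])]
        for target in targets:
            target.inbox.put(message)  # queued until the receiver reads it


def producer(out_port, n):
    for i in range(n):
        out_port.send(i)
    out_port.send(None)  # end-of-stream marker (a convention of this sketch)


def doubler(in_port, out_port):
    while True:
        item = in_port.receive()
        if item is None:
            out_port.send(None)
            break
        out_port.send(2 * item)


def run_demo():
    bus = Bus()
    p_out = bus.port("producer.out")
    d_in = bus.port("doubler.in")
    d_out = bus.port("doubler.out")
    sink = bus.port("sink.in")
    # Rewiring the simulation means changing only these two lines.
    bus.connect("producer.out", "doubler.in")
    bus.connect("doubler.out", "sink.in")
    threads = [
        threading.Thread(target=producer, args=(p_out, 4)),
        threading.Thread(target=doubler, args=(d_in, d_out)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = sink.receive()
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

    In this sketch `producer` and `doubler` never reference each other, so either can be reused in a different topology without changing its source, which is the property the abstract attributes to the bus approach.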

  3. 49 CFR 178.42 - Specification 3E seamless steel cylinders.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... (valves, fuse plugs, etc.) for those openings. Threads conforming to the following are required on openings. (1) Threads must be clean cut, even, without checks, and to gauge. (2) Taper threads, when used, must be of length not less than as specified for American Standard taper pipe threads. (3) Straight...

  4. 49 CFR 178.42 - Specification 3E seamless steel cylinders.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... (valves, fuse plugs, etc.) for those openings. Threads conforming to the following are required on openings. (1) Threads must be clean cut, even, without checks, and to gauge. (2) Taper threads, when used, must be of length not less than as specified for American Standard taper pipe threads. (3) Straight...

  5. Neuropil threads occur in dendrites of tangle-bearing nerve cells.

    PubMed

    Braak, H; Braak, E

    1988-01-01

    Transparent Golgi preparations counterstained for Alzheimer's neurofibrillary changes rendered possible the demonstration of neuropil threads in defined cellular processes. Only dendrites of tangle-bearing cortical nerve cells were found to contain neuropil threads. Processes of glial cells as well as axons present in the material were devoid of neuropil threads.

  6. 49 CFR 178.42 - Specification 3E seamless steel cylinders.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... (valves, fuse plugs, etc.) for those openings. Threads conforming to the following are required on openings. (1) Threads must be clean cut, even, without checks, and to gauge. (2) Taper threads, when used, must be of length not less than as specified for American Standard taper pipe threads. (3) Straight...

  7. Threaded Cognition: An Integrated Theory of Concurrent Multitasking

    ERIC Educational Resources Information Center

    Salvucci, Dario D.; Taatgen, Niels A.

    2008-01-01

    The authors propose the idea of threaded cognition, an integrated theory of concurrent multitasking--that is, performing 2 or more tasks at once. Threaded cognition posits that streams of thought can be represented as threads of processing coordinated by a serial procedural resource and executed across other available resources (e.g., perceptual…

  8. 49 CFR 178.42 - Specification 3E seamless steel cylinders.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... (valves, fuse plugs, etc.) for those openings. Threads conforming to the following are required on openings. (1) Threads must be clean cut, even, without checks, and to gauge. (2) Taper threads, when used, must be of length not less than as specified for American Standard taper pipe threads. (3) Straight...

  9. 49 CFR 178.42 - Specification 3E seamless steel cylinders.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... (valves, fuse plugs, etc.) for those openings. Threads conforming to the following are required on openings. (1) Threads must be clean cut, even, without checks, and to gauge. (2) Taper threads, when used, must be of length not less than as specified for American Standard taper pipe threads. (3) Straight...

  10. A Primer on the Effective Use of Threaded Discussion Forums.

    ERIC Educational Resources Information Center

    Kirk, James J.; Orr, Robert L.

    Threaded discussion forums are asynchronous, World Wide Web-based discussions occurring under a number of different topics called threads. By allowing students to post, read, and respond to messages independently of time or place, threaded discussion forums give students an opportunity for deeper reflection and more thoughtful replies than chat…

  11. 46 CFR 164.023-7 - Performance; non-standard thread.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 46 Shipping 6 2010-10-01 2010-10-01 false Performance; non-standard thread. 164.023-7 Section 164... Performance; non-standard thread. (a) Use Codes 1, 2, 3, 4BC, 4RB, 5 (any). Each non-standard thread which...) testing machine. (2) Single strand breaking strength (after weathering). After exposure in a sunshine...

  12. 46 CFR 164.023-7 - Performance; non-standard thread.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 46 Shipping 6 2011-10-01 2011-10-01 false Performance; non-standard thread. 164.023-7 Section 164... Performance; non-standard thread. (a) Use Codes 1, 2, 3, 4BC, 4RB, 5 (any). Each non-standard thread which...) testing machine. (2) Single strand breaking strength (after weathering). After exposure in a sunshine...

  13. Threaded average temperature thermocouple

    NASA Technical Reports Server (NTRS)

    Ward, Stanley W. (Inventor)

    1990-01-01

    A threaded average temperature thermocouple 11 is provided to measure the average temperature of a test situs of a test material 30. A ceramic insulator rod 15 with two parallel holes 17 and 18 through the length thereof is securely fitted in a cylinder 16, which is bored along the longitudinal axis of symmetry of threaded bolt 12. Threaded bolt 12 is composed of material having thermal properties similar to those of test material 30. Leads of a thermocouple wire 20 leading from a remotely situated temperature sensing device 35 are each fed through one of the holes 17 or 18, secured at head end 13 of ceramic insulator rod 15, and exit at tip end 14. Each lead of thermocouple wire 20 is bent into and secured in an opposite radial groove 25 in tip end 14 of threaded bolt 12. Resulting threaded average temperature thermocouple 11 is ready to be inserted into cylindrical receptacle 32. The tip end 14 of the threaded average temperature thermocouple 11 is in intimate contact with receptacle 32. A jam nut 36 secures the threaded average temperature thermocouple 11 to test material 30.

  14. A Moiré Pattern-Based Thread Counter

    NASA Astrophysics Data System (ADS)

    Reich, Gary

    2017-10-01

    Thread count is a term used in the textile industry as a measure of how closely woven a fabric is. It is usually defined as the sum of the number of warp threads per inch (or cm) and the number of weft threads per inch. (It is sometimes confusingly described as the number of threads per square inch.) In recent years it has also become a subject of considerable interest and some controversy among consumers. Many consumers consider thread count to be a key measure of the quality or fineness of a fabric, especially bed sheets, and they seek out fabrics that advertise high counts. Manufacturers in turn have responded to this interest by offering fabrics with ever higher claimed thread counts (sold at ever higher prices), sometimes achieving the higher counts by distorting the definition of the term with some "creative math." In 2005 the Federal Trade Commission noted the growing use of thread count in advertising at the retail level and warned of the potential for consumers to be misled by distortions of the definition.

  15. Hyperunstable matrix proteins in the byssus of Mytilus galloprovincialis.

    PubMed

    Sagert, Jason; Waite, J Herbert

    2009-07-01

    The marine mussel Mytilus galloprovincialis is tethered to rocks in the intertidal zone by a holdfast known as the byssus. Functioning as a shock absorber, the byssus is composed of threads, the primary molecular components of which are collagen-containing proteins (preCOLs) that largely dictate the higher order self-assembly and mechanical properties of byssal threads. The threads contain additional matrix components that separate and perhaps lubricate the collagenous microfibrils during deformation in tension. In this study, the thread matrix proteins (TMPs), a glycine-, tyrosine- and asparagine-rich protein family, were shown to possess unique repeated sequence motifs, significant transcriptional heterogeneity and were distributed throughout the byssal thread. Deamidation was shown to occur at a significant rate in a recombinant TMP and in the byssal thread as a function of time. Furthermore, charge heterogeneity presumably due to deamidation was observed in TMPs extracted from threads. The TMPs were localized to the preCOL-containing secretory granules in the collagen gland of the foot and are assumed to provide a viscoelastic matrix around the collagenous fibers in byssal threads.

  16. A multi-criteria evaluation system for marine litter pollution based on statistical analyses of OSPAR beach litter monitoring time series.

    PubMed

    Schulz, Marcus; Neumann, Daniel; Fleet, David M; Matthies, Michael

    2013-12-01

    During the last decades, marine pollution with anthropogenic litter has become a worldwide major environmental concern. Standardized monitoring of litter since 2001 on 78 beaches selected within the framework of the Convention for the Protection of the Marine Environment of the North-East Atlantic (OSPAR) has been used to identify temporal trends of marine litter. Based on statistical analyses of this dataset a two-part multi-criteria evaluation system for beach litter pollution of the North-East Atlantic and the North Sea is proposed. Canonical correlation analyses, linear regression analyses, and non-parametric analyses of variance were used to identify different temporal trends. A classification of beaches was derived from cluster analyses and served to define different states of beach quality according to abundances of 17 input variables. The evaluation system is easily applicable and relies on the above-mentioned classification and on significant temporal trends implied by significant rank correlations. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Fast Automatic Segmentation of White Matter Streamlines Based on a Multi-Subject Bundle Atlas.

    PubMed

    Labra, Nicole; Guevara, Pamela; Duclap, Delphine; Houenou, Josselin; Poupon, Cyril; Mangin, Jean-François; Figueroa, Miguel

    2017-01-01

    This paper presents an algorithm for fast segmentation of white matter bundles from massive dMRI tractography datasets using a multisubject atlas. We use a distance metric to compare streamlines in a subject dataset to labeled centroids in the atlas, and label them using a per-bundle configurable threshold. In order to reduce segmentation time, the algorithm first preprocesses the data using a simplified distance metric to rapidly discard candidate streamlines in multiple stages, while guaranteeing that no false negatives are produced. The smaller set of remaining streamlines is then segmented using the original metric, thus eliminating any false positives from the preprocessing stage. As a result, a single-thread implementation of the algorithm can segment a dataset of almost 9 million streamlines in less than 6 minutes. Moreover, parallel versions of our algorithm for multicore processors and graphics processing units further reduce the segmentation time to less than 22 seconds and to 5 seconds, respectively. This performance enables the use of the algorithm in truly interactive applications for visualization, analysis, and segmentation of large white matter tractography datasets.
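
    The two-stage idea in this abstract (a cheap filter that never produces false negatives, followed by the original metric) can be sketched as follows. The specific choices here are assumptions for illustration, not the authors' code: streamlines are taken to be resampled to the same number of points, the full metric is the maximum Euclidean distance between corresponding points, the simplified metric checks only three of those points, each bundle has a single centroid, and the names `full_distance`, `cheap_lower_bound` and `segment` are hypothetical. Because a maximum over a subset of points can never exceed the maximum over all points, discarding on the cheap value cannot reject a streamline that the full metric would accept.

```python
import math


def full_distance(a, b):
    """Maximum Euclidean distance between corresponding points (assumed metric)."""
    return max(math.dist(p, q) for p, q in zip(a, b))


def cheap_lower_bound(a, b):
    """Same maximum restricted to both endpoints and the midpoint.

    This is a lower bound on full_distance, so filtering on it is safe.
    """
    indices = (0, len(a) // 2, len(a) - 1)
    return max(math.dist(a[i], b[i]) for i in indices)


def segment(streamlines, atlas, thresholds):
    """Label each streamline with the closest bundle within its threshold."""
    labels = []
    for s in streamlines:
        best_label, best_d = None, math.inf
        for label, centroid in atlas.items():
            limit = thresholds[label]
            # Stage 1: discard quickly; no false negatives are possible.
            if cheap_lower_bound(s, centroid) > limit:
                continue
            # Stage 2: the full metric removes stage-1 false positives.
            d = full_distance(s, centroid)
            if d <= limit and d < best_d:
                best_label, best_d = label, d
        labels.append(best_label)
    return labels


# Toy data: two straight "bundles" and four candidate streamlines.
atlas = {
    "A": [(x, 0.0, 0.0) for x in range(5)],
    "B": [(x, 5.0, 0.0) for x in range(5)],
}
thresholds = {"A": 0.5, "B": 0.5}
streamlines = [
    [(x, 0.1, 0.0) for x in range(5)],                # close to A
    [(x, 5.0, 0.2) for x in range(5)],                # close to B
    [(0, 0, 0), (1, 3, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)],  # passes stage 1 only
    [(x, 20.0, 0.0) for x in range(5)],               # far from everything
]
labels = segment(streamlines, atlas, thresholds)
```

    The third toy streamline matches bundle A at the three sampled points but deviates at an unsampled one, so it survives the filter and is rejected by the full metric, which is the false-positive case the abstract describes.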

  18. Livermore Compiler Analysis Loop Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, R. D.

    2013-03-01

    LCALS is designed to evaluate compiler optimizations and performance of a variety of loop kernels and loop traversal software constructs. Some of the loop kernels are pulled directly from "Livermore Loops Coded in C", developed at LLNL (see item 11 below for details of earlier code versions). The older suites were used to evaluate floating-point performance of hardware platforms prior to porting larger application codes. The LCALS suite is geared toward assessing C++ compiler optimizations and platform performance related to SIMD vectorization, OpenMP threading, and advanced C++ language features. LCALS contains 20 of 24 loop kernels from the older Livermore Loop suites, plus various others representative of loops found in current production application codes at LLNL. The latter loops emphasize more diverse loop constructs and data access patterns than the others, such as multi-dimensional difference stencils. The loops are included in a configurable framework, which allows control of compilation, loop sampling for execution timing, which loops are run and their lengths. It generates timing statistics for analysis and comparing variants of individual loops. Also, it is easy to add loops to the suite as desired.

  19. Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

    PubMed

    Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

    2015-01-01

    Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
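
    As a rough illustration of the two ingredients named in this abstract, mutual information and permutation testing, the following sketch scores one gene pair. It is an assumption-laden toy, not TINGe: expression values are taken to be already discretized into labels, a plain plug-in estimator is used, the computation is serial, and the function names `mutual_information` and `permutation_p_value` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discrete label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Estimate how often shuffled data reach the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real dependence between x and y
        if mutual_information(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy profiles: y copies x (dependent), z is balanced against x (independent).
x = [0, 1] * 20
y = list(x)
z = [0, 0, 1, 1] * 10
mi_xy, p_xy = permutation_p_value(x, y)
mi_xz = mutual_information(x, z)
```

    A whole-genome network repeats this test for every gene pair, which is why the number of pairs and permutations makes the parallelism described in the abstract attractive.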

  20. Integrated optical 3D digital imaging based on DSP scheme

    NASA Astrophysics Data System (ADS)

    Wang, Xiaodong; Peng, Xiang; Gao, Bruce Z.

    2008-03-01

    We present a scheme of integrated optical 3-D digital imaging (IO3DI) based on a digital signal processor (DSP), which can acquire range images independently without PC support. This scheme is based on a parallel hardware structure with the aid of a DSP and a field programmable gate array (FPGA) to realize 3-D imaging. In this integrated scheme of 3-D imaging, phase measurement profilometry is adopted. To realize the pipeline processing of the fringe projection, image acquisition and fringe pattern analysis, we present a multi-threaded application program that is developed under the environment of DSP/BIOS RTOS (real-time operating system). Since the RTOS provides a preemptive kernel and a powerful configuration tool, we are able to achieve real-time scheduling and synchronization. To accelerate automatic fringe analysis and phase unwrapping, we make use of software optimization techniques. The proposed scheme can reach a performance of 39.5 f/s (frames per second), so it may well fit into real-time fringe-pattern analysis and can implement fast 3-D imaging. Experimental results are also presented to show the validity of the proposed scheme.
