DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.
Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-10-22
Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
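The wait-and-awaken behaviour described in this abstract can be illustrated with an ordinary condition variable. The following is only a minimal sketch under stated assumptions: the `Context` class, its `post` and `advance` methods, and the `handler` callback are hypothetical names chosen for illustration and are not the PAMI API or the patented implementation.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a messaging context: an event queue plus a condition variable."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()

    def post(self, event):
        """Called by a producer when a data communications event occurs."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()  # awaken a thread that is waiting in advance()

    def advance(self, handler, timeout=None):
        """Process one event, sleeping instead of spinning when none is pending.

        Returns True if an event was processed and False if the wait timed out.
        """
        with self._cond:
            # wait_for re-checks the predicate, which guards against spurious wakeups.
            if not self._cond.wait_for(lambda: len(self._events) > 0, timeout):
                return False
            event = self._events.popleft()
        handler(event)  # run the handler after releasing the lock
        return True
```

A thread that calls `advance` on an empty context blocks until another thread calls `post`, at which point it wakes, dequeues the event, and hands it to the handler.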
Watkins, C Edward
2012-09-01
In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi:10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel?
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Advanced miniature processing hardware for ATR applications
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin (Inventor); Daud, Taher (Inventor); Thakoor, Anikumar (Inventor)
2003-01-01
A Hybrid Optoelectronic Neural Object Recognition System (HONORS) is disclosed, comprising two major building blocks: (1) an advanced grayscale optical correlator (OC) and (2) a massively parallel three-dimensional neural-processor. The optical correlator, with its inherent advantages in parallel processing and shift invariance, is used for target of interest (TOI) detection and segmentation. The three-dimensional neural-processor, with its robust neural learning capability, is used for target classification and identification. The hybrid optoelectronic neural object recognition system, with its powerful combination of optical processing and neural networks, enables real-time, large frame, automatic target recognition (ATR).
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
Anatomically constrained neural network models for the categorization of facial expression
NASA Astrophysics Data System (ADS)
McMenamin, Brenton W.; Assadi, Amir H.
2004-12-01
In humans, recognition of facial expression is performed with the amygdala, which uses parallel processing streams to identify expressions quickly and accurately. It is also possible that a feedback mechanism plays a role in this process. A model with a similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms, for which varied expressions are a source of error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However, the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.
ERIC Educational Resources Information Center
Laszlo, Sarah; Plaut, David C.
2012-01-01
The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between…
Using the Extended Parallel Process Model to Examine Teachers' Likelihood of Intervening in Bullying
ERIC Educational Resources Information Center
Duong, Jeffrey; Bradshaw, Catherine P.
2013-01-01
Background: Teachers play a critical role in protecting students from harm in schools, but little is known about their attitudes toward addressing problems like bullying. Previous studies have rarely used theoretical frameworks, making it difficult to advance this area of research. Using the Extended Parallel Process Model (EPPM), we examined the…
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, the Do-all and Do-across techniques cannot be applied to parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration and, furthermore, data input and data output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and are assigned to processors by using optimal static scheduling at compile time in order to reduce the large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
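The idea of assigning fine grain tasks to processors before execution can be sketched with a generic list-scheduling heuristic. This is an illustrative sketch, not the scheduling algorithm used on OSCAR: the function name `static_schedule`, the longest-path priority rule, and the omission of inter-processor communication cost are all assumptions made for brevity.

```python
def static_schedule(durations, deps, n_procs):
    """Assign tasks to processors ahead of execution (generic list scheduling).

    durations: {task: cost}; deps: {task: set of predecessor tasks}.
    Returns {task: (processor, start, finish)}. Communication cost is ignored.
    """
    # Build successor lists so each task's downstream work can be measured.
    succs = {t: [] for t in durations}
    for task, preds in deps.items():
        for p in preds:
            succs[p].append(task)

    # Priority of a task = length of the longest path from it to the end of the graph.
    level = {}

    def longest_path(t):
        if t not in level:
            level[t] = durations[t] + max((longest_path(s) for s in succs[t]), default=0)
        return level[t]

    for t in durations:
        longest_path(t)

    proc_free = [0] * n_procs  # time at which each processor becomes idle
    schedule = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in schedule for p in deps.get(t, ()))]
        task = min(ready, key=lambda t: (-level[t], t))  # highest priority first
        earliest = max((schedule[p][2] for p in deps.get(task, ())), default=0)
        proc = min(range(n_procs), key=lambda i: max(proc_free[i], earliest))
        start = max(proc_free[proc], earliest)
        schedule[task] = (proc, start, start + durations[task])
        proc_free[proc] = start + durations[task]
        remaining.remove(task)
    return schedule
```

For example, with hypothetical durations `{"a": 2, "b": 3, "c": 1, "d": 2}`, dependencies `{"c": {"a"}, "d": {"a", "b"}}`, and two processors, every start time is fixed before the run, so no scheduling decisions remain to be made at run time.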
Gooding, Owen W
2004-06-01
The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.
Towards a Standard Mixed-Signal Parallel Processing Architecture for Miniature and Microrobotics
Sadler, Brian M; Hoyos, Sebastian
2014-01-01
The conventional analog-to-digital conversion (ADC) and digital signal processing (DSP) architecture has led to major advances in miniature and micro-systems technology over the past several decades. The outlook for these systems is significantly enhanced by advances in sensing, signal processing, communications and control, and the combination of these technologies enables autonomous robotics on the miniature to micro scales. In this article we look at trends in the combination of analog and digital (mixed-signal) processing, and consider a generalized sampling architecture. Employing a parallel analog basis expansion of the input signal, this scalable approach is adaptable and reconfigurable, and is suitable for a large variety of current and future applications in networking, perception, cognition, and control. PMID:26601042
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Parallel computing in genomic research: advances and applications
Ocaña, Kary; de Oliveira, Daniel
2015-01-01
Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801
Use of parallel computing in mass processing of laser data
NASA Astrophysics Data System (ADS)
Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.
2015-12-01
The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elucidate the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts, depending on the algorithm. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing focus on only a few rudimentary algorithms, are not well optimized, and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels.
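Peak detection through a compact operation can be illustrated on the CPU with NumPy. The sketch below is an assumption-laden analogue, not the NPE implementation: the function name `detect_peaks`, the strict-local-maximum rule, and the threshold parameter are chosen here for illustration. It follows the usual flag, prefix-sum, scatter pattern of stream compaction, in which every step is independent per sample and therefore maps naturally onto GPU threads.

```python
import numpy as np


def detect_peaks(signal, threshold):
    """Return indices of strict local maxima above `threshold` (compaction pattern)."""
    x = np.asarray(signal, dtype=float)

    # Step 1 (map): flag each interior sample that exceeds the threshold and
    # both of its neighbours. Every flag is computed independently.
    flags = np.zeros(x.shape, dtype=bool)
    flags[1:-1] = (x[1:-1] > threshold) & (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])

    # Step 2 (scan): an exclusive prefix sum of the flags gives each flagged
    # sample its slot in the packed output.
    slots = np.cumsum(flags) - flags

    # Step 3 (scatter): flagged samples write their own index into their slot.
    indices = np.arange(x.size)
    out = np.empty(int(flags.sum()), dtype=np.intp)
    out[slots[flags]] = indices[flags]
    return out
```

Calling `detect_peaks([0, 3, 1, 0, 5, 2, 4, 4, 0], 2.5)` yields the indices 1 and 4; the plateau at indices 6 and 7 is rejected because neither sample is strictly greater than its neighbour.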
Generating unstructured nuclear reactor core meshes in parallel
Jain, Rajeev; Tautges, Timothy J.
2014-10-24
Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples including a very high temperature reactor, a full-core model of the Japanese MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.
Domain Decomposition By the Advancing-Partition Method for Parallel Unstructured Grid Generation
NASA Technical Reports Server (NTRS)
Pirzadeh, Shahyar Z.; Zagaris, George
2009-01-01
A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Society of the plastic industry process emission initiatives
NASA Technical Reports Server (NTRS)
Mcdermott, Joseph
1994-01-01
At first view, plastics process emissions research may not seem to have much bearing on outgassing considerations relative to advanced composite materials; however, several parallel issues and cross currents are of mutual interest. The following topics are discussed: relevance of plastics industry research to aerospace composites; impact of clean air act amendment requirements; scope of the Society of the Plastics Industry, Inc. activities in thermoplastic process emissions and reinforced plastics/composites process emissions; and utility of SPI research for advanced polymer composites audiences.
Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications
NASA Technical Reports Server (NTRS)
Edwards, David E.; Haimes, Robert
1999-01-01
An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.
Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele
2015-01-01
Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)
2000-01-01
HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
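The parallel efficiency figure quoted above is conventionally the speedup divided by the processor count. The helper below is a small worked sketch of that arithmetic; the function name and the timings in the usage note are hypothetical and are not measurements from the paper.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency = speedup / processor count, where speedup = t_serial / t_parallel."""
    if t_parallel <= 0 or n_procs <= 0:
        raise ValueError("t_parallel and n_procs must be positive")
    speedup = t_serial / t_parallel
    return speedup / n_procs
```

With made-up timings of 768 s on one processor and 10 s on 128 processors, the speedup is 76.8 and the efficiency is 76.8 / 128 = 0.6, that is, 60%.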
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
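In a binary hypercube of dimension d there are 2^d nodes, and each node is linked to the d nodes whose binary labels differ from its own in exactly one bit, so 128 processors correspond to d = 7. Reading the abstract's "5, 6 or 7 points" as hypercube dimensions is an assumption; the short sketch below, with the hypothetical name `hypercube_neighbors`, only illustrates the standard bit-flip wiring rule.

```python
def hypercube_neighbors(node, dim):
    """Labels of the nodes directly linked to `node` in a `dim`-dimensional hypercube."""
    if not 0 <= node < (1 << dim):
        raise ValueError("node label out of range for this dimension")
    # Flipping bit k of the label crosses the link along dimension k.
    return [node ^ (1 << k) for k in range(dim)]
```

For instance, `hypercube_neighbors(0, 3)` returns `[1, 2, 4]`, and every node of a 7-dimensional cube has exactly 7 neighbours.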
NexGen PVAs: Incorporating Eco-Evolutionary Processes into Population Viability Models
We examine how the integration of evolutionary and ecological processes in population dynamics – an emerging framework in ecology – could be incorporated into population viability analysis (PVA). Driven by parallel, complementary advances in population genomics and computational ...
National Combustion Code: Parallel Performance
NASA Technical Reports Server (NTRS)
Babrauckas, Theresa
2001-01-01
This report discusses the National Combustion Code (NCC). The NCC is an integrated system of codes for the design and analysis of combustion systems. The advanced features of the NCC meet designers' requirements for model accuracy and turn-around time. The fundamental features at the inception of the NCC were parallel processing and unstructured mesh. The design and performance of the NCC are discussed.
Computer Sciences and Data Systems, volume 2
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J
2014-12-23
Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced.
This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A
2012-01-01
At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
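The idea of adding an eddy viscosity by locally varying the relaxation time can be illustrated with a minimal sketch. It is not taken from PRATHAM: it assumes the standard lattice-unit relation between viscosity and relaxation time, nu = (tau - 0.5) / 3, and the textbook Smagorinsky closure nu_t = (C * delta)^2 * |S|. The function name, the constant `c_smag`, and the filter width `delta` are hypothetical, and the strain-rate magnitude is taken as a given input rather than computed from the distribution functions.

```python
def effective_relaxation_time(tau0, strain_magnitude, c_smag=0.1, delta=1.0):
    """Return a locally adjusted LBM relaxation time (lattice units).

    tau0             -- relaxation time for the molecular viscosity
    strain_magnitude -- |S|, magnitude of the rate-of-strain tensor (assumed given)
    c_smag, delta    -- hypothetical Smagorinsky constant and filter width
    """
    nu0 = (tau0 - 0.5) / 3.0                           # molecular viscosity
    nu_t = (c_smag * delta) ** 2 * strain_magnitude    # Smagorinsky eddy viscosity
    return 3.0 * (nu0 + nu_t) + 0.5                    # back to a relaxation time


# With no strain the relaxation time is unchanged; strain increases it.
tau_quiet = effective_relaxation_time(0.6, 0.0)
tau_sheared = effective_relaxation_time(0.6, 0.01)
```

In a full solver this adjustment would be evaluated per lattice node at every time step, which is why it fits naturally into a parallel code: each node needs only its own local strain rate.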
Computational methods and software systems for dynamics and control of large space structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Felippa, C. A.; Farhat, C.; Pramono, E.
1990-01-01
Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
Current status and future prospects for enabling chemistry technology in the drug discovery process.
Djuric, Stevan W; Hutchins, Charles W; Talaty, Nari N
2016-01-01
This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of "dangerous" reagents. Also featured are advances in the "computer-assisted drug design" area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Comparing an FPGA to a Cell for an Image Processing Application
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.
2010-12-01
Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing
NASA Astrophysics Data System (ADS)
Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.
2018-03-01
Plagiarism takes many forms: not only copy-and-paste, but also changing passive voice to active voice or paraphrasing without appropriate acknowledgment. It occurs in all languages, including Indonesian. Many previous studies have addressed plagiarism detection in Indonesian using different methods, but there is still room for improvement. This research proposes a solution that improves plagiarism detection so that it can detect not only copy-and-paste but also more advanced forms. The proposed solution uses Winnowing with some additional steps in the pre-processing stage: stemming for Indonesian, and fingerprint generation in parallel, which saves processing time while producing the plagiarism result for the suspected document.
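As a rough illustration of the Winnowing step mentioned above, the sketch below hashes every k-character substring of a normalized text and keeps the minimum hash of each sliding window as a fingerprint. It is an assumption-laden sketch, not the authors' implementation: the function names, the values of `k` and `window`, and the use of a truncated MD5 hash are illustrative choices, and the Indonesian stemming and parallel fingerprint generation described in the abstract are omitted.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the lowercased alphanumeric text."""
    normalized = "".join(ch.lower() for ch in text if ch.isalnum())
    return [
        int(hashlib.md5(normalized[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(normalized) - k + 1)
    ]


def winnow(hashes, window):
    """Keep the minimum hash of each window (rightmost one on ties)."""
    fingerprints = set()
    for start in range(len(hashes) - window + 1):
        block = hashes[start:start + window]
        smallest = min(block)
        offset = window - 1 - block[::-1].index(smallest)  # rightmost minimum
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(doc_a, doc_b, k=5, window=4):
    """Jaccard similarity of the two documents' fingerprint hash sets."""
    fa = {h for h, _ in winnow(kgram_hashes(doc_a, k), window)}
    fb = {h for h, _ in winnow(kgram_hashes(doc_b, k), window)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because each document's fingerprints depend only on that document, fingerprinting a collection can be distributed over workers with no coordination, which is presumably where the time saving reported in the abstract comes from.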
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan
This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
The Portals 4.0 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2012-11-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Graphics processing unit based computation for NDE applications
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.
2012-05-01
Advances in parallel processing in recent years are helping to reduce the cost of numerical simulation. Breakthroughs in Graphical Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are two-dimensional. Performance improvement of the GPU implementation against serial CPU implementation is then discussed.
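To make the finite difference scheme for heat diffusion concrete, here is a minimal serial sketch of one explicit (forward-time, centred-space) update on a 2D grid. It is not the authors' CUDA code: the function name and the fixed-boundary treatment are assumptions, and it is written in plain Python for readability. The point to notice is that every interior grid point is updated from the previous time level only, so each update could be assigned to its own GPU thread.

```python
def heat_step(u, alpha, dx, dt):
    """Advance a 2D temperature grid by one explicit time step.

    u     -- list of rows (lists of floats); boundary values are held fixed
    alpha -- thermal diffusivity; dx -- grid spacing; dt -- time step
    """
    r = alpha * dt / (dx * dx)
    if r > 0.25:
        # Stability limit of the explicit scheme in two dimensions.
        raise ValueError("time step too large: alpha*dt/dx^2 must be <= 0.25")
    new = [row[:] for row in u]
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            # Five-point stencil: each point reads only old values, so all
            # interior points can be computed independently.
            new[i][j] = u[i][j] + r * (
                u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            )
    return new


# A 5x5 plate at 0 with a hot spot of 100 in the centre.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 100.0
grid = heat_step(grid, alpha=1.0, dx=1.0, dt=0.2)
```

After one step with these values the centre and its four neighbours all sit at 20, since the centre loses 0.2 x 400 and each neighbour gains 0.2 x 100.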
Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J
2004-09-01
We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processors shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
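The efficiency and speedup figures quoted above are related by the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of processors. The short sketch below applies those definitions; the function names and the timing values are hypothetical, and only the 83.9% efficiency on 24 processors comes from the abstract.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_processors


def implied_speedup(eff, n_processors):
    """Speedup implied by a reported efficiency on n_processors."""
    return eff * n_processors


# Hypothetical timings: 240 s serial, 12 s on 24 processors.
example_efficiency = efficiency(240.0, 12.0, 24)

# The reported 83.9% efficiency on 24 processors implies a speedup of about 20.1.
athlon_speedup = implied_speedup(0.839, 24)
```

Read this way, the Athlon result for the smaller problem corresponds to roughly 20 of the 24 processors' worth of useful work, with the remainder lost to the communication overhead the abstract describes.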
Current status and future prospects for enabling chemistry technology in the drug discovery process
Djuric, Stevan W.; Hutchins, Charles W.; Talaty, Nari N.
2016-01-01
This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of “dangerous” reagents. Also featured are advances in the “computer-assisted drug design” area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities. PMID:27781094
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Evaluation of the Intel iWarp parallel processor for space flight applications
NASA Technical Reports Server (NTRS)
Hine, Butler P., III; Fong, Terrence W.
1993-01-01
The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.
NASA Astrophysics Data System (ADS)
Voter, Arthur
Many important materials processes take place on time scales that far exceed the roughly one microsecond accessible to molecular dynamics simulation. Typically, this long-time evolution is characterized by a succession of thermally activated infrequent events involving defects in the material. In the accelerated molecular dynamics (AMD) methodology, known characteristics of infrequent-event systems are exploited to make reactive events take place more frequently, in a dynamically correct way. For certain processes, this approach has been remarkably successful, offering a view of complex dynamical evolution on time scales of microseconds, milliseconds, and sometimes beyond. We have recently made advances in all three of the basic AMD methods (hyperdynamics, parallel replica dynamics, and temperature accelerated dynamics (TAD)), exploiting both algorithmic advances and novel parallelization approaches. I will describe these advances, present some examples of our latest results, and discuss what should be possible when exascale computing arrives in roughly five years. Funded by the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, and by the Los Alamos Laboratory Directed Research and Development program.
ERIC Educational Resources Information Center
Liou, Hsien-Chin; Chang, Jason S; Chen, Hao-Jan; Lin, Chih-Cheng; Liaw, Meei-Ling; Gao, Zhao-Ming; Jang, Jyh-Shing Roger; Yeh, Yuli; Chuang, Thomas C.; You, Geeng-Neng
2006-01-01
This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus ("Sinorama") and various natural language processing (NLP) tools to construct effective English…
NASA Technical Reports Server (NTRS)
Consoli, Robert David; Sobieszczanski-Sobieski, Jaroslaw
1990-01-01
Advanced multidisciplinary analysis and optimization methods, namely system sensitivity analysis and non-hierarchical system decomposition, are applied to reduce the cost and improve the visibility of an automated vehicle design synthesis process. This process is inherently complex due to the large number of functional disciplines and associated interdisciplinary couplings. Recent developments in system sensitivity analysis as applied to complex non-hierarchic multidisciplinary design optimization problems enable the decomposition of these complex interactions into sub-processes that can be evaluated in parallel. The application of these techniques results in significant cost, accuracy, and visibility benefits for the entire design synthesis process.
The portals 4.0.1 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2013-04-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research
2016-11-13
these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to...such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been...Defense for Research and Engineering. such as pMatlab [4], or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing. In
Domain Decomposition By the Advancing-Partition Method
NASA Technical Reports Server (NTRS)
Pirzadeh, Shahyar Z.
2008-01-01
A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
2012-09-30
recognition. Algorithm design and statistical analysis and feature analysis. Post -Doctoral Associate, Cornell University, Bioacoustics Research...short. The HPC-ADA was designed based on fielded systems [1-4, 6] that offer a variety of desirable attributes, specifically dynamic resource...The software package was designed to utilize parallel and distributed processing for running recognition and other advanced algorithms. DeLMA
Cloud parallel processing of tandem mass spectrometry based proteomics data.
Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus
2012-10-05
Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
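The data decomposition strategy described above (split the input, run the unchanged search engine on each piece, recombine the outputs) can be sketched in a few lines. This is only an illustration of the pattern under stated assumptions: `search_engine` is a hypothetical stand-in for a real tool such as X!Tandem or SpectraST, spectra are represented as plain strings rather than mzXML records, and a local thread pool stands in for the cloud workers and workflow engine used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces, preserving order."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def search_engine(spectra):
    """Hypothetical stand-in for an unmodified search engine run on one chunk."""
    return [f"identification-for-{s}" for s in spectra]


def parallel_search(spectra, n_chunks=4):
    """Wrap parallelism around the search engine without changing it."""
    chunks = decompose(spectra, n_chunks)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        per_chunk = list(pool.map(search_engine, chunks))  # map keeps chunk order
    # Recompose: concatenate the per-chunk results in the original order.
    return [hit for chunk_hits in per_chunk for hit in chunk_hits]


spectra = [f"scan{i}" for i in range(10)]
results = parallel_search(spectra, n_chunks=3)
```

The design choice worth noting is that the search engine is treated as a black box, so the same wrapper applies to sequence-database and spectral-library engines alike, as the abstract points out.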
NASA Astrophysics Data System (ADS)
Hartmann, Alfred; Redfield, Steve
1989-04-01
This paper discusses design of large-scale (1000 x 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performances of alternative multiple access channel communications protocols (unslotted and slotted ALOHA and carrier sense multiple access (CSMA)) are compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
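For readers unfamiliar with the protocols being compared, the classical textbook throughput model for ALOHA gives a feel for the trade-off. The sketch below is not taken from the paper: it assumes Poisson-distributed offered load G (in frames per frame time), under which unslotted ALOHA achieves throughput G*exp(-2G) and slotted ALOHA achieves G*exp(-G). The function names are illustrative.

```python
import math


def unslotted_aloha_throughput(offered_load):
    """Throughput S = G * exp(-2G); a frame is vulnerable for two frame times."""
    return offered_load * math.exp(-2.0 * offered_load)


def slotted_aloha_throughput(offered_load):
    """Throughput S = G * exp(-G); slotting halves the vulnerable period."""
    return offered_load * math.exp(-offered_load)


# Peak throughput occurs at G = 0.5 (unslotted) and G = 1.0 (slotted).
peak_unslotted = unslotted_aloha_throughput(0.5)  # equals 1 / (2e)
peak_slotted = slotted_aloha_throughput(1.0)      # equals 1 / e
```

Under this model slotting doubles the peak channel utilization at the cost of requiring global slot synchronization, which is one instance of the tension between ease of implementation and speed that the abstract mentions.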
ARPA surveillance technology for detection of targets hidden in foliage
NASA Astrophysics Data System (ADS)
Hoff, Lawrence E.; Stotts, Larry B.
1994-02-01
The processing of large quantities of synthetic aperture radar data in real time is a complex problem. Even the image formation process taxes today's most advanced computers. The use of complex algorithms with multiple channels adds another dimension to the computational problem. Advanced Research Projects Agency (ARPA) is currently planning on using the Paragon parallel processor for this task. The Paragon is small enough to allow its use in a sensor aircraft. Candidate algorithms will be implemented on the Paragon for evaluation for real time processing. In this paper ARPA technology developments for detecting targets hidden in foliage are reviewed and examples of signal processing techniques on field collected data are presented.
Design of neurophysiologically motivated structures of time-pulse coded neurons
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Nikolsky, Alexander I.; Lazarev, Alexander A.; Lobodzinska, Raisa F.
2009-04-01
The paper describes a common methodology, based on a biologically motivated concept, for building sensor processing systems with parallel input, picture-operand processing, and time-pulse coding. It shows the advantages of such coding for creating parallel programmed 2D-array structures for next-generation digital computers, which require untraditional numerical systems for processing analog, digital, hybrid, and neuro-fuzzy operands. Simulation results for optoelectronic time-pulse coded intelligent neural elements (OETPCINE) and implementation results for a wide set of neuro-fuzzy logic operations are considered. The simulation results confirm the engineering advantages, intellectuality, and circuit flexibility of OETPCINE for creating advanced 2D structures. The developed equivalentor-nonequivalentor neural element has a power consumption of 10 mW and a processing time of about 10 to 100 µs.
A new strategy for efficient solar energy conversion: Parallel-processing with surface plasmons
NASA Technical Reports Server (NTRS)
Anderson, L. M.
1982-01-01
This paper introduces an advanced concept for direct conversion of sunlight to electricity, which aims at high efficiency by tailoring the conversion process to separate energy bands within the broad solar spectrum. The objective is to obtain a high level of spectrum-splitting without sequential losses or unique materials for each frequency band. In this concept, sunlight excites a spectrum of surface plasma waves which are processed in parallel on the same metal film. The surface plasmons transport energy to an array of metal-barrier-semiconductor diodes, where energy is extracted by inelastic tunneling. Diodes are tuned to different frequency bands by selecting the operating voltage and geometry, but all diodes share the same materials.
Li, Kangkang; Yu, Hai; Feron, Paul; Tade, Moses; Wardhaugh, Leigh
2015-08-18
Using a rate-based model, we assessed the technical feasibility and energy performance of an advanced aqueous-ammonia-based postcombustion capture process integrated with a coal-fired power station. The capture process consists of three identical process trains in parallel, each containing a CO2 capture unit, an NH3 recycling unit, a water separation unit, and a CO2 compressor. A sensitivity study of important parameters, such as NH3 concentration, lean CO2 loading, and stripper pressure, was performed to minimize the energy consumption involved in the CO2 capture process. Process modifications of the rich-split process and the interheating process were investigated to further reduce the solvent regeneration energy. The integrated capture system was then evaluated in terms of the mass balance and the energy consumption of each unit. The results show that our advanced ammonia process is technically feasible and energy-competitive, with a low net power-plant efficiency penalty of 7.7%.
Highly Non-Linear Optical (NLO) organic crystals and films. Electrooptical organic materials
NASA Technical Reports Server (NTRS)
Mcmanus, Samuel P.; Rosenberger, Franz; Matthews, John
1987-01-01
Devices employing nonlinear optics (NLO) hold great promise for important applications in integrated optics, optical information processing and telecommunications. Properly designed organics possess outstanding optical and electrooptical properties which will substantially advance many technologies including electrooptical switching, optical amplification for communications, and parallel processing for hybrid optical computers. A brief comparison of organic and inorganic materials is given.
NASA Astrophysics Data System (ADS)
Pourteau, Marie-Line; Servin, Isabelle; Lepinay, Kévin; Essomba, Cyrille; Dal'Zotto, Bernard; Pradelles, Jonathan; Lattard, Ludovic; Brandt, Pieter; Wieland, Marco
2016-03-01
The emerging Massively Parallel-Electron Beam Direct Write (MP-EBDW) is an attractive high resolution high throughput lithography technology. As previously shown, Chemically Amplified Resists (CARs) meet process/integration specifications in terms of dose-to-size, resolution, contrast, and energy latitude. However, they are still limited by their line width roughness. To overcome this issue, we tested an alternative advanced non-CAR and showed it brings a substantial gain in sensitivity compared to CAR. We also implemented and assessed in-line post-lithographic treatments for roughness mitigation. For outgassing-reduction purpose, a top-coat layer is added to the total process stack. A new generation top-coat was tested and showed improved printing performances compared to the previous product, especially avoiding dark erosion: SEM cross-section showed a straight pattern profile. A spin-coatable charge dissipation layer based on conductive polyaniline has also been tested for conductivity and lithographic performances, and compatibility experiments revealed that the underlying resist type has to be carefully chosen when using this product. Finally, the Process Of Reference (POR) trilayer stack defined for 5 kV multi-e-beam lithography was successfully etched with well opened and straight patterns, and no lithography-etch bias.
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. Direct parallel-hierarchical transformations are based on a cluster CPU- and GPU-oriented hardware platform. Mathematical models of training of the parallel-hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most topical for problems of organizing high-performance computations on very large arrays of information designed to implement multi-stage sensing and processing as well as compaction and recognition of data in the informational structures and computer devices. The method has such advantages as high performance through the use of recent advances in parallelization, the ability to work with images of very large dimensions, ease of scaling when the number of nodes in the cluster changes, and automatic scanning of the local network to detect compute nodes.
Advanced Material Strategies for Next-Generation Additive Manufacturing
Chang, Jinke; He, Jiankang; Zhou, Wenxing; Lei, Qi; Li, Xiao; Li, Dichen
2018-01-01
Additive manufacturing (AM) has drawn tremendous attention in various fields. In recent years, great efforts have been made to develop novel additive manufacturing processes such as micro-/nano-scale 3D printing, bioprinting, and 4D printing for the fabrication of complex 3D structures with high resolution, living components, and multimaterials. The development of advanced functional materials is important for the implementation of these novel additive manufacturing processes. Here, a state-of-the-art review on advanced material strategies for novel additive manufacturing processes is provided, mainly including conductive materials, biomaterials, and smart materials. The advantages, limitations, and future perspectives of these materials for additive manufacturing are discussed. It is believed that the innovations of material strategies in parallel with the evolution of additive manufacturing processes will provide numerous possibilities for the fabrication of complex smart constructs with multiple functions, which will significantly widen the application fields of next-generation additive manufacturing. PMID:29361754
Advanced Material Strategies for Next-Generation Additive Manufacturing.
Chang, Jinke; He, Jiankang; Mao, Mao; Zhou, Wenxing; Lei, Qi; Li, Xiao; Li, Dichen; Chua, Chee-Kai; Zhao, Xin
2018-01-22
Additive manufacturing (AM) has drawn tremendous attention in various fields. In recent years, great efforts have been made to develop novel additive manufacturing processes such as micro-/nano-scale 3D printing, bioprinting, and 4D printing for the fabrication of complex 3D structures with high resolution, living components, and multimaterials. The development of advanced functional materials is important for the implementation of these novel additive manufacturing processes. Here, a state-of-the-art review on advanced material strategies for novel additive manufacturing processes is provided, mainly including conductive materials, biomaterials, and smart materials. The advantages, limitations, and future perspectives of these materials for additive manufacturing are discussed. It is believed that the innovations of material strategies in parallel with the evolution of additive manufacturing processes will provide numerous possibilities for the fabrication of complex smart constructs with multiple functions, which will significantly widen the application fields of next-generation additive manufacturing.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
NASA Astrophysics Data System (ADS)
Grzeszczuk, A.; Kowalski, S.
2015-04-01
Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to increase the speed of graphics by performing calculations in parallel. The success of this solution has opened General-Purpose Graphics Processing Unit (GPGPU) technology to applications not related to graphics. A GPGPU system can be applied as an effective tool for reducing the large volume of data in pulse shape analysis measurements, either by on-line recalculation or by a very fast compression system. The simplified structure of the CUDA system and the programming model, based on the example of an Nvidia GeForce GTX580 card, are presented in our poster contribution, both in a stand-alone version and as a ROOT application.
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance for critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU and a case study of GPU-accelerated battery prognostics with computational performance results.
An efficient route to bispecific antibody production using single-reactor mammalian co-culture
Shatz, Whitney; Ng, Domingos; Dutina, George; Wong, Athena W.; Sonoda, Junichiro; Scheer, Justin M.
2016-01-01
ABSTRACT Bispecific antibodies have shown promise in the clinic as medicines with novel mechanisms of action. Lack of efficient production of bispecific IgGs, however, has limited their rapid advancement. Here, we describe a single-reactor process using mammalian cell co-culture production to efficiently produce a bispecific IgG with 4 distinct polypeptide chains without the need for parallel processing of each half-antibody or additional framework mutations. This method resembles a conventional process, and the quality and yield of the monoclonal antibodies are equal to those produced using parallel processing methods. We demonstrate the application of the approach to diverse bispecific antibodies, and its suitability for production of a tissue specific molecule targeting fibroblast growth factor receptor 1 and klotho β that is being developed for type 2 diabetes and other obesity-linked disorders. PMID:27680183
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics becomes urgent because many research activities are constrained by software or tools that cannot even complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems.
Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prudencio, E.; Candel, A.; Ge, L.
2008-02-04
In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.
Annual Research Review: What is Resilience within the Social Ecology of Human Development?
ERIC Educational Resources Information Center
Ungar, Michael; Ghazinour, Mehdi; Richter, Jorg
2013-01-01
Background: The development of Bronfenbrenner's bio-social-ecological systems model of human development parallels advances made to the theory of resilience that progressively moved from a more individual (micro) focus on traits to a multisystemic understanding of person-environment reciprocal processes. Methods: This review uses…
An Advanced Simulation Framework for Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Li, P. P.; Tyrrell, R. Yeung D.; Adhami, N.; Li, T.; Henry, H.
1994-01-01
Discrete-event simulation (DEVS) users have long been faced with a three-way trade-off of balancing execution time, model fidelity, and number of objects simulated. Because of the limits of computer processing power the analyst is often forced to settle for less than desired performances in one or more of these areas.
Recent advances and future prospects for Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Forrest B
2010-01-01
The history of Monte Carlo methods is closely linked to that of computers: The first known Monte Carlo program was written in 1947 for the ENIAC; a pre-release of the first Fortran compiler was used for Monte Carlo in 1957; Monte Carlo codes were adapted to vector computers in the 1980s, clusters and parallel computers in the 1990s, and teraflop systems in the 2000s. Recent advances include hierarchical parallelism, combining threaded calculations on multicore processors with message-passing among different nodes. With the advances in computing, Monte Carlo codes have evolved with new capabilities and new ways of use. Production codes such as MCNP, MVP, MONK, TRIPOLI and SCALE are now 20-30 years old (or more) and are very rich in advanced features. The former 'method of last resort' has now become the first choice for many applications. Calculations are now routinely performed on office computers, not just on supercomputers. Current research and development efforts are investigating the use of Monte Carlo methods on FPGAs, GPUs, and many-core processors. Other far-reaching research is exploring ways to adapt Monte Carlo methods to future exaflop systems that may have 1M or more concurrent computational processes.
Parallel processing of embossing dies with ultrafast lasers
NASA Astrophysics Data System (ADS)
Jarczynski, Manfred; Mitra, Thomas; Brüning, Stephan; Du, Keming; Jenke, Gerald
2018-02-01
Functionalization of surfaces equips products and components with new features like hydrophilic behavior, adjustable gloss level, light management properties, etc. Small feature sizes demand diffraction-limited spots and adapted fluence for different materials. Through the availability of high power fast repeating ultrashort pulsed lasers and efficient optical processing heads delivering diffraction-limited small spot size of around 10μm it is feasible to achieve fluences higher than an adequate patterning requires. Hence, parallel processing is becoming of interest to increase the throughput and allow mass production of micro machined surfaces. The first step on the roadmap of parallel processing for cylinder embossing dies was realized with an eight-spot processing head based on ns-fiber laser with passive optical beam splitting, individual spot switching by acousto optical modulation and an advanced imaging. Patterning of cylindrical embossing dies shows a high efficiency of nearly 80%, diffraction-limited and equally spaced spots with pitches down to 25μm achieved by a compression using cascaded prism arrays. Due to the nanoseconds laser pulses the ablation shows the typical surrounding material deposition of a hot process. In the next step the processing head was adapted to a picosecond-laser source and the 500W fiber laser was replaced by an ultrashort pulsed laser with 300W, 12ps and a repetition frequency of up to 6MHz. This paper presents details about the processing head design and the analysis of ablation rates and patterns on steel, copper and brass dies. Furthermore, it gives an outlook on scaling the parallel processing head from eight to 16 individually switched beamlets to increase processing throughput and optimized utilization of the available ultrashort pulsed laser energy.
Multiscale Simulations of Magnetic Island Coalescence
NASA Technical Reports Server (NTRS)
Dorelli, John C.
2010-01-01
We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partition, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.
Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing
NASA Astrophysics Data System (ADS)
Meng, Xiang
The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allow optical structures to be scaled down to nanoscale size or even to the atomic level, far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties, and their interactions with quantum materials can have important near-field and far-field optical responses. Undoubtedly, these optical properties can have many important applications, ranging from efficient and tunable light sources, detectors, filters, modulators, and high-speed all-optical switches to next-generation classical and quantum computation and biophotonic medical sensors. This emerging research area of nanoscience, known as nanophotonics, is a highly interdisciplinary field requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales where the nature of the nanostructured matter controls the interactions. In addition, fast advancements in computing capabilities, such as parallel computing, have also become a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, and the design of these devices requires extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster design processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses computing machines to analyze and solve otherwise intractable scientific challenges.
In particular, parallel computing is a form of computation operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. In this dissertation, we report a series of new nanophotonic developments using advanced parallel computing techniques. The applications include structure optimizations at the nanoscale to control the electromagnetic response of materials and to manipulate nanoscale structures for enhanced field concentration, which enable breakthroughs in imaging and sensing systems (chapters 3 and 4) and improve the spatial-temporal resolutions of spectroscopies (chapter 5). We also report investigations of the confinement of optical-matter interactions in the quantum mechanical regime, where the size-dependent novel properties enhance a wide range of technologies, from tunable and efficient light sources and detectors to other nanophotonic elements with enhanced functionality (chapters 6 and 7).
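The principle stated in this abstract, dividing a large problem into smaller ones that are solved concurrently, can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function names and the sum-of-squares workload are hypothetical, and a thread pool is used only to keep the example portable (for CPU-bound work a process pool or a distributed platform would normally take its place).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The small sub-problem solved independently for each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Solve the sub-problems concurrently, then combine the partial results."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(10)))` splits the ten inputs into four chunks, reduces each one separately, and adds the four partial sums.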
Every factor helps: Rapid Ptychographic Reconstruction
NASA Astrophysics Data System (ADS)
Nashed, Youssef
2015-03-01
Recent advances in microscopy, specifically higher spatial resolution and data acquisition rates, require faster and more robust phase retrieval reconstruction methods. Ptychography is a phase retrieval technique for reconstructing the complex transmission function of a specimen from a sequence of diffraction patterns in visible light, X-ray, and electron microscopes. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes. Waiting to postprocess datasets offline results in missed opportunities. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs). A final specimen reconstruction is then achieved by different techniques to merge sub-dataset results into a single complex phase and amplitude image. Results are shown on a simulated specimen and real datasets from X-ray experiments conducted at a synchrotron light source.
Epigenetics and RNA Processing: Connections to Drought, Salt, and ABA?
Wong, Min May; Chong, Geeng Loo; Verslues, Paul E
2017-01-01
There have been great research advances in epigenetics, RNA splicing, and mRNA processing over recent years. In parallel, there have been many advances in abiotic stress and Abscisic Acid (ABA) signaling. Here we overview studies that have examined stress-induced changes in the epigenome and RNA processing as well as cases where disrupting these processes changes the plant response to abiotic stress. We also highlight some examples where specific connections of stress or ABA signaling to epigenetics or RNA processing have been found. By implication, this also points out cases where such mechanistic connections are likely to exist but are yet to be characterized. In the absence of such specific connections to stress signaling, it should be kept in mind that stress sensitivity phenotypes of some epigenetic or RNA processing mutants may be the result of indirect, pleiotropic effects and thus may perhaps not indicate a direct function in stress acclimation.
Advanced imaging techniques for the study of plant growth and development.
Sozzani, Rosangela; Busch, Wolfgang; Spalding, Edgar P; Benfey, Philip N
2014-05-01
A variety of imaging methodologies are being used to collect data for quantitative studies of plant growth and development from living plants. Multi-level data, from macroscopic to molecular, and from weeks to seconds, can be acquired. Furthermore, advances in parallelized and automated image acquisition enable the throughput to capture images from large populations of plants under specific growth conditions. Image-processing capabilities allow for 3D or 4D reconstruction of image data and automated quantification of biological features. These advances facilitate the integration of imaging data with genome-wide molecular data to enable systems-level modeling. Copyright © 2013 Elsevier Ltd. All rights reserved.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.; Housner, Jerrold M.
1993-01-01
Recent advances in computer technology that are likely to impact structural analysis and design of flight vehicles are reviewed. A brief summary is given of the advances in microelectronics, networking technologies, and in the user-interface hardware and software. The major features of new and projected computing systems, including high performance computers, parallel processing machines, and small systems, are described. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed. The impact of the advances in computer technology on structural analysis and the design of flight vehicles is described. A scenario for future computing paradigms is presented, and the near-term needs in the computational structures area are outlined.
Parallel design of JPEG-LS encoder on graphics processing units
NASA Astrophysics Data System (ADS)
Duan, Hao; Fang, Yong; Huang, Bormin
2012-01-01
With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on a NVIDIA GPU using the computer unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
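Of the CUDA techniques this abstract lists, the parallel prefix sum is the easiest to show in isolation. The sketch below is a CPU-side Python simulation of the Hillis-Steele scan, not the authors' kernel. The `block_offsets` helper reflects one plausible use in a block-parallel encoder (locating each block's output in a packed stream); that use is an assumption, not something the abstract states.

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum.

    Each pass of the while loop corresponds to one parallel step: element i
    reads position i - stride from the previous pass, so on a GPU all
    elements could be updated at once. Here the passes run sequentially.
    """
    result = list(values)
    stride = 1
    while stride < len(result):
        previous = result  # values read by every "thread" in this pass
        result = [
            previous[i] + previous[i - stride] if i >= stride else previous[i]
            for i in range(len(previous))
        ]
        stride *= 2
    return result


def block_offsets(block_sizes):
    """Start offset of each block in a packed output (exclusive scan)."""
    scanned = inclusive_scan(block_sizes)
    return [0] + scanned[:-1] if scanned else []
```

The scan needs about log2(n) passes for n elements, which is why it suits hardware that can update every element of a pass simultaneously.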
A parallel-processing approach to computing for the geographic sciences
Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark
2001-01-01
The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.
NASA Technical Reports Server (NTRS)
Long, Junsheng
1994-01-01
This thesis studies a forward recovery strategy using checkpointing and optimistic execution in parallel and distributed systems. The approach uses replicated tasks executing on different processors for forward recovery and checkpoint comparison for error detection. To reduce overall redundancy, this approach employs a lower static redundancy in the common error-free situation to detect errors than the standard N Module Redundancy scheme (NMR) does to mask off errors. For the rare occurrence of an error, this approach uses some extra redundancy for recovery. To reduce the run-time recovery overhead, look-ahead processes are used to advance computation speculatively and a rollback process is used to produce a diagnosis for correct look-ahead processes without rollback of the whole system. Both analytical and experimental evaluation have shown that this strategy can provide a nearly error-free execution time even under faults with a lower average redundancy than NMR.
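The core idea of this abstract, duplicated execution with checkpoint comparison and extra redundancy spent only when an error is detected, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the thesis's scheme: the replicas run one after another rather than on separate processors, the speculative look-ahead processes are not modelled, and all function names are hypothetical.

```python
def healthy(step, state, k):
    """Replica that always computes step k correctly."""
    return step(state)


def faulty_at(bad_step):
    """Build a hypothetical replica that corrupts its result at one step."""
    def replica(step, state, k):
        result = step(state)
        return result + 1 if k == bad_step else result
    return replica


def run_with_checkpoint_comparison(step, state, n_steps, replicas):
    """Advance `state` through n_steps using duplicated execution.

    Two replicas run in the error-free case. A third replica is consulted
    only when their checkpoints disagree, and its result decides which
    checkpoint is kept, so the computation moves forward without restarting.
    """
    recoveries = 0
    for k in range(n_steps):
        a = replicas[0](step, state, k)
        b = replicas[1](step, state, k)
        if a == b:
            state = a  # checkpoints agree: no error detected
            continue
        c = replicas[2](step, state, k)  # extra redundancy, used rarely
        if c == a:
            state = a
        elif c == b:
            state = b
        else:
            raise RuntimeError(f"no two replicas agree at step {k}")
        recoveries += 1
    return state, recoveries
```

With `step = lambda s: 2 * s + 1`, four steps from state 0, and the second replica faulty at step 2, the run still ends in the fault-free state and reports one recovery. The average redundancy stays close to two because the third replica runs only on a mismatch.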
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. 
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
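The "poor man's parallelization" described above, running whole unmodified programs as separate processes, might look like the following sketch. It is only an illustration using the Python standard library: it does not use a job scheduler or BioNode's own configuration scripts, and the stand-in jobs (small Python one-liners) are hypothetical placeholders for a legacy tool invoked once per input.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one whole program as a separate OS process and capture its output."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def run_all(commands, max_workers=4):
    """Run independent command lines concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


# Stand-in jobs: each launches a separate Python interpreter, playing the
# role of an unmodified legacy program called with a different input.
jobs = [[sys.executable, "-c", f"print({n} ** 2)"] for n in range(4)]

if __name__ == "__main__":
    print(run_all(jobs))
```

Threads are used here only to wait on the child processes; the actual work runs in the separate processes, so the programs themselves need no changes.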
NASA Technical Reports Server (NTRS)
Steele, Gynelle C.
1999-01-01
The NASA Lewis Research Center and Flow Parametrics will enter into an agreement to commercialize the National Combustion Code (NCC). This multidisciplinary combustor design system utilizes computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. This integrated system can facilitate and enhance various phases of the design and analysis process.
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallel part, and the last is a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT into a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared with the CPU serial method.
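The step of rewriting a 2D IDWT as independent 1D transforms relies on the transform being separable. The sketch below illustrates that idea only; it assumes a one-level, unnormalised Haar wavelet as a stand-in (the abstract does not name the wavelet used), and it is written in plain Python rather than CUDA. Each call to the 1D routine touches one row or one column and is independent of the others, which is what would let a GPU version assign one thread or block per row or column.

```python
def haar_forward_1d(x):
    """One-level Haar transform: pairwise averages followed by differences."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    dif = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return avg + dif


def haar_inverse_1d(c):
    """Invert haar_forward_1d for one row or one column."""
    half = len(c) // 2
    out = []
    for a, d in zip(c[:half], c[half:]):
        out += [a + d, a - d]
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(img):
    """Row pass, then column pass."""
    rows = [haar_forward_1d(r) for r in img]
    return transpose([haar_forward_1d(c) for c in transpose(rows)])


def inverse_2d(coeffs):
    """Undo the column pass, then the row pass, as independent 1D inverses."""
    cols = [haar_inverse_1d(c) for c in transpose(coeffs)]  # parallel over columns
    return [haar_inverse_1d(r) for r in transpose(cols)]    # parallel over rows
```

A round trip such as `inverse_2d(forward_2d(img))` on an image with even side lengths returns the original pixel values.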
Constructing Neuronal Network Models in Massively Parallel Environments
Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808
Weighted Ensemble Simulation: Review of Methodology, Applications, and Software.
Zuckerman, Daniel M; Chong, Lillian T
2017-05-22
The weighted ensemble (WE) methodology orchestrates quasi-independent parallel simulations run with intermittent communication that can enhance sampling of rare events such as protein conformational changes, folding, and binding. The WE strategy can achieve superlinear scaling, that is, the unbiased estimation of key observables such as rate constants and equilibrium state populations to greater precision than would be possible with ordinary parallel simulation. WE software can be used to control any dynamics engine, such as standard molecular dynamics and cell-modeling packages. This article reviews the theoretical basis of WE and goes on to describe successful applications to a number of complex biological processes: protein conformational transitions, (un)binding, and assembly processes, as well as cell-scale processes in systems biology. We furthermore discuss the challenges that need to be overcome in the next phase of WE methodological development. Overall, the combined advances in WE methodology and software have enabled the simulation of long-timescale processes that would otherwise not be practical on typical computing resources using standard simulation.
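The "intermittent communication" in WE is a resampling step in which trajectories (walkers) in each bin are split or merged while their total statistical weight is conserved. The sketch below shows one common split/merge formulation as an illustration; it is not taken from any particular WE package, and the function name, the `(state, weight)` tuple layout, and the choice to always split the heaviest and merge the two lightest walkers are assumptions.

```python
import random


def resample_bin(walkers, target, rng):
    """Split or merge (state, weight) walkers in one bin to `target` walkers.

    Total weight is conserved up to floating-point rounding.
    """
    walkers = list(walkers)
    while len(walkers) < target:
        # Split the heaviest walker into two copies with half the weight each.
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    while len(walkers) > target:
        # Merge the two lightest walkers; the survivor is chosen with
        # probability proportional to weight and carries the combined weight.
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        survivor = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
    return walkers


if __name__ == "__main__":
    rng = random.Random(0)
    print(resample_bin([(0.1, 0.5), (0.2, 0.25), (0.3, 0.25)], 4, rng))
```

Between resampling steps each walker would be advanced independently by the dynamics engine, which is where the parallelism comes from.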
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time image processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massively parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. For CPU parallel imaging, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. For GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfer broken, but various optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves on the efficiency of SAR imaging on a single-core CPU by 270 times and realizes real-time imaging in that the imaging rate outperforms the raw data generation rate. PMID:27070606
Towards green high capacity optical networks
NASA Astrophysics Data System (ADS)
Glesk, I.; Mohd Warip, M. N.; Idris, S. K.; Osadola, T. B.; Andonovic, I.
2011-09-01
The demand for fast, secure, energy efficient high capacity networks is growing. It is fuelled by transmission bandwidth needs which will support, among other things, the rapid penetration of multimedia applications empowering smart consumer electronics and E-businesses. All of the above trigger unparalleled needs for networking solutions which must offer not only high-speed low-cost "on demand" mobile connectivity but should also be ecologically friendly and have a low carbon footprint. The first answer to the bandwidth needs was the deployment of fibre optic technologies into transport networks. After this it quickly became obvious that the inferior bandwidth of electronics (compared to optical fibre) would continue to limit the maximum implementable serial data rates. A new solution was found by introducing parallelism into data transport in the form of Wavelength Division Multiplexing (WDM), which has helped dramatically to improve the aggregate throughput of optical networks. However, with these advancements a new bottleneck has emerged at fibre endpoints where data routers must process the incoming and outgoing traffic. Here, even with massive and power-hungry electronic parallelism, today's routers (still relying upon bandwidth-limiting electronics) do not offer the processing speeds that networks demand. In this paper we will discuss some novel unconventional approaches to address network scalability leading to energy savings via advanced optical signal processing. We will also investigate energy savings based on advanced network management through node hibernation proposed for Optical IP networks. Hibernation reduces the network's overall power consumption by forming virtual network reconfigurations through selective node grouping and through link segmentation and partitioning.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
Anthropology and cultural neuroscience: creating productive intersections in parallel fields.
Brown, R A; Seligman, R
2009-01-01
Partly due to the failure of anthropology to productively engage the fields of psychology and neuroscience, investigations in cultural neuroscience have occurred largely without the active involvement of anthropologists or anthropological theory. Dramatic advances in the tools and findings of social neuroscience have emerged in parallel with significant advances in anthropology that connect social and political-economic processes with fine-grained descriptions of individual experience and behavior. We describe four domains of inquiry that follow from these recent developments, and provide suggestions for intersections between anthropological tools - such as social theory, ethnography, and quantitative modeling of cultural models - and cultural neuroscience. These domains are: the sociocultural construction of emotion, status and dominance, the embodiment of social information, and the dual social and biological nature of ritual. Anthropology can help locate unique or interesting populations and phenomena for cultural neuroscience research. Anthropological tools can also help "drill down" to investigate key socialization processes accountable for cross-group differences. Furthermore, anthropological research points at meaningful underlying complexity in assumed relationships between social forces and biological outcomes. Finally, ethnographic knowledge of cultural content can aid with the development of ecologically relevant stimuli for use in experimental protocols.
NASA Technical Reports Server (NTRS)
Ross, Muriel D.
2003-01-01
In a letter to Robert Hooke, written on 5 February, 1675, Isaac Newton wrote "If I have seen further than certain other men it is by standing upon the shoulders of giants." In his context, Newton was referring to the work of Galileo and Kepler, who preceded him. However, every field has its own giants, those men and women who went before us and, often with few tools at their disposal, uncovered the facts that enabled later researchers to advance knowledge in a particular area. This review traces the history of the evolution of views from early giants in the field of vestibular research to modern concepts of vestibular organ organization and function. Emphasis will be placed on the mammalian maculae as peripheral processors of linear accelerations acting on the head. This review shows that early, correct findings were sometimes unfortunately disregarded, impeding later investigations into the structure and function of the vestibular organs. The central themes are that the macular organs are highly complex, dynamic, adaptive, distributed parallel processors of information, and that historical references can help us to understand our own place in advancing knowledge about their complicated structure and functions.
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has been tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Tailorable advanced blanket insulation using aluminoborosilicate and alumina batting
NASA Technical Reports Server (NTRS)
Calamito, Dominic P.
1989-01-01
Two types of Tailorable Advanced Blanket Insulation (TABI) flat panels for Advanced Space Transportation Systems were produced. Both types consisted of an integrally woven, 3-D fluted core having parallel faces and connecting ribs of Nicalon yarns. The triangular cross section flutes of one type were filled with mandrels of processed Ultrafiber (aluminoborosilicate) stitchbonded Nextel 440 fibrous felt, and those of the second type were filled with Saffil alumina fibrous felt insulation. Weaving problems were minimal. Insertion of the fragile insulation mandrels into the fabric flutes was improved by using a special insertion tool. An attempt was made to weave fluted core fabrics from Nextel 440 yarns but was unsuccessful because of the yarn's fragility. A small sample was eventually produced by an unorthodox weaving process and then filled with Saffil insulation. The procedures for setting up and weaving the fabrics and preparing and inserting insulation mandrels are discussed. Characterizations of the panels produced are also presented.
Gate tunable parallel double quantum dots in InAs double-nanowire devices
NASA Astrophysics Data System (ADS)
Baba, S.; Matsuo, S.; Kamata, H.; Deacon, R. S.; Oiwa, A.; Li, K.; Jeppesen, S.; Samuelson, L.; Xu, H. Q.; Tarucha, S.
2017-12-01
We report fabrication and characterization of InAs nanowire devices with two closely placed parallel nanowires. The fabrication process we develop includes selective deposition of the nanowires with micron scale alignment onto predefined finger bottom gates using a polymer transfer technique. By tuning the double nanowire with the finger bottom gates, we observed the formation of parallel double quantum dots with one quantum dot in each nanowire bound by the normal metal contact edges. We report the gate tunability of the charge states in individual dots as well as the inter-dot electrostatic coupling. In addition, we fabricate a device with separate normal metal contacts and a common superconducting contact to the two parallel wires and confirm the dot formation in each wire from comparison of the transport properties and a superconducting proximity gap feature for the respective wires. With the fabrication techniques established in this study, devices can be realized for more advanced experiments on Cooper-pair splitting, generation of Parafermions, and so on.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
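The combination of speculative event processing and in-order commitment can be illustrated with a toy model. The sketch below is not the authors' DMD code: the event format, the "swap" collision rule, and the fixed window size are hypothetical, and real DMD also predicts new events and cancels stale ones, which is omitted here. The point is only that events in a window can be evaluated independently against a snapshot, while commits happen strictly in time order and redo any event whose inputs were changed by an earlier commit.

```python
def process(event, state):
    """Compute the effect of one event from the state it reads (pure function)."""
    _time, (i, j) = event
    # Toy "collision": the two particles exchange their values.
    return {i: state[j], j: state[i]}


def simulate(events, state, window=4):
    """Speculatively process up to `window` events, then commit in time order.

    Returns the final state and the number of events that had to be redone.
    """
    events = sorted(events)
    state = dict(state)
    reprocessed = 0
    for start in range(0, len(events), window):
        batch = events[start:start + window]
        snapshot = dict(state)
        # Speculative phase: each event is independent here and could run
        # on its own core.
        results = [process(ev, snapshot) for ev in batch]
        dirty = set()  # particles changed by events already committed
        for ev, result in zip(batch, results):
            if dirty & set(ev[1]):
                # The speculation read stale data: redo with committed state.
                result = process(ev, state)
                reprocessed += 1
            state.update(result)
            dirty.update(result)
    return state, reprocessed
```

Because every commit either reuses a result computed from unchanged inputs or recomputes it from the committed state, the final state matches what a purely serial, time-ordered run would produce.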
Laszlo, Sarah; Plaut, David C
2012-03-01
The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between explicit, computational models and physiological data collected during the performance of cognitive tasks, we developed a PDP model of visual word recognition which simulates key results from the ERP reading literature, while simultaneously being able to successfully perform lexical decision, a benchmark task for reading models. Simulations reveal that the model's success depends on the implementation of several neurally plausible features in its architecture which are sufficiently domain-general to be relevant to cognitive modeling more generally. Copyright © 2011 Elsevier Inc. All rights reserved.
Parallel detection of violations of color constancy
Foster, David H.; Nascimento, Sérgio M. C.; Amano, Kinjiro; Arend, Larry; Linnell, Karina J.; Nieves, Juan Luis; Plet, Sabrina; Foster, Jeffrey S.
2001-01-01
The perceived colors of reflecting surfaces generally remain stable despite changes in the spectrum of the illuminating light. This color constancy can be measured operationally by asking observers to distinguish illuminant changes on a scene from changes in the reflecting properties of the surfaces comprising it. It is shown here that during fast illuminant changes, simultaneous changes in spectral reflectance of one or more surfaces in an array of other surfaces can be readily detected almost independent of the numbers of surfaces, suggesting a preattentive, spatially parallel process. This process, which is perfect over a spatial window delimited by the anatomical fovea, may form an early input to a multistage analysis of surface color, providing the visual system with information about a rapidly changing world in advance of the generation of a more elaborate and stable perceptual representation. PMID:11438751
Proteus: a reconfigurable computational network for computer vision
NASA Astrophysics Data System (ADS)
Haralick, Robert M.; Somani, Arun K.; Wittenbrink, Craig M.; Johnson, Robert; Cooper, Kenneth; Shapiro, Linda G.; Phillips, Ihsin T.; Hwang, Jenq N.; Cheung, William; Yao, Yung H.; Chen, Chung-Ho; Yang, Larry; Daugherty, Brian; Lorbeski, Bob; Loving, Kent; Miller, Tom; Parkins, Larye; Soos, Steven L.
1992-04-01
The Proteus architecture is a highly parallel MIMD (multiple-instruction, multiple-data) machine, optimized for large granularity tasks such as machine vision and image processing. The system can achieve 20 Giga-flops (80 Giga-flops peak). It accepts data via multiple serial links at a rate of up to 640 megabytes/second. The system employs a hierarchical reconfigurable interconnection network with the highest level being a circuit switched Enhanced Hypercube serial interconnection network for internal data transfers. The system is designed to use 256 to 1,024 RISC processors. The processors use one megabyte external Read/Write Allocating Caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low and high level simulators, and a message passing system for all control needs. Image processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray scale dilation, erosion, opening, and closing.
Double-heterojunction nanorod light-responsive LEDs for display applications.
Oh, Nuri; Kim, Bong Hoon; Cho, Seong-Yong; Nam, Sooji; Rogers, Steven P; Jiang, Yiran; Flanagan, Joseph C; Zhai, You; Kim, Jae-Hwan; Lee, Jungyup; Yu, Yongjoon; Cho, Youn Kyoung; Hur, Gyum; Zhang, Jieqian; Trefonas, Peter; Rogers, John A; Shim, Moonsub
2017-02-10
Dual-functioning displays, which can simultaneously transmit and receive information and energy through visible light, would enable enhanced user interfaces and device-to-device interactivity. We demonstrate that double heterojunctions designed into colloidal semiconductor nanorods allow both efficient photocurrent generation through a photovoltaic response and electroluminescence within a single device. These dual-functioning, all-solution-processed double-heterojunction nanorod light-responsive light-emitting diodes open feasible routes to a variety of advanced applications, from touchless interactive screens to energy harvesting and scavenging displays and massively parallel display-to-display data communication. Copyright © 2017, American Association for the Advancement of Science.
NASA Technical Reports Server (NTRS)
Delaat, John C.; Merrill, Walter C.
1990-01-01
The objective of the Advanced Detection, Isolation, and Accommodation Program is to improve the overall demonstrated reliability of digital electronic control systems for turbine engines. For this purpose, an algorithm was developed which detects, isolates, and accommodates sensor failures by using analytical redundancy. The performance of this algorithm was evaluated on a real time engine simulation and was demonstrated on a full scale F100 turbofan engine. The real time implementation of the algorithm is described. The implementation used state-of-the-art microprocessor hardware and software, including parallel processing and high order language programming.
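Detection, isolation, and accommodation by analytical redundancy can be sketched as a residual test between each measurement and a model-based estimate of the same quantity. The example below is a minimal illustration under stated assumptions: the function name, the fixed per-sensor thresholds, and the sensor names are hypothetical, and the estimates are simply passed in rather than produced by an engine model or filter as a real implementation would require.

```python
def accommodate(measured, estimated, thresholds):
    """Flag sensors whose residual exceeds a threshold and substitute estimates.

    Returns the values passed on to the controller and the names of the
    sensors declared failed.
    """
    outputs, failed = {}, []
    for name, value in measured.items():
        residual = abs(value - estimated[name])
        if residual > thresholds[name]:
            failed.append(name)              # detection and isolation
            outputs[name] = estimated[name]  # accommodation
        else:
            outputs[name] = value
    return outputs, failed


if __name__ == "__main__":
    # Hypothetical sensor names and values, for illustration only.
    measured = {"fan_speed": 9800.0, "turbine_temp": 150.0}
    estimated = {"fan_speed": 9790.0, "turbine_temp": 900.0}
    limits = {"fan_speed": 50.0, "turbine_temp": 40.0}
    print(accommodate(measured, estimated, limits))
```

Since each sensor's residual is evaluated independently, the per-sensor checks are a natural candidate for the parallel processing the abstract mentions.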
Gravitational Waves from Binary Mergers of Subsolar Mass Dark Black Holes
NASA Astrophysics Data System (ADS)
Shandera, Sarah; Jeong, Donghui; Gebhardt, Henry S. Grasshorn
2018-06-01
We explore the possible spectrum of binary mergers of subsolar mass black holes formed out of dark matter particles interacting via a dark electromagnetism. We estimate the properties of these dark black holes by assuming that their formation process is parallel to Population-III star formation, except that dark molecular cooling can yield a smaller opacity limit. We estimate the binary coalescence rates for the Advanced LIGO and Einstein telescope, and find that scenarios compatible with all current constraints could produce dark black holes at rates high enough for detection by Advanced LIGO.
USDA-ARS's Scientific Manuscript database
Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
CSM parallel structural methods research
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1989-01-01
Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Autonomous onboard optical processor for driving aid
NASA Astrophysics Data System (ADS)
Attia, Mondher; Servel, Alain; Guibert, Laurent
1995-01-01
We take advantage of recent technological advances in the field of ferroelectric liquid crystal silicon back plane optoelectronic devices. These are well suited to perform massively parallel processing tasks. That choice enables the design of low cost vision systems and allows the implementation of an on-board system. We focus on transport applications such as road sign recognition. Preliminary in-car experimental results are presented.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering. References: [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA: 'NVidia CUDA C Programming Guide', version 5.0, NVidia (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A., Varley, M.R., Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, pp. 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
NASA Technical Reports Server (NTRS)
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Weighted Ensemble Simulation: Review of Methodology, Applications, and Software
Zuckerman, Daniel M.; Chong, Lillian T.
2018-01-01
The weighted ensemble (WE) methodology orchestrates quasi-independent parallel simulations run with intermittent communication that can enhance sampling of rare events such as protein conformational changes, folding, and binding. The WE strategy can achieve superlinear scaling—the unbiased estimation of key observables such as rate constants and equilibrium state populations to greater precision than would be possible with ordinary parallel simulation. WE software can be used to control any dynamics engine, such as standard molecular dynamics and cell-modeling packages. This article reviews the theoretical basis of WE and goes on to describe successful applications to a number of complex biological processes—protein conformational transitions, (un)binding, and assembly processes, as well as cell-scale processes in systems biology. We furthermore discuss the challenges that need to be overcome in the next phase of WE methodological development. Overall, the combined advances in WE methodology and software have enabled the simulation of long-timescale processes that would otherwise not be practical on typical computing resources using standard simulation. PMID:28301772
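The resampling idea behind WE can be illustrated with a minimal Python sketch. It assumes the commonly described split/merge scheme, in which each walker (trajectory) carries a statistical weight and the walkers in one region of configuration space are split or merged to reach a target count while the total weight is conserved. The function name `resample_bin`, the `(state, weight)` tuple layout, and the choice to split the heaviest walker and merge the two lightest are illustrative assumptions, not details taken from the review or from any WE software package.

```python
import random

def resample_bin(walkers, target, rng):
    """Return `target` walkers carrying the same total weight as `walkers`.

    walkers: list of (state, weight) tuples with positive weights
             (hypothetical layout chosen for this sketch).
    """
    walkers = list(walkers)
    if not walkers:
        return walkers
    # Split: replace the heaviest walker with two copies of half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    # Merge: combine the two lightest walkers; the survivor is picked with
    # probability proportional to weight, which keeps the estimate unbiased.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        total = w1 + w2
        survivor = s1 if rng.random() < w1 / total else s2
        walkers = [(survivor, total)] + walkers[2:]
    return walkers
```

Between resampling steps each walker would be propagated independently by whatever dynamics engine is in use, which is where the quasi-independent parallelism comes from.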
Particle-In-Cell simulations of high pressure plasmas using graphics processing units
NASA Astrophysics Data System (ADS)
Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter
2009-10-01
Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Plasmas at very low gas pressures in particular are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming, because certain stability conditions have to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take a very long time to run on standard computers and require the help of computer clusters or supercomputers. Recent advances in the field of graphics processing units (GPUs) provide every personal computer with a highly parallel multiprocessor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
Method of synchronizing independent functional unit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Changhoan
A system for synchronizing parallel processing of a plurality of functional processing units (FPU), a first FPU and a first program counter to control timing of a first stream of program instructions issued to the first FPU by advancement of the first program counter; a second FPU and a second program counter to control timing of a second stream of program instructions issued to the second FPU by advancement of the second program counter, the first FPU is in communication with a second FPU to synchronize the issuance of a first stream of program instructions to the second stream of program instructions and the second FPU is in communication with the first FPU to synchronize the issuance of the second stream program instructions to the first stream of program instructions.
Zhang, Litao; Cvijic, Mary Ellen; Lippy, Jonathan; Myslik, James; Brenner, Stephen L; Binnie, Alastair; Houston, John G
2012-07-01
In this paper, we review the key solutions that enabled evolution of the lead optimization screening support process at Bristol-Myers Squibb (BMS) between 2004 and 2009. During this time, technology infrastructure investment and scientific expertise integration laid the foundations to build and tailor lead optimization screening support models across all therapeutic groups at BMS. Together, harnessing advanced screening technology platforms and expanding panel screening strategy led to a paradigm shift at BMS in supporting lead optimization screening capability. Parallel SAR and structure liability relationship (SLR) screening approaches were first and broadly introduced to empower more-rapid and -informed decisions about chemical synthesis strategy and to broaden options for identifying high-quality drug candidates during lead optimization. Copyright © 2012 Elsevier Ltd. All rights reserved.
Method of synchronizing independent functional unit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Changhoan
2017-05-16
A system for synchronizing parallel processing of a plurality of functional processing units (FPU), a first FPU and a first program counter to control timing of a first stream of program instructions issued to the first FPU by advancement of the first program counter; a second FPU and a second program counter to control timing of a second stream of program instructions issued to the second FPU by advancement of the second program counter, the first FPU is in communication with a second FPU to synchronize the issuance of a first stream of program instructions to the second stream of program instructions and the second FPU is in communication with the first FPU to synchronize the issuance of the second stream program instructions to the first stream of program instructions.
Method of synchronizing independent functional unit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Changhoan
2017-02-14
A system for synchronizing parallel processing of a plurality of functional processing units (FPU), a first FPU and a first program counter to control timing of a first stream of program instructions issued to the first FPU by advancement of the first program counter; a second FPU and a second program counter to control timing of a second stream of program instructions issued to the second FPU by advancement of the second program counter, the first FPU is in communication with a second FPU to synchronize the issuance of a first stream of program instructions to the second stream of program instructions and the second FPU is in communication with the first FPU to synchronize the issuance of the second stream program instructions to the first stream of program instructions.
Co-Simulation for Advanced Process Design and Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stephen E. Zitney
2009-01-01
Meeting the increasing demand for clean, affordable, and secure energy is arguably the most important challenge facing the world today. Fossil fuels can play a central role in a portfolio of carbon-neutral energy options provided CO2 emissions can be dramatically reduced by capturing CO2 and storing it safely and effectively. Fossil energy industry faces the challenge of meeting aggressive design goals for next-generation power plants with CCS. Process designs will involve large, highly-integrated, and multipurpose systems with advanced equipment items with complex geometries and multiphysics. APECS is enabling software to facilitate effective integration, solution, and analysis of high-fidelity process/equipment (CFD) co-simulations. APECS helps to optimize fluid flow and related phenomena that impact overall power plant performance. APECS offers many advanced capabilities including ROMs, design optimization, parallel execution, stochastic analysis, and virtual plant co-simulations. NETL and its collaborative R&D partners are using APECS to reduce the time, cost, and technical risk of developing high-efficiency, zero-emission power plants with CCS.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven
1994-01-01
The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of the large amounts of situational data that arise during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as a key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical, necessitating an ultra-reliable, high-throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is a hard real-time, Byzantine fault-tolerant parallel processor which is programmed in the Ada language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.
Modulation and coding for satellite and space communications
NASA Technical Reports Server (NTRS)
Yuen, Joseph H.; Simon, Marvin K.; Pollara, Fabrizio; Divsalar, Dariush; Miller, Warner H.; Morakis, James C.; Ryan, Carl R.
1990-01-01
Several modulation and coding advances supported by NASA are summarized. To support long-constraint-length convolutional codes, a VLSI maximum-likelihood decoder is being developed that uses parallel processing techniques to decode convolutional codes of constraint length 15 and code rates as low as 1/6. A VLSI high-speed 8-b Reed-Solomon decoder is being developed for advanced tracking and data relay satellite (ATDRS) applications, as is a 300-Mb/s modem with continuous phase modulation (CPM) and coding. Trellis-coded modulation (TCM) techniques are discussed for satellite-based mobile communication applications.
NASA Technical Reports Server (NTRS)
Welch, J. D.
1975-01-01
The preliminary design of an experiment for landmark recognition and tracking from the Shuttle/Advanced Technology Laboratory is described. It makes use of parallel coherent optical processing to perform correlation tests between landmarks observed passively with a telescope and previously made holographic matched filters. The experimental equipment including the optics, the low power laser, the random access file of matched filters and the electro-optical readout device are described. A real time optically excited liquid crystal device is recommended for performing the input non-coherent optical to coherent optical interface function. A development program leading to a flight experiment in 1981 is outlined.
Advances in simulation of wave interactions with extended MHD phenomena
NASA Astrophysics Data System (ADS)
Batchelor, D.; Abla, G.; D'Azevedo, E.; Bateman, G.; Bernholdt, D. E.; Berry, L.; Bonoli, P.; Bramley, R.; Breslau, J.; Chance, M.; Chen, J.; Choi, M.; Elwasif, W.; Foley, S.; Fu, G.; Harvey, R.; Jaeger, E.; Jardin, S.; Jenkins, T.; Keyes, D.; Klasky, S.; Kruger, S.; Ku, L.; Lynch, V.; McCune, D.; Ramos, J.; Schissel, D.; Schnack, D.; Wright, J.
2009-07-01
The Integrated Plasma Simulator (IPS) provides a framework within which some of the most advanced, massively-parallel fusion modeling codes can be interoperated to provide a detailed picture of the multi-physics processes involved in fusion experiments. The presentation will cover four topics: 1) recent improvements to the IPS, 2) application of the IPS for very high resolution simulations of ITER scenarios, 3) studies of resistive and ideal MHD stability in tokamak discharges using IPS facilities, and 4) the application of RF power in the electron cyclotron range of frequencies to control slowly growing MHD modes in tokamaks and initial evaluations of optimized location for RF power deposition.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation
NASA Technical Reports Server (NTRS)
O'Connell, Matthew; Karman, Steve L.
2015-01-01
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
High Volume Fraction Carbon Nanotube Composites for Aerospace Applications
NASA Technical Reports Server (NTRS)
Siochi, Emilie J.; Kim, Jae-Woo; Sauti, Godfrey; Cano, Roberto J.; Wincheski, Russell A.; Ratcliffe, James G.; Czabaj, Michael; Jensen, Benjamin D.; Wise, Kristopher E.
2015-01-01
Reported nanoscale mechanical properties of carbon nanotubes (CNTs) suggest that their use may enable the fabrication of significantly lighter structures for use in space applications. To be useful in the fabrication of large structures, however, their attractive nanoscale properties must be retained as they are scaled up to bulk materials and converted into practically useful forms. Advances in CNT production have significantly increased the quantities available for use in manufacturing processes, but challenges remain with the retention of nanoscale properties in larger assemblies of CNTs. This work summarizes recent progress in producing carbon nanotube composites with tensile properties approaching those of carbon fiber reinforced polymer composites. These advances were achieved in nanocomposites with CNT content of 70% by weight. The processing methods explored to yield these CNT composite properties will be discussed, as will the characterization and test methods that were developed to provide insight into the factors that contribute to the enhanced tensile properties. Technology maturation was guided by parallel advancements in computational modeling tools that aided in the interpretation of experimental data.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
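The two-stage structure described above (recursive decomposition, then parallel meshing of the sub-domains by a pool of workers) can be sketched in Python on a deliberately simplified problem. This toy version assumes a 1D interval instead of a 3D surface mesh, uses plain bisection as a stand-in for the Advancing-Partition method, and places evenly spaced nodes as a stand-in for Advancing Front meshing. All function names are hypothetical, and threads are used only to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(lo, hi, max_width):
    """Recursively bisect [lo, hi] until each piece is at most max_width wide."""
    if hi - lo <= max_width:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return decompose(lo, mid, max_width) + decompose(mid, hi, max_width)

def mesh_subdomain(sub, spacing):
    """Stand-in for meshing: place nodes roughly `spacing` apart in one sub-domain."""
    lo, hi = sub
    n = max(1, round((hi - lo) / spacing))
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

def mesh_in_parallel(lo, hi, max_width, spacing, workers=4):
    """Master: decompose, then hand sub-domains to idle workers as they free up."""
    subs = decompose(lo, hi, max_width)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        meshes = list(pool.map(lambda s: mesh_subdomain(s, spacing), subs))
    return subs, meshes
```

Because the executor keeps a shared queue and each worker pulls the next sub-domain when it finishes, uneven per-task costs are balanced dynamically, which is the role the Master/Worker pattern plays in the abstract.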
Extending the surrogacy analogy: applying the advance directive model to biobanks.
Solomon, Stephanie; Mongoven, Ann
2015-01-01
Biobank donors and biobank governance face a conceptual challenge akin to that of clinical patients and their designated surrogate decision-makers: the necessity of making decisions and policies now that must be implemented under future unknown circumstances. We propose that biobanks take advantage of this parallel to learn lessons from the historical trajectory of advance directives and develop models analogous to current 'best practice' advance directives such as Values Histories and the Five Wishes. We suggest how such models could improve biobanks' engagement both with communities and with individual donors by being more honest about the limits of current disclosure and eliciting information to ensure the protection of donor interests more robustly through time than current 'informed consent' processes in biobanking. © 2014 S. Karger AG, Basel.
GPU-based acceleration of computations in nonlinear finite element deformation analysis.
Mafi, Ramin; Sirouspour, Shahin
2014-03-01
The physics of deformation for biological soft-tissue is best described by nonlinear continuum mechanics-based models, which then can be discretized by the FEM for a numerical solution. However, the computational complexity of such models has limited their use in applications requiring real-time or fast response. In this work, we propose a graphic processing unit-based implementation of the FEM using implicit time integration for dynamic nonlinear deformation analysis. This is the most general formulation of the deformation analysis. It is valid for large deformations and strains and can account for material nonlinearities. The data-parallel nature and the intense arithmetic computations of nonlinear FEM equations make it particularly suitable for implementation on a parallel computing platform such as a graphic processing unit. In this work, we present and compare two different designs based on the matrix-free and conventional preconditioned conjugate gradients algorithms for solving the FEM equations arising in deformation analysis. The speedup achieved with the proposed parallel implementations of the algorithms will be instrumental in the development of advanced surgical simulators and medical image registration methods involving soft-tissue deformation. Copyright © 2013 John Wiley & Sons, Ltd.
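To make the term "matrix-free" concrete, here is a minimal Python sketch of conjugate gradients in which the system matrix is never assembled; the solver only needs a function that returns the product of the matrix with a vector. The operator used here, a 1D tridiagonal stencil (-1, 2, -1), is an assumed toy stand-in for an FEM stiffness matrix, and the sketch omits the preconditioner for brevity, so it is not the authors' GPU design.

```python
def apply_stiffness(v):
    """Matrix-free product with tridiag(-1, 2, -1); no matrix is stored."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Each entry of the operator product depends only on a few neighbouring values, so the entries can be computed independently, which is the data-parallel structure the abstract points to.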
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu
2013-01-01
Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good at handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using NVIDIA CUDA (Compute Unified Device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device, which consists of 384 CUDA cores clocked at 1645 MHz, with a standard desktop PC as the host platform. We compare the results with those from a standard CPU implementation for accuracy and speed and draw implications for simulation using the GPU paradigm.
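A finite difference model of this kind can be sketched in a few lines of Python. The sketch assumes an explicit (forward Euler) scheme, a 2D grid, fixed boundary values, and principal diffusivities `kx` and `ky` aligned with the grid axes so that no cross-derivative term appears; none of these choices is stated in the abstract, and the function name is hypothetical.

```python
def diffuse_step(T, kx, ky, dx, dy, dt):
    """One explicit time step of dT/dt = kx * d2T/dx2 + ky * d2T/dy2.

    T is a list of rows; boundary values are held fixed. For stability the
    explicit scheme needs dt <= 1 / (2 * (kx / dx**2 + ky / dy**2)).
    """
    ny, nx = len(T), len(T[0])
    new = [row[:] for row in T]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Central second differences along each principal direction.
            d2x = (T[j][i + 1] - 2 * T[j][i] + T[j][i - 1]) / dx ** 2
            d2y = (T[j + 1][i] - 2 * T[j][i] + T[j - 1][i]) / dy ** 2
            new[j][i] = T[j][i] + dt * (kx * d2x + ky * d2y)
    return new
```

Every interior node is updated from the previous time level only, so the two loops carry no dependencies between iterations; on a GPU each node would typically be assigned to its own thread.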
Computational methods and software systems for dynamics and control of large space structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Felippa, C. A.; Farhat, C.; Pramono, E.
1990-01-01
This final report on computational methods and software systems for dynamics and control of large space structures covers progress to date, projected developments in the final months of the grant, and conclusions. Pertinent reports and papers that have not appeared in scientific journals (or have not yet appeared in final form) are enclosed. The grant has supported research in two key areas of crucial importance to the computer-based simulation of large space structures. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area, as reported here, involves massively parallel computers.
Successful applications of computer aided drug discovery: moving drugs from concept to the clinic.
Talele, Tanaji T; Khedkar, Santosh A; Rigby, Alan C
2010-01-01
Drug discovery and development is an interdisciplinary, expensive and time-consuming process. Scientific advancements during the past two decades have changed the way pharmaceutical research generates novel bioactive molecules. Advances in computational techniques and in parallel hardware support have enabled in silico methods, and in particular structure-based drug design methods, to speed up the process from new target selection through the identification of hits to the optimization of lead compounds. This review is focused on the clinical status of experimental drugs that were discovered and/or optimized using computer-aided drug design. We have provided a historical account detailing the development of 12 small molecules (Captopril, Dorzolamide, Saquinavir, Zanamivir, Oseltamivir, Aliskiren, Boceprevir, Nolatrexed, TMI-005, LY-517717, Rupintrivir and NVP-AUY922) that are in clinical trials or have become approved for therapeutic use.
Integrated Task and Data Parallel Programming
NASA Technical Reports Server (NTRS)
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Integrated Task And Data Parallel Programming: Language Design
NASA Technical Reports Server (NTRS)
Grimshaw, Andrew S.; West, Emily A.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object-oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
NASA Astrophysics Data System (ADS)
Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.
2015-12-01
We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique from most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption functions to terminate simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model preemption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations as these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems from multi-core desktops to a supercomputer system) and package these for future modellers within a model-independent calibration software package called Ostrich as well as MATLAB versions. 
Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration and in many cases linear or near linear speedups are observed.
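The model pre-emption idea described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the objective is a sum of squared errors that can only grow as the simulation advances, and that the pre-emption threshold is the best objective value found so far. The names `evaluate_with_preemption`, `simulate_step`, and `threshold` are hypothetical and are not taken from Ostrich or from the DDS implementations.

```python
def evaluate_with_preemption(simulate_step, observations, threshold):
    """Accumulate a sum of squared errors and stop early when it is hopeless.

    simulate_step(t) is a hypothetical callback returning the simulated value
    at time step t. Because the error sum never decreases, a run whose partial
    sum already exceeds `threshold` cannot end up better than the threshold.
    Returns (sse, steps_run, preempted).
    """
    sse = 0.0
    for t, obs in enumerate(observations):
        sse += (simulate_step(t) - obs) ** 2
        if sse > threshold:
            # Terminate the model run before the calibration period is finished.
            return sse, t + 1, True
    return sse, len(observations), False


observations = [1.0, 2.0, 3.0, 4.0]
poor_candidate = lambda t: 0.0  # always simulates zero

# With a tight threshold the run is cut off after two of four steps.
preempted_result = evaluate_with_preemption(poor_candidate, observations, 4.5)
# With a loose threshold the full period is simulated.
full_result = evaluate_with_preemption(poor_candidate, observations, 100.0)
```

In an asynchronous setting each worker could apply such a check with the most recent best value it has received, which is one reason pre-emption and asynchronous evaluation fit together.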
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A
2017-01-01
The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is not yet understood how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have allowed creating the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: Computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing. Copyright © Bentham Science Publishers.
The Modeling, Simulation and Comparison of Interconnection Networks for Parallel Processing.
1987-12-01
performs better at a lower hardware cost than do the single stage cube and mesh networks. As a result, the designer of a parallel processing system is...attempted, and in most cases succeeded, in designing and implementing faster, more powerful systems. Due to design innovations and technological advances...largely to the computational complexity of the algorithms executed. In the von Neumann machine, instructions must be executed in a sequential manner. Design
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network.
Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
Secondary Heat Exchanger Design and Comparison for Advanced High Temperature Reactor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piyush Sabharwall; Ali Siahpush; Michael McKellar
2012-06-01
The goals of next generation nuclear reactors, such as the high temperature gas-cooled reactor and advanced high temperature reactor (AHTR), are to increase energy efficiency in the production of electricity and provide high temperature heat for industrial processes. The efficient transfer of energy for industrial applications depends on the ability to incorporate effective heat exchangers between the nuclear heat transport system and the industrial process heat transport system. The need for efficiency, compactness, and safety challenges the boundaries of existing heat exchanger technology, giving rise to the following study. Various studies have been performed in attempts to update the secondary heat exchanger that is downstream of the primary heat exchanger, mostly because its performance is strongly tied to the ability to employ more efficient conversion cycles, such as the Rankine supercritical and subcritical cycles. This study considers two different types of heat exchangers (helical coiled heat exchanger and printed circuit heat exchanger) as possible options for the AHTR secondary heat exchangers with the following three different options: (1) A single heat exchanger transfers all the heat (3,400 MW(t)) from the intermediate heat transfer loop to the power conversion system or process plants; (2) Two heat exchangers share heat to transfer total heat of 3,400 MW(t) from the intermediate heat transfer loop to the power conversion system or process plants, with each exchanger transferring 1,700 MW(t) in a parallel configuration; and (3) Three heat exchangers share heat to transfer total heat of 3,400 MW(t) from the intermediate heat transfer loop to the power conversion system or process plants, with each exchanger transferring 1,130 MW(t) in a parallel configuration. A preliminary cost comparison will be provided for all different cases along with challenges and recommendations.
NASA Astrophysics Data System (ADS)
Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.
2011-10-01
A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
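The effect of load balancing on a sub-basin partitioning can be sketched in Python. This is not the partitioning method used in tRIBS: it is a generic greedy heuristic (largest estimated cost first, always onto the least-loaded processor) that ignores channel-network connectivity and the cost of exchanges across sub-basin boundaries. The function name `balance_subbasins` and the cost values are hypothetical.

```python
import heapq


def balance_subbasins(costs, n_procs):
    """Greedily assign sub-basins to processors to even out estimated work.

    costs maps a sub-basin name to an estimated computational cost
    (for example, its number of TIN nodes). Returns (assignment, loads),
    both keyed by processor index.
    """
    # Min-heap of (current load, processor index).
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    # Place the most expensive sub-basins first.
    for basin, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        assignment[p].append(basin)
        heapq.heappush(heap, (load + cost, p))
    loads = {p: sum(costs[b] for b in basins) for p, basins in assignment.items()}
    return assignment, loads


example_costs = {"a": 5, "b": 4, "c": 3, "d": 2}
assignment, loads = balance_subbasins(example_costs, 2)
```

In this toy case both processors end up with the same total cost, whereas a naive split in input order would give loads of 9 and 5.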
NASA Astrophysics Data System (ADS)
Lanari, Riccardo; Bonano, Manuela; Buonanno, Sabatino; Casu, Francesco; De Luca, Claudio; Fusco, Adele; Manunta, Michele; Manzo, Mariarosaria; Pepe, Antonio; Zinno, Ivana
2017-04-01
The SENTINEL-1 (S1) mission is designed to provide operational capability for continuous mapping of the Earth thanks to its two polar-orbiting satellites (SENTINEL-1A and B) performing C-band synthetic aperture radar (SAR) imaging. It is, indeed, characterized by enhanced revisit frequency, coverage and reliability for operational services and applications requiring long SAR data time series. Moreover, SENTINEL-1 is specifically oriented to interferometry applications with stringent requirements based on attitude and orbit accuracy and it is intrinsically characterized by small spatial and temporal baselines. Consequently, SENTINEL-1 data are particularly suitable to be exploited through advanced interferometric techniques such as the well-known DInSAR algorithm referred to as Small BAseline Subset (SBAS), which allows the generation of deformation time series and displacement velocity maps. In this work we present an advanced interferometric processing chain, based on the Parallel SBAS (P-SBAS) approach, for the massive processing of S1 Interferometric Wide Swath (IWS) data aimed at generating deformation time series in an efficient, automatic and systematic way. Such a DInSAR chain is designed to exploit distributed computing infrastructures, and more specifically Cloud Computing environments, to properly deal with the storage and the processing of huge S1 datasets. In particular, since S1 IWS data are acquired with the innovative Terrain Observation with Progressive Scans (TOPS) mode, we could benefit from the structure of S1 data, which are composed of bursts that can be considered as separate acquisitions. Indeed, the processing is intrinsically parallelizable with respect to such independent input data and therefore we exploited this coarse granularity parallelization strategy in the majority of the steps of the SBAS processing chain.
Moreover, we also implemented more sophisticated parallelization approaches, exploiting both multi-node and multi-core programming techniques. Currently, Cloud Computing environments make available large collections of computing resources and storage that can be effectively exploited through the presented S1 P-SBAS processing chain to carry out interferometric analyses at a very large scale, in reduced time. This also allows us to deal with the problems connected to the use of the S1 P-SBAS chain in operational contexts, related to hazard monitoring and risk prevention and mitigation, where handling large amounts of data represents a challenging task. As a significant experimental result we performed a large spatial scale SBAS analysis of Central and Southern Italy by exploiting the Amazon Web Services Cloud Computing platform. In particular, we processed in parallel 300 S1 acquisitions covering the Italian peninsula from Lazio to Sicily through the presented S1 P-SBAS processing chain, generating 710 interferograms, thus finally obtaining the displacement time series of the whole processed area. This work has been partially supported by the CNR-DPC agreement, the H2020 EPOS-IP project (GA 676564) and the ESA GEP project.
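The coarse-granularity strategy described above, in which each burst is treated as an independent input, can be sketched in Python as a parallel map. This is only an illustration under stated assumptions: `process_burst` is a hypothetical placeholder rather than a P-SBAS step, bursts are represented as plain lists of numbers, and a local thread pool stands in for the multi-node cloud resources the abstract refers to.

```python
from concurrent.futures import ThreadPoolExecutor


def process_burst(burst):
    """Hypothetical per-burst step; here it just sums the burst's samples."""
    return sum(burst)


def process_acquisition(bursts, max_workers=4):
    """Process independent bursts concurrently and keep their original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs,
        # regardless of which burst finishes first.
        return list(pool.map(process_burst, bursts))


burst_results = process_acquisition([[1, 2], [3, 4], [5]])
```

Because no burst depends on another, the same pattern carries over to process pools or to separate compute nodes, with only the executor changing.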
PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code.
NASA Astrophysics Data System (ADS)
Chacon, L.; Knoll, D. A.
2004-11-01
We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended primitive-variable MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field, non-dissipative, and stable in the absence of physical dissipation (L. Chacón, Comput. Phys. Comm., submitted (2004)). PIXIE3D employs fully-implicit Newton-Krylov methods for the time advance. Currently, first and second-order implicit schemes are available, although higher-order temporal implicit schemes can be effortlessly implemented within the Newton-Krylov framework. A successful, scalable, MG physics-based preconditioning strategy, similar in concept to previous 2D MHD efforts (L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002); J. Comput. Phys. 188 (2), 573-592 (2003)), has been developed. We are currently in the process of parallelizing the code using the PETSc library, and a Newton-Krylov-Schwarz approach for the parallel treatment of the preconditioner. In this poster, we will report on both the serial and parallel performance of PIXIE3D, focusing primarily on scalability and CPU speedup vs. an explicit approach.
Parallel ptychographic reconstruction
Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...
2014-12-19
Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps to take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.
NeuroSeek dual-color image processing infrared focal plane array
NASA Astrophysics Data System (ADS)
McCarley, Paul L.; Massie, Mark A.; Baxter, Christopher R.; Huynh, Buu L.
1998-09-01
Several technologies have been developed in recent years to advance the state of the art of IR sensor systems including dual color affordable focal planes, on-focal plane array biologically inspired image and signal processing techniques and spectral sensing techniques. Pacific Advanced Technology (PAT) and the Air Force Research Lab Munitions Directorate have developed a system which incorporates the best of these capabilities into a single device. The 'NeuroSeek' device integrates these technologies into an IR focal plane array (FPA) which combines multicolor Midwave IR/Longwave IR radiometric response with on-focal plane 'smart' neuromorphic analog image processing. The readout and processing integrated circuit very large scale integration chip which was developed under this effort will be hybridized to a dual color detector array to produce the NeuroSeek FPA, which will have the capability to fuse multiple pixel-based sensor inputs directly on the focal plane. Great advantages are afforded by application of massively parallel processing algorithms to image data in the analog domain; the high speed and low power consumption of this device mimic operations performed in the human retina.
Perceptual and neural responses to sweet taste in humans and rodents.
Lemon, Christian H
2015-08-01
This mini-review discusses some of the parallels between rodent neurophysiological and human psychophysical data concerning temperature effects on sweet taste. "Sweet" is an innately rewarding taste sensation that is associated in part with foods that contain calories in the form of sugars. Humans and other mammals can show unconditioned preference for select sweet stimuli. Such preference is poised to influence diet selection and, in turn, nutritional status, which underscores the importance of delineating the physiological mechanisms for sweet taste with respect to their influence on human health. Advances in our knowledge of the biology of sweet taste in humans have arisen in part through studies on mechanisms of gustatory processing in rodent models. Along this line, recent work has revealed there are operational parallels in neural systems for sweet taste between mice and humans, as indexed by similarities in the effects of temperature on central neurophysiological and psychophysical responses to sucrose in these species. Such association strengthens the postulate that rodents can serve as effective models of particular mechanisms of appetitive taste processing. Data supporting this link are discussed here, as are rodent and human data that shed light on relationships between mechanisms for sweet taste and ingestive disorders, such as alcohol abuse. Rodent models have utility for understanding mechanisms of taste processing that may pertain to human flavor perception. Importantly, there are limitations to generalizing data from rodents, albeit parallels across species do exist.
AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics
NASA Astrophysics Data System (ADS)
Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.
2017-05-01
We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
Application of computational physics within Northrop
NASA Technical Reports Server (NTRS)
George, M. W.; Ling, R. T.; Mangus, J. F.; Thompkins, W. T.
1987-01-01
An overview of Northrop programs in computational physics is presented. These programs depend on access to today's supercomputers, such as the Numerical Aerodynamical Simulator (NAS), and future growth on the continuing evolution of computational engines. Descriptions here are concentrated on the following areas: computational fluid dynamics (CFD), computational electromagnetics (CEM), computer architectures, and expert systems. Current efforts and future directions in these areas are presented. The impact of advances in the CFD area is described, and parallels are drawn to analogous developments in CEM. The relationship between advances in these areas and the development of advanced (parallel) architectures and expert systems is also presented.
NASA Technical Reports Server (NTRS)
Noor, Ahmed K. (Editor)
1986-01-01
The papers contained in this volume provide an overview of the advances made in a number of aspects of computational mechanics, identify some of the anticipated industry needs in this area, discuss the opportunities provided by new hardware and parallel algorithms, and outline some of the current government programs in computational mechanics. Papers are included on advances and trends in parallel algorithms, supercomputers for engineering analysis, material modeling in nonlinear finite-element analysis, the Navier-Stokes computer, and future finite-element software systems.
Ultrasonic and radiographic evaluation of advanced aerospace materials: Ceramic composites
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
1990-01-01
Two conventional nondestructive evaluation techniques were used to evaluate advanced ceramic composite materials. It was shown that neither ultrasonic C-scan nor radiographic imaging can individually provide sufficient data for an accurate nondestructive evaluation. Both ultrasonic C-scan and conventional radiographic imaging are required for preliminary evaluation of these complex systems. The material variations that were identified by these two techniques are porosity, delaminations, bond quality between laminae, fiber alignment, fiber registration, fiber parallelism, and processing density flaws. The degree of bonding between fiber and matrix cannot be determined by either of these methods. An alternative ultrasonic technique, angular power spectrum scanning (APSS) is recommended for quantification of this interfacial bond.
Climbing with adhesion: from bioinspiration to biounderstanding
Cutkosky, Mark R.
2015-01-01
Bioinspiration is an increasingly popular design paradigm, especially as robots venture out of the laboratory and into the world. Animals are adept at coping with the variability that the world imposes. With advances in scientific tools for understanding biological structures in detail, we are increasingly able to identify design features that account for animals' robust performance. In parallel, advances in fabrication methods and materials are allowing us to engineer artificial structures with similar properties. The resulting robots become useful platforms for testing hypotheses about which principles are most important. Taking gecko-inspired climbing as an example, we show that the process of extracting principles from animals and adapting them to robots provides insights for both robotics and biology. PMID:26464786
NASA Technical Reports Server (NTRS)
Dorney, Suzanne; Dorney, Daniel J.; Huber, Frank; Sheffler, David A.; Turner, James E. (Technical Monitor)
2001-01-01
The advent of advanced computer architectures and parallel computing has led to a revolutionary change in the design process for turbomachinery components. Two- and three-dimensional steady-state computational flow procedures are now routinely used in the early stages of design. Unsteady flow analyses, however, are just beginning to be incorporated into design systems. This paper outlines the transition of a three-dimensional unsteady viscous flow analysis from the research environment into the design environment. The test case used to demonstrate the analysis is the full turbine system (high-pressure turbine, inter-turbine duct and low-pressure turbine) from an advanced turboprop engine.
[Comparison study between biological vision and computer vision].
Liu, W; Yuan, X G; Yang, C X; Liu, Z Q; Wang, R
2001-08-01
The development and bearing of biological vision in structure and mechanism were discussed, especially the anatomical structure of biological vision, the tentative classification of receptive fields, parallel processing of visual information, and the feedback and conformity effects of the visual cortex. New advances in the field were introduced through the study of the morphology of biological vision. In addition, biological vision and computer vision were compared, and their similarities and differences were pointed out.
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide
NASA Astrophysics Data System (ADS)
Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.
Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel wave-optics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently post-processes the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Photonics for aerospace sensors
NASA Astrophysics Data System (ADS)
Pellegrino, John; Adler, Eric D.; Filipov, Andree N.; Harrison, Lorna J.; van der Gracht, Joseph; Smith, Dale J.; Tayag, Tristan J.; Viveiros, Edward A.
1992-11-01
The maturation in the state-of-the-art of optical components is enabling increased applications for the technology. Most notable is the ever-expanding market for fiber optic data and communications links, familiar in both commercial and military markets. The inherent properties of optics and photonics, however, have suggested that components and processors may be designed that offer advantages over more commonly considered digital approaches for a variety of airborne sensor and signal processing applications. Various academic, industrial, and governmental research groups have been actively investigating and exploiting these properties of high bandwidth, large degree of parallelism in computation (e.g., processing in parallel over a two-dimensional field), and interconnectivity, and have succeeded in advancing the technology to the stage of systems demonstration. Such advantages as computational throughput and low operating power consumption are highly attractive for many computationally intensive problems. This review covers the key devices necessary for optical signal and image processors, some of the system application demonstration programs currently in progress, and active research directions for the implementation of next-generation architectures.
Recent Advances in Techniques for Hyperspectral Image Processing
NASA Technical Reports Server (NTRS)
Plaza, Antonio; Benediktsson, Jon Atli; Boardman, Joseph W.; Brazile, Jason; Bruzzone, Lorenzo; Camps-Valls, Gustavo; Chanussot, Jocelyn; Fauvel, Mathieu; Gamba, Paolo; Gualtieri, Anthony;
2009-01-01
Imaging spectroscopy, also known as hyperspectral imaging, has been transformed in less than 30 years from being a sparse research tool into a commodity product available to a broad user community. Currently, there is a need for standardized data processing techniques able to take into account the special properties of hyperspectral data. In this paper, we provide a seminal view on recent advances in techniques for hyperspectral image processing. Our main focus is on the design of techniques able to deal with the high-dimensional nature of the data, and to integrate the spatial and spectral information. Performance of the discussed techniques is evaluated in different analysis scenarios. To satisfy time-critical constraints in specific applications, we also develop efficient parallel implementations of some of the discussed algorithms. Combined, these parts provide an excellent snapshot of the state-of-the-art in those areas, and offer a thoughtful perspective on future potentials and emerging challenges in the design of robust hyperspectral imaging algorithms.
A review of aircraft turnaround operations and simulations
NASA Astrophysics Data System (ADS)
Schmidt, Michael
2017-07-01
The ground operational processes are the connecting element between aircraft en-route operations and airport infrastructure. An efficient aircraft turnaround is an essential component of airline success, especially for regional and short-haul operations. It is imperative that advancements in ground operations, specifically process reliability and passenger comfort, are developed while dealing with increasing passenger traffic in the coming years. This paper provides an introduction to aircraft ground operations focusing on the aircraft turnaround and passenger processes. Furthermore, key challenges for current aircraft operators, such as airport capacity constraints, schedule disruptions and the increasing cost pressure, are highlighted. A review of the conducted studies and conceptual work in this field shows pathways for potential process improvements. Promising approaches attempt to reduce apron traffic and parallelize passenger processes and taxiing. The application of boarding strategies and novel cabin layouts focusing on aisle, door and seat are options to shorten the boarding process inside the cabin. A summary of existing modeling and simulation frameworks gives an insight into state-of-the-art assessment capabilities for advanced concepts. These frameworks are the prerequisite for a holistic assessment during the early stages of the preliminary aircraft design process and for identifying benefits and drawbacks for all involved stakeholders.
Advancing MODFLOW Applying the Derived Vector Space Method
NASA Astrophysics Data System (ADS)
Herrera, G. S.; Herrera, I.; Lemus-García, M.; Hernandez-Garcia, G. D.
2015-12-01
The most effective domain decomposition methods (DDM) are non-overlapping DDMs. Recently a new approach, the DVS-framework, based on an innovative discretization method that uses a non-overlapping system of nodes (the derived-nodes), was introduced and developed by I. Herrera et al. [1, 2]. Using the DVS-approach, a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm (i.e. the solution of global problems is obtained by resolution of local problems exclusively) has been derived. Such procedures are applicable to any boundary-value problem, or system of such equations, for which a standard discretization method is available, and software with a high degree of parallelization can then be constructed. In a parallel talk at this AGU Fall Meeting, Ismael Herrera will introduce the general DVS methodology. The application of the DVS-algorithms has been demonstrated in the solution of several boundary-value problems of interest in Geophysics. Numerical examples for a single equation, for the cases of symmetric, non-symmetric and indefinite problems, were demonstrated before [1, 2]. For these problems DVS-algorithms exhibited significantly improved numerical performance with respect to standard versions of DDM algorithms. In view of these results, our research group is in the process of applying the DVS method to a widely used simulator for the first time. Here we present the advances in the application of this method to the parallelization of MODFLOW. Efficiency results for a group of tests will be presented. References [1] I. Herrera, L.M. de la Cruz and A. Rosas-Medina. Non overlapping discretization methods for partial differential equations, Numer Meth Part D E, (2013). [2] Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many subdomains called blocks, and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running the NASA-based code ADPAC to demonstrate the developed tools for dynamic load balancing.
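The idea of distributing blocks among processors of different speeds can be illustrated with a minimal sketch. This is not the authors' tool: it is a simple static greedy heuristic (largest block first, placed where it would finish earliest), and the function name `assign_blocks`, the block costs, and the relative processor speeds are all hypothetical values chosen for illustration.

```python
def assign_blocks(block_costs, speeds):
    """Greedy sketch: place each block (largest first) on the processor
    that would finish it earliest, given hypothetical relative speeds."""
    finish = [0.0] * len(speeds)                  # projected finish time per processor
    assignment = {p: [] for p in range(len(speeds))}
    # Visit blocks in order of decreasing cost (ties keep their original order).
    for block in sorted(range(len(block_costs)), key=lambda b: -block_costs[b]):
        cost = block_costs[block]
        # Pick the processor with the earliest finish time after adding this block.
        best = min(range(len(speeds)), key=lambda p: finish[p] + cost / speeds[p])
        finish[best] += cost / speeds[best]
        assignment[best].append(block)
    return assignment, finish


# Illustrative values only: five blocks, one processor twice as fast as the other.
assignment, finish = assign_blocks([8, 4, 4, 2, 2], [2.0, 1.0])
print(assignment, finish)
```

A dynamic balancer of the kind described in the abstract would repeat such an assignment as measured speeds and loads change during the run; the sketch covers only a single assignment step.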
Gordon Research Conference on Nonlinear Optics and Lasers
NASA Astrophysics Data System (ADS)
Haus, Hermann
1992-02-01
The topics chosen were production of X rays with high power lasers, generation of millimeter waves with femtosecond pulses, microcavities and microlasers, second harmonic generation in fibers and advances in photorefractivity and parallel optical processing. It introduces ways of thinking and scientific methods in fields that are related, but would not generally appear in specialized conferences. There were three such examples: the methods of nonlinear optics as applied to electronic signal processing, the concept of squeezing (special quantum states of the electromagnetic field) as used to explain the generation of gravitational waves in the expanding universe, and particle interferometers with particle- instead of wave-gratings. By asking Nobel laureate Bloembergen one year in advance to give the traditional after dinner speech, we were privileged to hear him speak of the history of optics over the centuries resulting in the various principles of linear optics, and the highly accelerated pace of discovery of the analogous principles in nonlinear optics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.
This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
Experimental determination of pCo perturbation factors for plane-parallel chambers
NASA Astrophysics Data System (ADS)
Kapsch, R. P.; Bruggmoser, G.; Christ, G.; Dohm, O. S.; Hartmann, G. H.; Schüle, E.
2007-12-01
For plane-parallel chambers used in electron dosimetry, modern dosimetry protocols recommend a cross-calibration against a calibrated cylindrical chamber. The rationale for this is the unacceptably large (up to 3-4%) chamber-to-chamber variations of the perturbation factors (pwall)Co, which have been reported for plane-parallel chambers of a given type. In some recent publications, it was shown that this is no longer the case for modern plane-parallel chambers. The aims of the present study are to obtain reliable information about the variation of the perturbation factors for modern types of plane-parallel chambers, and, if this variation is found to be acceptably small, to determine type-specific mean values for these perturbation factors which can be used for absorbed dose measurements in electron beams using plane-parallel chambers. In an extensive multi-center study, the individual perturbation factors pCo (which are usually assumed to be equal to (pwall)Co) for a total of 35 plane-parallel chambers of the Roos type, 15 chambers of the Markus type and 12 chambers of the Advanced Markus type were determined. From a total of 188 cross-calibration measurements, variations of the pCo values for different chambers of the same type of at most 1.0%, 0.9% and 0.6% were found for the chambers of the Roos, Markus and Advanced Markus types, respectively. The mean pCo values obtained from all measurements are $\bar{p}_{\mathrm{Co}}^{\mathrm{Roos}} = 1.0198$, $\bar{p}_{\mathrm{Co}}^{\mathrm{Markus}} = 1.0175$ and $\bar{p}_{\mathrm{Co}}^{\mathrm{Advanced}} = 1.0155$; the relative experimental standard deviation of the individual pCo values is less than 0.24% for all chamber types; the relative standard uncertainty of the mean pCo values is 1.1%.
High data volume and transfer rate techniques used at NASA's image processing facility
NASA Technical Reports Server (NTRS)
Heffner, P.; Connell, E.; Mccaleb, F.
1978-01-01
Data storage and transfer operations at a new image processing facility are described. The equipment includes high density digital magnetic tape drives and specially designed controllers to provide an interface between the tape drives and computerized image processing systems. The controller performs the functions necessary to convert the continuous serial data stream from the tape drive to a word-parallel blocked data stream which then goes to the computer-based system. With regard to the tape packing density, 1.8 × 10^10 data bits are stored on a reel of one-inch tape. System components and their operation are surveyed, and studies on advanced storage techniques are summarized.
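The conversion from a serial bit stream to a word-parallel blocked stream can be sketched in software as follows. The abstract does not give the controller's word size, block size, or bit order, so `word_size`, `block_words`, and the most-significant-bit-first packing below are assumptions made purely for illustration, as is the function name `serial_to_words`.

```python
def serial_to_words(bits, word_size=16, block_words=4):
    """Pack a serial list of 0/1 bits into words, then group words into blocks.

    word_size, block_words and MSB-first order are illustrative assumptions.
    A trailing partial word is dropped.
    """
    words = []
    for start in range(0, len(bits) - word_size + 1, word_size):
        value = 0
        for bit in bits[start:start + word_size]:
            value = (value << 1) | bit        # shift in the next bit, MSB first
        words.append(value)
    # Group consecutive words into fixed-size blocks (the last block may be short).
    return [words[i:i + block_words] for i in range(0, len(words), block_words)]


# Twelve serial bits, 4-bit words, 2 words per block.
stream = [1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
print(serial_to_words(stream, word_size=4, block_words=2))
```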
Advanced digital SAR processing study
NASA Technical Reports Server (NTRS)
Martinson, L. W.; Gaffney, B. P.; Liu, B.; Perry, R. P.; Ruvin, A.
1982-01-01
A highly programmable, land based, real time synthetic aperture radar (SAR) processor requiring a processed pixel rate of 2.75 MHz or more in a four look system was designed. Variations in range and azimuth compression, number of looks, range swath, range migration and SR mode were specified. Alternative range and azimuth processing algorithms were examined in conjunction with projected integrated circuit, digital architecture, and software technologies. The advanced digital SAR processor (ADSP) employs an FFT convolver algorithm for both range and azimuth processing in a parallel architecture configuration. Algorithm performance comparisons, system design, implementation tradeoffs and the results of a supporting survey of integrated circuit and digital architecture technologies are reported. Cost tradeoffs and projections with alternate implementation plans are presented.
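The general technique behind an FFT convolver (transform both sequences, multiply the spectra, transform back) can be sketched as below. This is a generic textbook illustration using only the Python standard library, not the ADSP design: the recursive radix-2 FFT, the function names `fft` and `fft_convolve`, and the sample inputs are all assumptions for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via zero-padded FFTs and pointwise multiplication."""
    size = len(signal) + len(kernel) - 1
    n = 1
    while n < size:                      # pad to the next power of two
        n *= 2
    a = fft(list(signal) + [0] * (n - len(signal)))
    b = fft(list(kernel) + [0] * (n - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    result = fft(product, inverse=True)
    return [(v / n).real for v in result[:size]]


print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

Because each range line or azimuth line can be convolved independently of the others, this kind of kernel maps naturally onto a parallel architecture.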
Advanced fabrication of Si nanowire FET structures by means of a parallel approach.
Li, J; Pud, S; Mayer, D; Vitusevich, S
2014-07-11
In this paper we present fabricated Si nanowires (NWs) of different dimensions with enhanced electrical characteristics. The parallel fabrication process is based on nanoimprint lithography using high-quality molds, which facilitates the realization of 50 nm-wide NW field-effect transistors (FETs). The imprint molds were fabricated by using a wet chemical anisotropic etching process. The wet chemical etch results in well-defined vertical sidewalls with edge roughness (3σ) as small as 2 nm, which is about four times better than the roughness usually obtained for reactive-ion etching molds. The quality of the mold was studied using atomic force microscopy and scanning electron microscopy image data. The use of the high-quality mold leads to almost 100% yield during fabrication of Si NW FETs as well as to an exceptional quality of the surfaces of the devices produced. To characterize the Si NW FETs, we used noise spectroscopy as a powerful method for evaluating device performance and the reliability of structures with nanoscale dimensions. The Hooge parameter of fabricated FET structures exhibits an average value of 1.6 × 10^-3. This value reflects the high quality of Si NW FETs fabricated by means of a parallel approach that uses a nanoimprint mold and cost-efficient technology.
Design Considerations of a Virtual Laboratory for Advanced X-ray Sources
NASA Astrophysics Data System (ADS)
Luginsland, J. W.; Frese, M. H.; Frese, S. D.; Watrous, J. J.; Heileman, G. L.
2004-11-01
The field of scientific computation has greatly advanced in the last few years, resulting in the ability to perform complex computer simulations that can predict the performance of real-world experiments in a number of fields of study. Among the forces driving this new computational capability is the advent of parallel algorithms, allowing calculations in three-dimensional space with realistic time scales. Electromagnetic radiation sources driven by high-voltage, high-current electron beams offer an area to further push the state-of-the-art in high fidelity, first-principles simulation tools. The physics of these x-ray sources combines kinetic plasma physics (electron beams) with dense fluid-like plasma physics (anode plasmas) and x-ray generation (bremsstrahlung). There are a number of mature techniques and software packages for dealing with the individual aspects of these sources, such as Particle-In-Cell (PIC), Magneto-Hydrodynamics (MHD), and radiation transport codes. The current effort is focused on developing an object-oriented software environment using the Rational© Unified Process and the Unified Modeling Language (UML) to provide a framework where multiple 3D parallel physics packages, such as a PIC code (ICEPIC), an MHD code (MACH), and an x-ray transport code (ITS), can co-exist in a system-of-systems approach to modeling advanced x-ray sources. Initial software design and assessments of the various physics algorithms' fidelity will be presented.
Evolution and Advances in Satellite Analysis of Volcanoes
NASA Astrophysics Data System (ADS)
Dean, K. G.; Dehn, J.; Webley, P.; Bailey, J.
2008-12-01
Over the past 20 years, the use of satellite data for monitoring and analysis of volcanic eruptions has evolved in terms of timeliness, access, distribution, resolution and understanding of volcanic processes. Initially satellite data was used for retrospective analysis, but its use has evolved into proactive monitoring systems. Timely acquisition of data and the capability to distribute large data files paralleled advances in computer technology and were a critical component for near real-time monitoring. The sharing of these data and the resulting discussions have improved our understanding of eruption processes and, even more importantly, their impact on society. To illustrate this evolution, critical scientific discoveries will be highlighted, including detection of airborne ash and sulfur dioxide, cloud-height estimates, prediction of ash cloud movement, and detection of thermal anomalies as precursor signals to eruptions. AVO has been a leader in implementing many of these advances in an operational setting, such as automated eruption detection, database analysis systems, and remotely accessible web-based analysis systems. Finally, limitations resulting from trade-offs in resolution, and how they contribute to weaknesses in detection techniques and hazard assessments, will be presented.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project
NASA Technical Reports Server (NTRS)
Woo, Alex C.; Hill, Kueichien C.
1996-01-01
The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Extending HPF for advanced data parallel applications
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1994-01-01
The stated goal of High Performance Fortran (HPF) was to 'address the problems of writing data parallel programs where the distribution of data affects performance'. After examining the current version of the language we are led to the conclusion that HPF has not fully achieved this goal. While the basic distribution functions offered by the language - regular block, cyclic, and block cyclic distributions - can support regular numerical algorithms, advanced applications such as particle-in-cell codes or unstructured mesh solvers cannot be expressed adequately. We believe that this is a major weakness of HPF, significantly reducing its chances of becoming accepted in the numeric community. The paper discusses the data distribution and alignment issues in detail, points out some flaws in the basic language, and outlines possible future paths of development. Furthermore, we briefly deal with the issue of task parallelism and its integration with the data parallel paradigm of HPF.
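The three regular distributions named in the abstract (block, cyclic, and block cyclic) amount to simple index-to-processor mappings, which the following sketch makes concrete. It uses 0-based indices and hypothetical function names for illustration, whereas HPF itself expresses these mappings through directives on 1-based Fortran arrays.

```python
def block_owner(i, n, p):
    """BLOCK: n elements split into contiguous chunks of ceil(n / p)."""
    chunk = -(-n // p)                 # ceiling division
    return i // chunk


def cyclic_owner(i, p):
    """CYCLIC: elements dealt out one at a time, round-robin."""
    return i % p


def block_cyclic_owner(i, p, b):
    """CYCLIC(b): blocks of b consecutive elements dealt out round-robin."""
    return (i // b) % p


n = 10
print([block_owner(i, n, 4) for i in range(n)])
print([cyclic_owner(i, 4) for i in range(n)])
print([block_cyclic_owner(i, 2, 2) for i in range(n)])
```

All three mappings depend only on the element index, which is why they suit regular numerical algorithms but cannot describe the data-dependent ownership needed by particle-in-cell codes or unstructured meshes.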
Parallelization of the preconditioned IDR solver for modern multicore computer systems
NASA Astrophysics Data System (ADS)
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there has been a dramatic change in processor architectures and computer system organization in recent years. Because of this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Partitioning in parallel processing of production systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oflazer, K.
1987-01-01
This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
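Parallel processing by image region, in the map-reduce style the abstract mentions, can be sketched as follows. The task (counting dark pixels), the threshold, the function names, and the tiny sample image are hypothetical stand-ins for a real analysis, and the image is assumed to be a non-empty list of pixel rows. A thread pool keeps the sketch self-contained; for CPU-bound work in CPython a process pool would typically be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, n_regions):
    """Split an image (a non-empty list of pixel rows) into horizontal strips."""
    step = -(-len(image) // n_regions)            # ceiling division
    return [image[i:i + step] for i in range(0, len(image), step)]


def count_dark(region, threshold=128):
    """Map step: the same task applied to one region of the image."""
    return sum(1 for row in region for pixel in row if pixel < threshold)


def dark_pixel_count(image, n_regions=4):
    regions = split_rows(image, n_regions)
    with ThreadPoolExecutor(max_workers=n_regions) as pool:
        partial_counts = list(pool.map(count_dark, regions))   # map
    return sum(partial_counts)                                  # reduce


# Illustrative 5 x 2 grayscale image.
image = [[0, 255], [100, 200], [50, 60], [255, 255], [10, 250]]
print(dark_pixel_count(image, n_regions=2))
```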
NASA Astrophysics Data System (ADS)
Sewell, Stephen
This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
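Particle sorting ahead of grid interpolation can be illustrated with a small sequential sketch: particles are grouped by the grid cell they occupy, so that particles touching the same grid points sit next to each other in memory. This is a one-dimensional counting sort written in plain Python under assumed names and parameters (`cell_width`, `n_cells`), not the CUDA implementation described in the thesis.

```python
def sort_particles_by_cell(positions, cell_width, n_cells):
    """Stable counting sort of 1D particle positions by grid cell index."""
    # Cell index of each particle, clamped to the last cell.
    cells = [min(int(x / cell_width), n_cells - 1) for x in positions]
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1
    # Exclusive prefix sum gives the first output slot for each cell.
    starts = [0] * n_cells
    for c in range(1, n_cells):
        starts[c] = starts[c - 1] + counts[c - 1]
    ordered = [None] * len(positions)
    next_slot = list(starts)
    for x, c in zip(positions, cells):
        ordered[next_slot[c]] = x
        next_slot[c] += 1
    return ordered, starts


ordered, starts = sort_particles_by_cell([2.5, 0.1, 1.7, 0.9, 2.1], 1.0, 3)
print(ordered, starts)
```

The counting and prefix-sum steps are themselves data-parallel, which is one reason this style of sort is commonly used on GPUs.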
Using Serial and Discrete Digit Naming to Unravel Word Reading Processes
Altani, Angeliki; Protopapas, Athanassios; Georgiou, George K.
2018-01-01
During reading acquisition, word recognition is assumed to undergo a developmental shift from slow serial/sublexical processing of letter strings to fast parallel processing of whole word forms. This shift has been proposed to be detected by examining the size of the relationship between serial- and discrete-trial versions of word reading and rapid naming tasks. Specifically, a strong association between serial naming of symbols and single word reading suggests that words are processed serially, whereas a strong association between discrete naming of symbols and single word reading suggests that words are processed in parallel as wholes. In this study, 429 Grade 1, 3, and 5 English-speaking Canadian children were tested on serial and discrete digit naming and word reading. Across grades, single word reading was more strongly associated with discrete naming than with serial naming of digits, indicating that short high-frequency words are processed as whole units early in the development of reading ability in English. In contrast, serial naming was not a unique predictor of single word reading across grades, suggesting that within-word sequential processing was not required for the successful recognition of this set of words. Factor mixture analysis revealed that our participants could be clustered into two classes, namely beginning and more advanced readers. Serial naming uniquely predicted single word reading only among the first class of readers, indicating that novice readers rely on a serial strategy to decode words. Yet, a considerable proportion of Grade 1 students were assigned to the second class, evidently being able to process short high-frequency words as unitized symbols. We consider these findings together with those from previous studies to challenge the hypothesis of a binary distinction between serial/sublexical and parallel/lexical processing in word reading. We argue instead that sequential processing in word reading operates on a continuum, depending on the level of reading proficiency, the degree of orthographic transparency, and word-specific characteristics. PMID:29706918
Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Rizk, Yehia M.
1999-01-01
The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable for distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group is approximately balanced. A proper number of threads is initially allocated to each group, and in subsequent iterations during the run-time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in OVERFLOW.
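The difference between a "Jacobi" style and a "Gauss-Seidel" style update can be shown on a toy problem. The sketch below is a one-dimensional neighbor-averaging sweep with fixed end values, chosen only as an analogy for the boundary update described above; it is not OVERFLOW code, and the function names and sample data are assumptions.

```python
def jacobi_sweep(u):
    """Every interior point is updated from the previous sweep's values only,
    so all updates are independent and can run in parallel."""
    old = list(u)
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (old[i - 1] + old[i + 1])
    return new


def gauss_seidel_sweep(u):
    """Each interior point uses neighbors already updated in this sweep,
    which imposes a sequential ordering on the updates."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (new[i - 1] + new[i + 1])
    return new


u = [1.0, 0.0, 0.0, 0.0, 0.0]
print(jacobi_sweep(u))
print(gauss_seidel_sweep(u))
```

After one sweep the Gauss-Seidel result has already propagated information further along the line, while the Jacobi result has moved it only one point; the Jacobi form trades that slower propagation for updates that do not depend on each other within a step.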
Climate-associated phenological advances in bee pollinators and bee-pollinated plants
Bartomeus, Ignasi; Ascher, John S.; Wagner, David; Danforth, Bryan N.; Colla, Sheila; Kornbluth, Sarah; Winfree, Rachael
2011-01-01
The phenology of many ecological processes is modulated by temperature, making them potentially sensitive to climate change. Mutualistic interactions may be especially vulnerable because of the potential for phenological mismatching if the species involved do not respond similarly to changes in temperature. Here we present an analysis of climate-associated shifts in the phenology of wild bees, the most important pollinators worldwide, and compare these shifts to published studies of bee-pollinated plants over the same time period. We report that over the past 130 y, the phenology of 10 bee species from northeastern North America has advanced by a mean of 10.4 ± 1.3 d. Most of this advance has taken place since 1970, paralleling global temperature increases. When the best available data are used to estimate analogous rates of advance for plants, these rates are not distinguishable from those of bees, suggesting that bee emergence is keeping pace with shifts in host-plant flowering, at least among the generalist species that we investigated. PMID:22143794
Climate-associated phenological advances in bee pollinators and bee-pollinated plants.
Bartomeus, Ignasi; Ascher, John S; Wagner, David; Danforth, Bryan N; Colla, Sheila; Kornbluth, Sarah; Winfree, Rachael
2011-12-20
The phenology of many ecological processes is modulated by temperature, making them potentially sensitive to climate change. Mutualistic interactions may be especially vulnerable because of the potential for phenological mismatching if the species involved do not respond similarly to changes in temperature. Here we present an analysis of climate-associated shifts in the phenology of wild bees, the most important pollinators worldwide, and compare these shifts to published studies of bee-pollinated plants over the same time period. We report that over the past 130 y, the phenology of 10 bee species from northeastern North America has advanced by a mean of 10.4 ± 1.3 d. Most of this advance has taken place since 1970, paralleling global temperature increases. When the best available data are used to estimate analogous rates of advance for plants, these rates are not distinguishable from those of bees, suggesting that bee emergence is keeping pace with shifts in host-plant flowering, at least among the generalist species that we investigated.
Recent advances in the neurophysiology of chronic pain.
Baker, Kylie
2005-02-01
The chronic pain syndrome patient has become the 'leper' of emergency medicine. There are no emergency medicine guidelines and minimal research into managing this challenging group of patients. To summarize the recent advances in laboratory research into the development of chronic pain that have relevance to emergency management. When the level of supporting evidence is low, it is imperative that emergency physicians understand the physiology that underpins those expert opinions upon which they base their treatment strategies. Literature was searched via Medline, Cochrane, Cinahl, and PsycINFO from 1996 to 2004, under 'chronic pain and emergency management'. Medline from 1996 was searched for 'chronic pain and prevention', 'chronic pain and emergency' and 'chronic pain'. Bibliographies were manually searched for older keynote articles. Advances in understanding the biochemical changes of chronic pain are paralleled by lesser known advances in delineation of the cortical processing. Drug manipulation causes complex action and reaction in chronic pain. Emergency physicians must also optimize cognitive and behavioural aspects of treatment to successfully manage this systemic disease.
Advances in Predictive Toxicology for Discovery Safety through High Content Screening.
Persson, Mikael; Hornberg, Jorrit J
2016-12-19
High content screening enables parallel acquisition of multiple molecular and cellular readouts. In particular the predictive toxicology field has progressed from the advances in high content screening, as more refined end points that report on cellular health can be studied in combination, at the single cell level, and in relatively high throughput. Here, we discuss how high content screening has become an essential tool for Discovery Safety, the discipline that integrates safety and toxicology in the drug discovery process to identify and mitigate safety concerns with the aim to design drug candidates with a superior safety profile. In addition to customized mechanistic assays to evaluate target safety, routine screening assays can be applied to identify risk factors for frequently occurring organ toxicities. We discuss the current state of high content screening assays for hepatotoxicity, cardiotoxicity, neurotoxicity, nephrotoxicity, and genotoxicity, including recent developments and current advances.
Unassigned MS/MS Spectra: Who Am I?
Pathan, Mohashin; Samuel, Monisha; Keerthikumar, Shivakumar; Mathivanan, Suresh
2017-01-01
Recent advances in high resolution tandem mass spectrometry (MS) have resulted in the accumulation of high quality data. In parallel with these advances in instrumentation, bioinformatics software has been developed to analyze such quality datasets. In spite of these advances, data analysis in mass spectrometry still remains critical for protein identification. In addition, the complexity of the generated MS/MS spectra, the unpredictable nature of peptide fragmentation, sequence annotation errors, and posttranslational modifications have impeded the protein identification process. In a typical MS data analysis, about 60% of the MS/MS spectra remain unassigned. While some of these can be attributed to the low quality of the MS/MS spectra, a proportion can be classified as high quality. Further analysis may reveal how much of the unassigned MS spectra can be attributed to search space, sequence annotation errors, mutations, and/or posttranslational modifications. In this chapter, the tools used to identify proteins and ways to assign unassigned tandem MS spectra are discussed.
Sachse, Silke; Beshel, Jennifer
2016-10-01
All animals must eat in order to survive, but first they must successfully locate and appraise food resources in a manner consonant with their needs. To accomplish this, external sensory information, in particular olfactory food cues, needs to be detected and appropriately categorized. Recent advances in Drosophila point to the existence of parallel processing circuits within the central brain that encode odor valence, supporting approach and avoidance behaviors. Strikingly, many elements within these neural systems are subject to modification as a function of the fly's satiety state. In this review we describe those advances and their potential impact on the decision to feed. Copyright © 2016 Elsevier Ltd. All rights reserved.
A high throughput array microscope for the mechanical characterization of biomaterials
NASA Astrophysics Data System (ADS)
Cribb, Jeremy; Osborne, Lukas D.; Hsiao, Joe Ping-Lin; Vicci, Leandra; Meshram, Alok; O'Brien, E. Tim; Spero, Richard Chasen; Taylor, Russell; Superfine, Richard
2015-02-01
In the last decade, the emergence of high throughput screening has enabled the development of novel drug therapies and elucidated many complex cellular processes. Concurrently, the mechanobiology community has developed tools and methods to show that the dysregulation of biophysical properties and the biochemical mechanisms controlling those properties contribute significantly to many human diseases. Despite these advances, a complete understanding of the connection between biomechanics and disease will require advances in instrumentation that enable parallelized, high throughput assays capable of probing complex signaling pathways, studying biology in physiologically relevant conditions, and capturing specimen and mechanical heterogeneity. Traditional biophysical instruments are unable to meet this need. To address the challenge of large-scale, parallelized biophysical measurements, we have developed an automated array high-throughput microscope system that utilizes passive microbead diffusion to characterize mechanical properties of biomaterials. The instrument is capable of acquiring data on twelve-channels simultaneously, where each channel in the system can independently drive two-channel fluorescence imaging at up to 50 frames per second. We employ this system to measure the concentration-dependent apparent viscosity of hyaluronan, an essential polymer found in connective tissue and whose expression has been implicated in cancer progression.
Parallel Architectures for Planetary Exploration Requirements (PAPER)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet; Sen, Ranjan K.
1989-01-01
The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concept appears to be a promising candidate, except that more detailed information is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified.
[Pathogenesis of apical periodontitis and its effects on the body].
Márton, I; Bágyi, K; Radics, T; Kiss, C
1998-01-01
During the last 25 years there have been major advances in understanding the etiology, pathogenesis and maintenance of inflammatory processes taking place in the periapical space. Polymicrobial infection of the pulp chamber is of primary importance in initiating periapical inflammation. Egress of bacteria and their antigens stimulates the immune system to form a granulation tissue around the apical area. Local immune response eliminates excess numbers of invading organisms. However, in parallel with protective reactions, local activity of immunocompetent cells and their soluble products also contributes to tissue damage, bone resorption and perpetuation of inflammation. Present data indicate that interaction of T-lymphocytes and macrophages is crucial in this process.
Shared direct memory access on the Explorer 2-LX
NASA Technical Reports Server (NTRS)
Musgrave, Jeffrey L.
1990-01-01
Advances in Expert System technology and Artificial Intelligence have provided a framework for applying automated Intelligence to the solution of problems which were generally perceived as intractable using more classical approaches. As a result, hybrid architectures and parallel processing capability have become more common in computing environments. The Texas Instruments Explorer II-LX is an example of a machine which combines a symbolic processing environment, and a computationally oriented environment in a single chassis for integrated problem solutions. This user's manual is an attempt to make these capabilities more accessible to a wider range of engineers and programmers with problems well suited to solution in such an environment.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
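The scaling figures quoted in this abstract can be put in context with a short calculation. The sketch below is only arithmetic on the reported numbers; the function names are hypothetical, and "parallel efficiency" is taken here in the usual sense of speedup divided by core count, which the abstract itself does not state.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def time_ratio(before_hours, after_minutes):
    """Ratio between an earlier runtime in hours and a later one in minutes."""
    return before_hours * 60.0 / after_minutes


# Reported: 158x speedup on 384 cores relative to a single core.
eff = parallel_efficiency(158, 384)
# Reported: 12.5 h reduced to under 5 min per iteration, so this is a lower bound.
ratio = time_ratio(12.5, 5)

print(f"efficiency relative to core count: {eff:.2f}")
print(f"per-iteration time ratio: at least {ratio:.0f}x")
```

On these numbers the efficiency relative to core count is roughly 0.41, and the per-iteration time for the large sinogram drops by a factor of at least 150.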
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.
Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar
2017-03-01
This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using a maximum likelihood approach. D-Phylo exploits the search capability of k-means while avoiding its main limitation of getting stuck at locally conserved motifs. The authors have tested the behaviour of D-Phylo on an Amazon Linux Amazon Machine Image (Hardware Virtual Machine) i2.4xlarge instance (six central processing units, 122 GiB memory, 8 × 800 solid-state drive Elastic Block Store volumes, high network performance) with up to 15 processors for several real-life datasets. Distributing the clusters evenly over all the processors makes it possible to achieve a near-linear speedup when a large number of processors is used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amadio, G.; et al.
An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel in complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR
NASA Astrophysics Data System (ADS)
Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen
2013-03-01
Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.
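The role of ghost cells in decoupling grid advances can be illustrated with a toy example. This is a sketch under stated assumptions, not AstroBEAR's scheme: it uses a generic explicit 3-point stencil on a 1D grid rather than a hyperbolic solver, one ghost cell per side, and hypothetical names (`step`, `step_with_patches`).

```python
def step(u, alpha=0.25):
    """One explicit 3-point update of the interior cells; end cells stay fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def step_with_patches(u, split, alpha=0.25):
    """Advance two patches independently after filling one ghost cell each.

    The left patch owns cells [0, split) and the right patch owns [split, n).
    Once the ghost cells are filled, neither patch reads the other's data,
    so the two calls to step() could run on different threads.
    """
    left = u[:split] + [u[split]]        # ghost cell copied from the right patch
    right = [u[split - 1]] + u[split:]   # ghost cell copied from the left patch
    new_left = step(left, alpha)[:-1]    # discard the ghost cell afterwards
    new_right = step(right, alpha)[1:]
    return new_left + new_right


u0 = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
print(step_with_patches(u0, 3) == step(u0))
```

Because each patch carries a copy of the neighbouring values it needs, the patch-wise advance reproduces the single-grid advance, which is the independence the abstract relies on when threading the grid advances.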
Serial Back-Plane Technologies in Advanced Avionics Architectures
NASA Technical Reports Server (NTRS)
Varnavas, Kosta
2005-01-01
Current back plane technologies such as VME, and current personal computer back planes such as PCI, are shared bus systems that can exhibit nondeterministic latencies. This means a card can take control of the bus and use resources indefinitely, affecting the ability of other cards in the back plane to acquire the bus. This is a real hit to the reliability of the system. Additionally, these parallel busses only have bandwidths in the 100s of megahertz range, and EMI and noise effects get worse the higher the bandwidth goes. To provide scalable, fault-tolerant, advanced computing systems, more applicable to today's connected computing environment and to better meet the needs of future requirements for advanced space instruments and vehicles, serial back-plane technologies should be implemented in advanced avionics architectures. Serial backplane technologies eliminate the problem of one card getting the bus and never relinquishing it, or one minor problem on the backplane bringing the whole system down. Being serial instead of parallel reduces many of the signal integrity issues associated with parallel back planes and thus significantly improves reliability. The increased speeds associated with a serial backplane are an added bonus.
High-performance computing in image registration
NASA Astrophysics Data System (ADS)
Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro
2012-10-01
Thanks to recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses, e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amounts of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) the capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and discussed.
Seeing the forest for the trees: Networked workstations as a parallel processing computer
NASA Technical Reports Server (NTRS)
Breen, J. O.; Meleedy, D. M.
1992-01-01
Unlike traditional 'serial' processing computers in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
NASA Technical Reports Server (NTRS)
Raju, M. S.
1998-01-01
The state of the art in multidimensional combustor modeling, as evidenced by the level of sophistication employed in terms of modeling and numerical accuracy considerations, is also dictated by the available computer memory and turnaround times afforded by present-day computers. With the aim of advancing the current multi-dimensional computational tools used in the design of advanced technology combustors, a solution procedure is developed that combines the novelty of the coupled CFD/spray/scalar Monte Carlo PDF (Probability Density Function) computations on unstructured grids with the ability to run on parallel architectures. In this approach, the mean gas-phase velocity and turbulence fields are determined from a standard turbulence model, the joint composition of species and enthalpy from the solution of a modeled PDF transport equation, and a Lagrangian-based dilute spray model is used for the liquid-phase representation. The gas-turbine combustor flows are often characterized by a complex interaction between various physical processes associated with the interaction between the liquid and gas phases, droplet vaporization, turbulent mixing, heat release associated with chemical kinetics, radiative heat transfer associated with highly absorbing and radiating species, among others. The rate controlling processes often interact with each other at various disparate time and length scales. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and liquid phase evaporation in many practical combustion devices.
Parallel robot for micro assembly with integrated innovative optical 3D-sensor
NASA Astrophysics Data System (ADS)
Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer
2002-10-01
Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. In order to meet this demand, a new approach using a robot based on parallel mechanisms in combination with a novel 3D-vision system has been chosen. The planar parallel robot structure with 2 DOF provides a high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about the z direction. In order to achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with linear electric servo motors. High accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition, a modular gripper support is integrated into the tool head. To increase the accuracy, a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches, a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of workpiece and gripper. Using a model-based recognition algorithm, the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.
NASA Astrophysics Data System (ADS)
Knightes, C. D.; Bouchard, D.; Zepp, R. G.; Henderson, W. M.; Han, Y.; Hsieh, H. S.; Avant, B. K.; Acrey, B.; Spear, J.
2017-12-01
The unique properties of engineered nanomaterials led to their increased production and potential release into the environment. Currently available environmental fate models developed for traditional contaminants are limited in their ability to simulate nanomaterials' environmental behavior. This is due to an incomplete understanding and representation of the processes governing nanomaterial distribution in the environment and to scarce empirical data quantifying the interaction of nanomaterials with environmental surfaces. The well-known Water Quality Analysis Simulation Program (WASP) was updated to incorporate nanomaterial-specific processes, specifically hetero-aggregation with particulate matter. In parallel with this effort, laboratory studies were used to quantify parameter values necessary for governing processes in surface waters. This presentation will discuss the recent developments in the new architecture for WASP8 and the newly constructed Advanced Toxicant Module. The module includes advanced algorithms for increased numbers of state variables: chemicals, solids, dissolved organic matter, pathogens, temperature, and salinity. This presentation will focus specifically on the incorporation of nanomaterials, with applications to the fate and transport of hypothetical releases of Multi-Walled Carbon Nanotubes (MWCNT) and Graphene Oxide (GO) into the headwaters of a southeastern US coastal plains river. While this presentation focuses on nanomaterials, the advanced toxicant module can also simulate metals and organic contaminants.
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. 
Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.
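The run times reported in this abstract can be turned into speedup factors with a few lines of arithmetic. The sketch below uses only the figures quoted above; the helper name `speedup` and the dictionary layout are hypothetical.

```python
def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel run time (both in the same unit)."""
    return serial_time / parallel_time


# Times as reported in the abstract: sample 1 in minutes, sample 2 in hours.
reported = {
    "sample 1": (407.0, {1: 92.0, 8: 12.0, 16: 7.0}),
    "sample 2": (209.0, {1: 35.6, 8: 4.65, 16: 2.0}),
}

for name, (serial, by_gpus) in reported.items():
    for gpus, t in by_gpus.items():
        print(f"{name}, {gpus:2d} GPU(s): {speedup(serial, t):6.1f}x")
```

On these figures the largest ratio is 104.5x (second sample, 16 GPU cards), in line with the "about two orders of magnitude" claim, while the first sample reaches roughly 58x on 16 cards.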
Contingency Analysis Post-Processing With Advanced Computing and Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Glaesemann, Kurt; Fitzhenry, Erin
Contingency analysis is a critical function widely used in energy management systems to assess the impact of power system component failures. Its outputs are important for power system operation for improved situational awareness, power system planning studies, and power market operations. With the increased complexity of power system modeling and simulation caused by increased energy production and demand, the penetration of renewable energy and fast deployment of smart grid devices, and the trend of operating grids closer to their capacity for better efficiency, more and more contingencies must be executed and analyzed quickly in order to ensure grid reliability and accuracy for the power market. Currently, many researchers have proposed different techniques to accelerate the computational speed of contingency analysis, but not much work has been published on how to post-process the large amount of contingency outputs quickly. This paper proposes a parallel post-processing function that can analyze contingency analysis outputs faster and display them in a web-based visualization tool to help power engineers improve their work efficiency by fast information digestion. Case studies using an ESCA-60 bus system and a WECC planning system are presented to demonstrate the functionality of the parallel post-processing technique and the web-based visualization tool.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.
NASA Technical Reports Server (NTRS)
Mayer, Richard J.; Blinn, Thomas M.; Dewitte, Paula S.; Crump, John W.; Ackley, Keith A.
1992-01-01
In the second volume of the Demonstration Framework Document, the graphical representation of the demonstration framework is given. This second document was created to facilitate the reading and comprehension of the demonstration framework. It is designed to be viewed in parallel with Section 4.2 of the first volume to help give a picture of the relationships between the UOB's (Unit of Behavior) of the model. The model is quite large and the design team felt that this form of presentation would make it easier for the reader to get a feel for the processes described in this document. The IDEF3 (Process Description Capture Method) diagrams of the processes of an Information System Development are presented. Volume 1 describes the processes and the agents involved with each process, while this volume graphically shows the precedence relationships among the processes.
2008-02-09
Campbell, S. Ogata, and F. Shimojo, “Multimillion atom simulations of nanosystems on parallel computers,” in Proceedings of the International...nanomesas: multimillion-atom molecular dynamics simulations on parallel computers,” J. Appl. Phys. 94, 6762 (2003). 21. P. Vashishta, R. K. Kalia...and A. Nakano, “Multimillion atom molecular dynamics simulations of nanoparticles on parallel computers,” Journal of Nanoparticle Research 5, 119-135
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dr. Dale M. Snider
2011-02-28
This report gives the results of the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendly environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the groundwork for the complete parallelization of Barracuda in Phase-2, with the caveat that the parallel programming practices implemented in Phase-1 give immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully, laying the framework for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days.
Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate); Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics as the Phase-1 technical partner. See Section Technical Objective and Approach); Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections); Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program); Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections); Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2). With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.
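The remark above, that the speedup of one function is expected to exceed the speedup of the whole code, can be illustrated with Amdahl's law. The report does not name Amdahl's law or give the fraction of runtime spent in accelerated code, so the fractions below are hypothetical values chosen only for illustration; the 12x kernel speedup and the 30-day job are the figures quoted in the abstract.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / kernel_speedup)


def runtime_days(serial_days: float, overall_speedup: float) -> float:
    """Wall-clock time after applying an overall speedup to a serial runtime."""
    return serial_days / overall_speedup


if __name__ == "__main__":
    # Hypothetical fractions of runtime covered by accelerated code.
    for fraction in (0.50, 0.90, 0.99):
        overall = amdahl_speedup(fraction, kernel_speedup=12.0)
        days = runtime_days(30.0, overall)
        print(f"fraction={fraction:.2f} overall={overall:.2f}x 30-day job -> {days:.1f} days")
```

Under this model, an overall speedup above 10x requires nearly all of the runtime to sit in accelerated code, which is consistent with the report's plan to parallelize the complete code in Phase-2.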
Social interaction shapes babbling: Testing parallels between birdsong and speech
NASA Astrophysics Data System (ADS)
Goldstein, Michael H.; King, Andrew P.; West, Meredith J.
2003-06-01
Birdsong is considered a model of human speech development at behavioral and neural levels. Few direct tests of the proposed analogs exist, however. Here we test a mechanism of phonological development in human infants that is based on social shaping, a selective learning process first documented in songbirds. By manipulating mothers' reactions to their 8-month-old infants' vocalizations, we demonstrate that phonological features of babbling are sensitive to nonimitative social stimulation. Contingent, but not noncontingent, maternal behavior facilitates more complex and mature vocal behavior. Changes in vocalizations persist after the manipulation. The data show that human infants use social feedback, facilitating immediate transitions in vocal behavior. Social interaction creates rapid shifts to developmentally more advanced sounds. These transitions mirror the normal development of speech, supporting the predictions of the avian social shaping model. These data provide strong support for a parallel in function between vocal precursors of songbirds and infants. Because imitation is usually considered the mechanism for vocal learning in both taxa, the findings introduce social shaping as a general process underlying the development of speech and song.
Neuromimetic Circuits with Synaptic Devices Based on Strongly Correlated Electron Systems
NASA Astrophysics Data System (ADS)
Ha, Sieu D.; Shi, Jian; Meroz, Yasmine; Mahadevan, L.; Ramanathan, Shriram
2014-12-01
Strongly correlated electron systems such as the rare-earth nickelates (RNiO3, where R denotes a rare-earth element) can exhibit synapselike continuous long-term potentiation and depression when gated with ionic liquids, exploiting the extreme sensitivity of coupled charge, spin, orbital, and lattice degrees of freedom to stoichiometry. We present experimental real-time, device-level classical conditioning and unlearning using nickelate-based synaptic devices in an electronic circuit compatible with both excitatory and inhibitory neurons. We establish a physical model for the device behavior based on electric-field-driven coupled ionic-electronic diffusion that can be utilized for design of more complex systems. We use the model to simulate a variety of associative and nonassociative learning mechanisms, as well as a feedforward recurrent network for storing memory. Our circuit intuitively parallels biological neural architectures, and it can be readily generalized to other forms of cellular learning and extinction. The simulation of neural function with electronic device analogs may provide insight into biological processes such as decision making, learning, and adaptation, while facilitating advanced parallel information processing in hardware.
NASA Astrophysics Data System (ADS)
Kim, Jin Seok; Hur, Min Young; Kim, Chang Ho; Kim, Ho Jun; Lee, Hae June
2018-03-01
A two-dimensional parallelized particle-in-cell simulation has been developed to simulate a capacitively coupled plasma reactor. The parallelization using graphics processing units is applied to resolve the heavy computational load. It is found that the step-ionization plays an important role in the intermediate gas pressure of a few Torr. Without the step-ionization, the average electron density decreases while the effective electron temperature increases with the increase of gas pressure at a fixed power. With the step-ionization, however, the average electron density increases while the effective electron temperature decreases with the increase of gas pressure. The cases with the step-ionization agree well with the tendency of experimental measurement. The electron energy distribution functions show that the population of electrons having intermediate energy from 4.2 to 12 eV is relaxed by the step-ionization. Also, it was observed that the power consumption by the electrons is increasing with the increase of gas pressure by the step-ionization process, while the power consumption by the ions decreases with the increase of gas pressure.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective
NASA Technical Reports Server (NTRS)
Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)
2002-01-01
Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
pcircle - A Suite of Scalable Parallel File System Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG, FEIYI
2015-10-01
Most file system software is written for conventional local file systems; it is serialized and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on the ubiquitous MPI in cluster computing environments and on the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
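The idea of parallel checksumming with an integrity check can be sketched as follows. This is not pcircle's implementation: the abstract describes MPI and work-stealing, whereas this minimal sketch assumes a single machine, in-memory data, and a Python thread pool. The function names and the chunk size are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunk_digests(data: bytes, chunk_size: int, workers: int = 4) -> List[str]:
    """Hash fixed-size chunks concurrently; results keep chunk order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: hashlib.sha256(c).hexdigest(), chunks))


def combined_digest(digests: List[str]) -> str:
    """Fold the ordered per-chunk digests into one summary digest."""
    summary = hashlib.sha256()
    for digest in digests:
        summary.update(digest.encode("ascii"))
    return summary.hexdigest()


def verify_copy(source: bytes, copy: bytes, chunk_size: int = 1 << 20) -> bool:
    """Integrity check: both sides must produce the same summary digest."""
    return (combined_digest(chunk_digests(source, chunk_size))
            == combined_digest(chunk_digests(copy, chunk_size)))
```

The summary digest is a hash of the chunk hashes, so it differs from a plain SHA-256 of the whole input and depends on the chunk size; both sides of a comparison must use the same chunk size.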
Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher
2014-01-01
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so-called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital that those big data solutions be multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
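The two tasks named above, Map and Reduce, can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only; it does not use Hadoop or HDFS, and the function names and the sample clinical notes are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def term_mapper(note: str) -> Iterator[Pair]:
    """Emit (term, 1) for each whitespace-separated term in a note."""
    for term in note.lower().split():
        yield term, 1


def sum_reducer(_key: str, values: List[int]) -> int:
    """Sum the counts emitted for one term."""
    return sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records, term_mapper)), sum_reducer)


if __name__ == "__main__":
    notes = ["fever cough", "cough fatigue", "fever"]  # hypothetical records
    print(map_reduce(notes))
```

In a real cluster the map and reduce calls run on different nodes and the shuffle moves data between them; here they run in sequence so the data flow is easy to follow.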
The source of dual-task limitations: Serial or parallel processing of multiple response selections?
Marois, René
2014-01-01
Although it is generally recognized that the concurrent performance of two tasks incurs costs, the sources of these dual-task costs remain controversial. The serial bottleneck model suggests that serial postponement of task performance in dual-task conditions results from a central stage of response selection that can only process one task at a time. Cognitive-control models, by contrast, propose that multiple response selections can proceed in parallel, but that serial processing of task performance is predominantly adopted because its processing efficiency is higher than that of parallel processing. In the present study, we empirically tested this proposition by examining whether parallel processing would occur when it was more efficient and financially rewarded. The results indicated that even when parallel processing was more efficient and was incentivized by financial reward, participants still failed to process tasks in parallel. We conclude that central information processing is limited by a serial bottleneck. PMID:23864266
Arraying proteins by cell-free synthesis.
He, Mingyue; Wang, Ming-Wei
2007-10-01
Recent advances in life science have led to great motivation for the development of protein arrays to study functions of genome-encoded proteins. While traditional cell-based methods have been commonly used for generating protein arrays, they are usually a time-consuming process with a number of technical challenges. Cell-free protein synthesis offers an attractive system for making protein arrays: not only does it rapidly convert the genetic information into functional proteins without the need for DNA cloning, but it also presents a flexible environment amenable to production of folded proteins or proteins with defined modifications. Recent advancements have made it possible to rapidly generate protein arrays from PCR DNA templates through parallel on-chip protein synthesis. This article reviews current cell-free protein array technologies and their proteomic applications.
Göpfert, Martin C; Hennig, R Matthias
2016-01-01
Insect hearing has independently evolved multiple times in the context of intraspecific communication and predator detection by transforming proprioceptive organs into ears. Research over the past decade, ranging from the biophysics of sound reception to molecular aspects of auditory transduction to the neuronal mechanisms of auditory signal processing, has greatly advanced our understanding of how insects hear. Apart from evolutionary innovations that seem unique to insect hearing, parallels between insect and vertebrate auditory systems have been uncovered, and the auditory sensory cells of insects and vertebrates turned out to be evolutionarily related. This review summarizes our current understanding of insect hearing. It also discusses recent advances in insect auditory research, which have put forward insect auditory systems for studying biological aspects that extend beyond hearing, such as cilium function, neuronal signal computation, and sensory system evolution.
Parallel Activation in Bilingual Phonological Processing
ERIC Educational Resources Information Center
Lee, Su-Yeon
2011-01-01
In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…
64 x 64 thresholding photodetector array for optical pattern recognition
NASA Astrophysics Data System (ADS)
Langenbacher, Harry; Chao, Tien-Hsin; Shaw, Timothy; Yu, Jeffrey W.
1993-10-01
A high performance 32 X 32 peak detector array is introduced. This detector consists of a 32 X 32 array of thresholding photo-transistor cells, manufactured with a standard MOSIS digital 2-micron CMOS process. A built-in thresholding function that is able to perform 1024 thresholding operations in parallel strongly distinguishes this chip from available CCD detectors. This high speed detector offers response times from one to 10 milliseconds, much faster than commercially available CCD detectors operating at a TV frame rate. The parallel multiple-peak thresholding detection capability makes it particularly suitable for optical correlators and optoelectronically implemented neural networks. The principle of operation, circuit design and the performance characteristics are described. Experimental demonstration of correlation peak detection is also provided. Recently, we have also designed and built an advanced version of a 64 X 64 thresholding photodetector array chip. Experimental investigation of using this chip for pattern recognition is ongoing.
State of the art in electromagnetic modeling for the Compact Linear Collider
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, Arno; Kabel, Andreas; Lee, Lie-Quan
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D electromagnetic time-domain code T3P for simulations of wakefields and transients in complex accelerator structures. T3P is based on state-of-the-art Finite Element methods on unstructured grids and features unconditional stability, quadratic surface approximation and up to 6th-order vector basis functions for unprecedented simulation accuracy. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with fast turn-around times, aiding the design of the next generation of accelerator facilities. Applications include simulations of the proposed two-beam accelerator structures for the Compact Linear Collider (CLIC) - wakefield damping in the Power Extraction and Transfer Structure (PETS) and power transfer to the main beam accelerating structures are investigated.
A review of GPU-based medical image reconstruction.
Després, Philippe; Jia, Xun
2017-10-01
Tomographic image reconstruction is a computationally demanding task, even more so when advanced models are used to describe a more complete and accurate picture of the image formation process. Such advanced modeling and reconstruction algorithms can lead to better images, often with less dose, but at the price of long calculation times that are hardly compatible with clinical workflows. Fortunately, reconstruction tasks can often be executed advantageously on Graphics Processing Units (GPUs), which are exploited as massively parallel computational engines. This review paper focuses on recent developments made in GPU-based medical image reconstruction, from a CT, PET, SPECT, MRI and US perspective. Strategies and approaches to get the most out of GPUs in image reconstruction are presented as well as innovative applications arising from an increased computing capacity. The future of GPU-based image reconstruction is also envisioned, based on current trends in high-performance computing. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Bingham, G. E.; Pougatchev, N. S.; Zavyalov, V.; Esplin, M.; Blackwell, W. J.; Barnet, C.
2009-12-01
The NPOESS Preparatory Project is serving the operations and research community as the bridge mission between the Earth Observing System and the National Polar-orbiting Operational Environmental Satellite System. The Cross-track Infrared Sounder (CrIS) and the Advanced Technology Microwave Sounder (ATMS) are the core instruments providing the key performance temperature and humidity profiles (along with some other atmospheric constituent information). Both the high spectral resolution CrIS and the upgraded microwave sounder (ATMS) will be working in parallel with the already orbiting Advanced Atmospheric Infrared Sounder (AIRS/AMSU) on the EOS AQUA platform and the Infrared Atmospheric Sounding Interferometer (IASI/AMSU) on the METOP-A satellite. This presentation will review the CrIS/ATMS capabilities in the context of continuity with the excellent performance records established by AIRS and IASI. The CrIS sensor is in the process of its final calibration and characterization testing and the results and Sensor Data Record process are being validated against this excellent dataset. The comparison between CrIS, AIRS, and IASI will include spectral, spatial, radiometric performance and sounding capability comparisons.
Quantum neuromorphic hardware for quantum artificial intelligence
NASA Astrophysics Data System (ADS)
Prati, Enrico
2017-08-01
The development of machine learning methods based on deep learning boosted the field of artificial intelligence towards unprecedented achievements and application in several fields. Such prominent results were made in parallel with the first successful demonstrations of fault tolerant hardware for quantum information processing. To what extent deep learning can take advantage of the existence of hardware based on qubits behaving as a universal quantum computer is an open question under investigation. Here I review the convergence between the two fields towards implementation of advanced quantum algorithms, including quantum deep learning.
Relative entropy and optimization-driven coarse-graining methods in VOTCA
Mashayak, S. Y.; Jochum, Mara N.; Koschke, Konstantin; ...
2015-07-20
We discuss recent advances of the VOTCA package for systematic coarse-graining. Two methods have been implemented, namely the downhill simplex optimization and the relative entropy minimization. We illustrate the new methods by coarse-graining SPC/E bulk water and more complex water-methanol mixture systems. The CG potentials obtained from both methods are then evaluated by comparing the pair distributions from the coarse-grained to the reference atomistic simulations. We have also added a parallel analysis framework to improve the computational efficiency of the coarse-graining process.
1988 IEEE Aerospace Applications Conference, Park City, UT, Feb. 7-12, 1988, Digest
NASA Astrophysics Data System (ADS)
The conference presents papers on microwave applications, data and signal processing applications, related aerospace applications, and advanced microelectronic products for the aerospace industry. Topics include a high-performance antenna measurement system, microwave power beaming from earth to space, the digital enhancement of microwave component performance, and a GaAs vector processor based on parallel RISC microprocessors. Consideration is also given to unique techniques for reliable SBNR architectures, a linear analysis subsystem for CSSL-IV, and a structured singular value approach to missile autopilot analysis.
Multidisciplinary propulsion simulation using NPSS
NASA Technical Reports Server (NTRS)
Claus, Russell W.; Evans, Austin L.; Follen, Gregory J.
1992-01-01
The current status of the Numerical Propulsion System Simulation (NPSS) program, a cooperative effort of NASA, industry, and universities to reduce the cost and time of advanced technology propulsion system development, is reviewed. The technologies required for this program include (1) interdisciplinary analysis to couple the relevant disciplines, such as aerodynamics, structures, heat transfer, combustion, acoustics, controls, and materials; (2) integrated systems analysis; (3) a high-performance computing platform, including massively parallel processing; and (4) a simulation environment providing a user-friendly interface. Several research efforts to develop these technologies are discussed.
Cortical representations of communication sounds.
Heiser, Marc A; Cheung, Steven W
2008-10-01
This review summarizes recent research into cortical processing of vocalizations in animals and humans. There has been a resurgent interest in this topic accompanied by an increased number of studies using animal models with complex vocalizations and new methods in human brain imaging. Recent results from such studies are discussed. Experiments have begun to reveal the bilateral cortical fields involved in communication sound processing and the transformations of neural representations that occur among those fields. Advances have also been made in understanding the neuronal basis of interaction between developmental exposures and behavioral experiences with vocalization perception. Exposure to sounds during the developmental period produces large effects on brain responses, as do a variety of specific trained tasks in adults. Studies have also uncovered a neural link between the motor production of vocalizations and the representation of vocalizations in cortex. Parallel experiments in humans and animals are answering important questions about vocalization processing in the central nervous system. This dual approach promises to reveal microscopic, mesoscopic, and macroscopic principles of large-scale dynamic interactions between brain regions that underlie the complex phenomenon of vocalization perception. Such advances will yield a greater understanding of the causes, consequences, and treatment of disorders related to speech processing.
An Advanced Framework for Improving Situational Awareness in Electric Power Grid Operation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Huang, Zhenyu; Zhou, Ning
With the deployment of new smart grid technologies and the penetration of renewable energy in power systems, significant uncertainty and variability are being introduced into power grid operation. Traditionally, the Energy Management System (EMS) operates the power grid in a deterministic mode, and thus will not be sufficient for the future control center in a stochastic environment with faster dynamics. One of the main challenges is to improve situational awareness. This paper reviews the current status of power grid operation and presents a vision of improving wide-area situational awareness for a future control center. An advanced framework, consisting of parallel state estimation, state prediction, parallel contingency selection, parallel contingency analysis, and advanced visual analytics, is proposed to provide capabilities needed for better decision support by utilizing high performance computing (HPC) techniques and advanced visual analytic techniques. Research results are presented to support the proposed vision and framework.
Doing business in space: How to get there from here
NASA Technical Reports Server (NTRS)
Wood, P. W.; Stark, P. M.
1984-01-01
A step by step process is described through which an existing enterprise or an entrepreneurial venture can initiate and carry out a new space venture. Throughout this process the business and technical aspects must be advanced in parallel with each other. Each depends on the other for its continued success, and companies may be unable to complete the venture if one or the other is neglected. The existing NASA programs and the experience of early trailblazers provide sufficient examples and opportunities for other firms to undertake new ventures with confidence. With the introduction of NASA's Commercial Space Policy, both the opportunities and the ease with which ventures can be carried out should increase significantly.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables the efficient combination of parallel storage access routines and sequential image processing operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
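Pipelining between data access and processing, as described above, can be sketched with a bounded queue: one thread fetches the next tile while the main thread processes the current one. This is not CAP syntax and not the authors' implementation; it is a minimal Python sketch in which the tile representation, the function names, and the queue depth are all assumptions.

```python
import queue
import threading
from typing import Callable, Iterable, List, Sequence

_DONE = object()  # sentinel marking the end of the tile stream


def _reader(tiles: Iterable[Sequence[int]], out: "queue.Queue") -> None:
    """Data-access stage: push tiles into the queue, then signal completion."""
    for tile in tiles:
        out.put(tile)  # blocks when the queue is full (back-pressure)
    out.put(_DONE)


def run_pipeline(tiles: Iterable[Sequence[int]],
                 process: Callable[[Sequence[int]], List[int]],
                 depth: int = 2) -> List[List[int]]:
    """Overlap tile access with tile processing; output keeps input order."""
    buffer: "queue.Queue" = queue.Queue(maxsize=depth)
    reader = threading.Thread(target=_reader, args=(tiles, buffer))
    reader.start()
    results: List[List[int]] = []
    while True:
        tile = buffer.get()
        if tile is _DONE:
            break
        results.append(process(tile))  # processing stage
    reader.join()
    return results


def invert(tile: Sequence[int]) -> List[int]:
    """Example sequential image operation: invert 8-bit pixel values."""
    return [255 - pixel for pixel in tile]
```

The bounded queue limits how far the reader can run ahead of the processing stage. In CPython the overlap mainly helps when the access stage waits on I/O, since pure-Python computation in two threads does not run concurrently.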
Photonic reservoir computing: a new approach to optical information processing
NASA Astrophysics Data System (ADS)
Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter
2010-06-01
Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOA) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations using a benchmark speech recognition task. In spite of the differences with classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtain high performance. In parallel we investigate the use of a network of photonic crystal cavities. The coupled mode theory (CMT) is used to investigate these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.
The Advanced Software Development and Commercialization Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallopoulos, E.; Canfield, T.R.; Minkoff, M.
1990-09-01
This is the first of a series of reports pertaining to progress in the Advanced Software Development and Commercialization Project, a joint collaborative effort between the Center for Supercomputing Research and Development of the University of Illinois and the Computing and Telecommunications Division of Argonne National Laboratory. The purpose of this work is to apply techniques of parallel computing that were pioneered by University of Illinois researchers to mature computational fluid dynamics (CFD) and structural dynamics (SD) computer codes developed at Argonne. The collaboration in this project will bring this unique combination of expertise to bear, for the first time, on industrially important problems. By so doing, it will expose the strengths and weaknesses of existing techniques for parallelizing programs and will identify those problems that need to be solved in order to enable widespread production use of parallel computers. Secondly, the increased efficiency of the CFD and SD codes themselves will enable the simulation of larger, more accurate engineering models that involve fluid and structural dynamics. In order to realize the above two goals, we are considering two production codes that have been developed at ANL and are widely used by both industry and universities. These are COMMIX and WHAMS-3D. The first is a computational fluid dynamics code that is used for both nuclear reactor design and safety and as a design tool for the casting industry. The second is a three-dimensional structural dynamics code used in nuclear reactor safety as well as crashworthiness studies. These codes are currently available for both sequential and vector computers only. Our main goal is to port and optimize these two codes on shared memory multiprocessors. In so doing, we shall establish a process that can be followed in optimizing other sequential or vector engineering codes for parallel processors.
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
The new landscape of parallel computer architecture
NASA Astrophysics Data System (ADS)
Shalf, John
2007-07-01
The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Parallel plan execution with self-processing networks
NASA Technical Reports Server (NTRS)
Dautrechy, C. Lynne; Reggia, James A.
1989-01-01
A critical issue for space operations is how to develop and apply advanced automation techniques to reduce the cost and complexity of working in space. In this context, it is important to examine how recent advances in self-processing networks can be applied for planning and scheduling tasks. For this reason, the feasibility of applying self-processing network models to a variety of planning and control problems relevant to spacecraft activities is being explored. Goals are to demonstrate that self-processing methods are applicable to these problems, and that MIRRORS/II, a general purpose software environment for implementing self-processing models, is sufficiently robust to support development of a wide range of application prototypes. Using MIRRORS/II and marker passing modelling techniques, a model of the execution of a Spaceworld plan was implemented. This is a simplified model of the Voyager spacecraft which photographed Jupiter, Saturn, and their satellites. It is shown that plan execution, a task usually solved using traditional artificial intelligence (AI) techniques, can be accomplished using a self-processing network. The fact that self-processing networks were applied to other space-related tasks, in addition to the one discussed here, demonstrates the general applicability of this approach to planning and control problems relevant to spacecraft activities. It is also demonstrated that MIRRORS/II is a powerful environment for the development and evaluation of self-processing systems.
MHD code using multi graphical processing units: SMAUG+
NASA Astrophysics Data System (ADS)
Gyenge, N.; Griffiths, M. K.; Erdélyi, R.
2018-01-01
This paper introduces the Sheffield Magnetohydrodynamics Algorithm Using GPUs (SMAUG+), an advanced numerical code for solving magnetohydrodynamic (MHD) problems, using multi-GPU systems. Multi-GPU systems facilitate the development of accelerated codes and enable us to investigate larger model sizes and/or more detailed computational domain resolutions. This is a significant advancement over the parent single-GPU MHD code, SMAUG (Griffiths et al., 2015). Here, we demonstrate the validity of the SMAUG+ code, describe the parallelisation techniques and investigate performance benchmarks. The initial configurations of the Orszag-Tang vortex simulations are distributed among 4, 16, 64 and 100 GPUs. Furthermore, different simulation box resolutions are applied: 1000 × 1000, 2044 × 2044, 4000 × 4000 and 8000 × 8000. We also tested the code with the Brio-Wu shock tube simulations with a model size of 800 employing up to 10 GPUs. Based on the test results, we observed speed-ups and slow-downs, depending on the granularity and the communication overhead of certain parallel tasks. The main aim of the code development is to provide massively parallel code without the memory limitation of a single GPU. By using our code, the applied model size could be significantly increased. We demonstrate that we are able to successfully compute numerically valid and large 2D MHD problems.
Grid computing in large pharmaceutical molecular modeling.
Claus, Brian L; Johnson, Stephen R
2008-07-01
Most major pharmaceutical companies have employed grid computing to expand their compute resources with the intention of minimizing additional financial expenditure. Historically, one of the issues restricting widespread utilization of the grid resources in molecular modeling is the limited set of suitable applications amenable to coarse-grained parallelization. Recent advances in grid infrastructure technology coupled with advances in application research and redesign will enable fine-grained parallel problems, such as quantum mechanics and molecular dynamics, which were previously inaccessible to the grid environment. This will enable new science as well as increase resource flexibility to load balance and schedule existing workloads.
Ceramic Technology For Advanced Heat Engines Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1990-12-01
Significant accomplishments in fabricating ceramic components for the Department of Energy (DOE), National Aeronautics and Space Administration (NASA), and Department of Defense (DoD) advanced heat engine programs have provided evidence that the operation of ceramic parts in high-temperature engine environments is feasible. However, these programs have also demonstrated that additional research is needed in materials and processing development, design methodology, and data base and life prediction before industry will have a sufficient technology base from which to produce reliable cost-effective ceramic engine components commercially. The objective of the project is to develop the industrial technology base required for reliable ceramics for application in advanced automotive heat engines. The project approach includes determining the mechanisms controlling reliability, improving processes for fabricating existing ceramics, developing new materials with increased reliability, and testing these materials in simulated engine environments to confirm reliability. Although this is a generic materials project, the focus is on the structural ceramics for advanced gas turbine and diesel engines, ceramic bearings and attachments, and ceramic coatings for thermal barrier and wear applications in these engines. This advanced materials technology is being developed in parallel and close coordination with the ongoing DOE and industry proof of concept engine development programs. To facilitate the rapid transfer of this technology to U.S. industry, the major portion of the work is being done in the ceramic industry, with technological support from government laboratories, other industrial laboratories, and universities. Abstracts prepared for appropriate papers.
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Studies in optical parallel processing. [All optical and electro-optic approaches
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out on each image element of these rows in the IOC substrate, and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Parallel workflow tools to facilitate human brain MRI post-processing
Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang
2015-01-01
Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
Cooperative storage of shared files in a parallel computing system with dynamic block size
Bent, John M.; Faibish, Sorin; Grider, Gary
2015-11-10
Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
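The block-size rule described in this abstract can be illustrated with a minimal Python sketch. It assumes only the simplest case the abstract mentions (block size equal to the total amount of data divided by the number of processes) and simulates the exchange step inside a single process by concatenating and re-slicing byte buffers. The function names `determine_block_size` and `rebalance` are hypothetical, the round-up on uneven totals is an assumption, and nothing here is taken from the patented method or from PLFS itself.

```python
def determine_block_size(sizes):
    """Total amount of data divided by the number of processes.

    Rounding up when the total does not divide evenly is an assumption
    made for this sketch.
    """
    total = sum(sizes)
    return -(-total // len(sizes))  # ceiling division


def rebalance(buffers):
    """Simulate the exchange step and return one block per process.

    In a real parallel system each process would trade bytes with other
    processes; here the buffers are joined in rank order and re-sliced.
    """
    block = determine_block_size([len(b) for b in buffers])
    stream = b"".join(buffers)
    return [stream[i * block:(i + 1) * block] for i in range(len(buffers))]


if __name__ == "__main__":
    uneven = [b"a" * 10, b"b" * 2, b"c" * 6]
    for rank, blk in enumerate(rebalance(uneven)):
        print(rank, len(blk))
```

After rebalancing, every simulated process holds a block of the same size (the last one may be shorter when the total is not evenly divisible), which is the property the method relies on before writing to the file system.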
Near Real Time Processing Chain for Suomi NPP Satellite Data
NASA Astrophysics Data System (ADS)
Monsorno, Roberto; Cuozzo, Giovanni; Costa, Armin; Mateescu, Gabriel; Ventura, Bartolomeo; Zebisch, Marc
2014-05-01
Since 2009, the EURAC satellite receiving station, located at Corno del Renon, in an obstacle-free site at 2260 m a.s.l., has been acquiring data from Aqua and Terra NASA satellites equipped with Moderate Resolution Imaging Spectroradiometer (MODIS) sensors. The experience gained with this local ground segment has given the opportunity of adapting and modifying the processing chain for MODIS data to the Suomi NPP, the natural successor to Terra and Aqua satellites. The processing chain, initially implemented by means of a proprietary system supplied by Seaspace and Advanced Computer System, was further developed by EURAC's Institute for Applied Remote Sensing engineers. Several algorithms have been developed using MODIS and Visible Infrared Imaging Radiometer Suite (VIIRS) data to produce Snow Cover, Particulate Matter estimation and Meteo maps. These products are implemented on a common processor structure based on the use of configuration files and a generic processor. Data and products are then automatically delivered to customers such as the Autonomous Province of Bolzano-Civil Protection office. For the processing phase we defined two goals: i) the adaptation and implementation of the products already available for MODIS (and possibly new ones) to VIIRS, which is one of the sensors onboard Suomi NPP; ii) the use of an open source processing chain in order to process NPP data in Near Real Time, exploiting the knowledge we acquired on parallel computing. In order to achieve the second goal, the S-NPP data received and ingested are sent as input to RT-STPS (Real-time Software Telemetry Processing System) software developed by the NASA Direct Readout Laboratory 1 (DRL) that gives as output RDR files (Raw Data Record) for VIIRS, ATMS (Advanced Technology Microwave Sounder) and CrIS (Cross-track Infrared Sounder) sensors.
RDR are then transferred to a server equipped with CSPP2 (Community Satellite Processing Package) software developed by the University of Wisconsin. CSPP subdivides the input file into granules, which makes the use of parallel computing possible, and produces SDR (Science Data Record) and some EDR (Environmental Data Record) products. The integration with the EDRs not yet available with CSPP is realized with the use of the stand-alone version of the SPAs (Science Processing Algorithm) by DRL. The important result of this system consists in the possibility of processing data acquired by the EURAC antenna with open source software and delivering the SDRs, EDRs and higher level products developed internally by EURAC in near real time using a Data Exchange Server. By means of the parallelized CSPP, SDR data are currently available about 7 minutes after the production of RDR, while we are currently implementing a strategy to get the best possible processing time for the EDR products that are in principle not parallelizable. 1. http://directreadout.sci.gsfc.nasa.gov/ 2. http://cimss.ssec.wisc.edu/cspp/
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
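The sliding-window computation at the core of DFC analysis can be sketched in a few lines of Python. This is a serial, standard-library illustration and not the authors' OpenMP or CUDA code: it assumes Pearson correlation as the connectivity measure, and the names `pearson`, `sliding_window_dfc`, `window` and `step` are hypothetical. Each window is computed independently of the others, which is the property that makes the work easy to distribute over threads or GPU blocks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant window: undefined, reported as 0.0 in this sketch
    return cov / sqrt(vx * vy)


def sliding_window_dfc(x, y, window, step=1):
    """Correlation of two time courses inside each sliding window."""
    return [pearson(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, step)]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]
    b = [2, 4, 6, 5, 4, 3]
    print(sliding_window_dfc(a, b, window=3))
```

For two time courses of length 6 and a window of 3 samples, the function returns four correlation values, one per window position.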
Efficient multitasking: parallel versus serial processing of multiple tasks
Fischer, Rico; Plessow, Franziska
2015-01-01
In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742
Advanced Boundary Electrode Modeling for tES and Parallel tES/EEG.
Pursiainen, Sampsa; Agsten, Britte; Wagner, Sven; Wolters, Carsten H
2018-01-01
This paper explores advanced electrode modeling in the context of separate and parallel transcranial electrical stimulation (tES) and electroencephalography (EEG) measurements. We focus on boundary condition-based approaches that do not necessitate adding auxiliary elements, e.g., sponges, to the computational domain. In particular, we investigate the complete electrode model (CEM), which incorporates a detailed description of the skin-electrode interface including its contact surface, impedance, and normal current distribution. The CEM can be applied to both tES and EEG electrodes, which is advantageous when a parallel system is used. In comparison to the CEM, we test two important reduced approaches: the gap model (GAP) and the point electrode model (PEM). We aim to find out the differences between these approaches for a realistic numerical setting based on the stimulation of the auditory cortex. The results obtained suggest, among other things, that GAP and GAP/PEM are sufficiently accurate for the practical application of tES and parallel tES/EEG, respectively. Differences between CEM and GAP were observed mainly in the skin compartment, where only CEM explains the heating effects characteristic of tES.
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) space-time multiresolution schemes; ii) a fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) a multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and an M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
The Snow Data System at NASA JPL
NASA Astrophysics Data System (ADS)
Laidlaw, R.; Painter, T. H.; Mattmann, C. A.; Ramirez, P.; Brodzik, M. J.; Rittger, K.; Bormann, K. J.; Burgess, A. B.; Zimdars, P.; McGibbney, L. J.; Goodale, C. E.; Joyce, M.
2015-12-01
The Snow Data System at NASA JPL includes a data processing pipeline built with open source software, Apache 'Object Oriented Data Technology' (OODT). It produces a variety of data products using inputs from satellites such as MODIS, VIIRS and Landsat. Processing is carried out in parallel across a high-powered computing cluster. Algorithms such as 'Snow Covered Area and Grain-size' (SCAG) and 'Dust Radiative Forcing in Snow' (DRFS) are applied to satellite inputs to produce output images that are used by many scientists and institutions around the world. This poster will describe the Snow Data System, its outputs and their uses and applications, along with recent advancements to the system and plans for the future. Advancements for 2015 include automated daily processing of historic MODIS data for SCAG (MODSCAG) and DRFS (MODDRFS), automation of SCAG processing for VIIRS satellite inputs (VIIRSCAG) and an updated version of SCAG for Landsat Thematic Mapper inputs (TMSCAG) that takes advantage of Graphics Processing Units (GPUs) for faster processing speeds. The pipeline has been upgraded to use the latest version of OODT and its workflows have been streamlined to enable computer operators to process data on demand. Additional products have been added, such as rolling 8-day composites of MODSCAG data, a new version of the MODSCAG 'annual minimum ice and snow extent' (MODICE) product, and recoded MODSCAG data for the 'Satellite Snow Product Intercomparison and Evaluation Experiment' (SnowPEx) project.
NASA Astrophysics Data System (ADS)
Herrera, I.; Herrera, G. S.
2015-12-01
Most geophysical systems are macroscopic physical systems. The behavior of such systems is predicted by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs, it is necessary to apply highly parallelized supercomputers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers, using 'non-overlapping discretizations', have produced the DVS-Software, which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM). REFERENCES: [1] Herrera, Ismael and George F. Pinder, "Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2] Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num.21852. (Open source) [3] Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)
ASC-ATDM Performance Portability Requirements for 2015-2019
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Harold C.; Trott, Christian Robert
This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a. Kokkos) project for 2015-2019. The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread-parallel programming model and standard C++ library-based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel patterns including parallel-for, parallel-reduce, and parallel-scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contains nested data parallel algorithms. This five-year plan includes R&D required to fully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithms and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism and performance portability. These five-year requirements include support required for current and anticipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up-to-date tutorials, documentation, multi-platform (hardware and software stack) testing, minor feature enhancements, thread-scalable algorithm consulting, and managing collaborative R&D.
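The three data parallel patterns named in this abstract (parallel-for, parallel-reduce and parallel-scan) can be illustrated with a small Python sketch of their semantics. This is not the Kokkos API, which is a C++ library; the function names, the `workers` parameter and the chunking helper are hypothetical, and the reduction and scan are fixed to addition for brevity. Because of the Python global interpreter lock, the threads here show only the structure of each pattern and are not expected to give a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def chunks(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    size = max(1, -(-n // parts))
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def parallel_for(n, body, workers=4):
    """Apply body(i) to every index; iterations must be independent."""
    with ThreadPoolExecutor(workers) as pool:
        list(pool.map(body, range(n)))


def parallel_reduce(data, workers=4):
    """Sum each chunk concurrently, then combine the partial sums."""
    with ThreadPoolExecutor(workers) as pool:
        partial = pool.map(lambda c: sum(data[c[0]:c[1]]),
                           chunks(len(data), workers))
        return sum(partial)


def parallel_scan(data, workers=4):
    """Inclusive prefix sum: scan each chunk, then add the chunk offsets."""
    parts = chunks(len(data), workers)
    with ThreadPoolExecutor(workers) as pool:
        local = list(pool.map(
            lambda c: list(accumulate(data[c[0]:c[1]])), parts))
    out, offset = [], 0
    for block in local:
        out.extend(v + offset for v in block)
        offset += block[-1]  # running total carried into the next chunk
    return out


if __name__ == "__main__":
    print(parallel_reduce(list(range(10))))
    print(parallel_scan([1, 2, 3, 4, 5]))
```

The scan shows why the pattern needs two phases: the per-chunk scans are independent, but each chunk's result must then be shifted by the total of all preceding chunks.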
Advanced-to-Revolutionary Space Technology Options - The Responsibly Imaginable
NASA Technical Reports Server (NTRS)
Bushnell, Dennis M.
2013-01-01
This paper summarizes a spectrum of low TRL, high risk technologies and systems approaches which could massively change the cost and safety of space exploration/exploitation/industrialization. These technologies and approaches could be studied in a triage fashion, the method of evaluation wherein several prospective solutions are investigated in parallel to address the innate risk of each, with resources concentrated on the more successful as more is learned. Technology areas addressed include Fabrication, Materials, Energetics, Communications, Propulsion, Radiation Protection, ISRU and LEO access. Overall and conceptually, it should be possible with serious research to make human space exploration beyond LEO both safe and affordable, with a design process having sizable positive margins. Revolutionary goals generally require revolutionary technologies. By far, Revolutionary Energetics is the most important, and has the most leverage, of any advanced technology for space exploration applications.
NASA Technical Reports Server (NTRS)
Clune, Tom
2014-01-01
This tutorial will introduce Fortran developers to unit-testing and test-driven development (TDD) using pFUnit. As with other unit-testing frameworks, pFUnit simplifies the process of writing, collecting, and executing tests while providing clear diagnostic messages for failing tests. pFUnit specifically targets the development of scientific-technical software written in Fortran and includes customized features such as: assertions for multi-dimensional arrays, distributed (MPI) and thread-based (OpenMP) parallelism, and flexible parameterized tests. These sessions will include numerous examples and hands-on exercises that gradually build in complexity. Attendees are expected to have working knowledge of F90, but familiarity with object-oriented syntax in F2003 and MPI will be of benefit for the more advanced examples. By the end of the tutorial the audience should feel comfortable in applying pFUnit within their own development environment.
Positron annihilation in transparent ceramics
NASA Astrophysics Data System (ADS)
Husband, P.; Bartošová, I.; Slugeň, V.; Selim, F. A.
2016-01-01
Transparent ceramics are emerging as excellent candidates for many photonic applications including laser, scintillation and illumination. However, achieving perfect transparency is essential in these applications and requires high technology processing and a complete understanding of the ceramic microstructure and its effect on the optical properties. Positron annihilation spectroscopy (PAS) is the perfect tool to study porosity and defects. It has been applied to investigate many ceramic structures, and the field of transparent ceramics may be greatly advanced by applying PAS. In this work positron lifetime (PLT) measurements were carried out in parallel with optical studies on yttrium aluminum garnet transparent ceramics in order to gain an understanding of their structure at the atomic level and its effect on the transparency and light scattering. The study confirmed that PAS can provide useful information on their microstructure and guide the technology of manufacturing and advancing transparent ceramics.
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.
Experiences with Bilateral Art: A Retrospective Study
ERIC Educational Resources Information Center
McNamee, Carole M.
2006-01-01
Recent advances in neuroscience describe the effect of experience on neural architecture. Paralleling these advances in neuroscience, recent explorations in the field of art therapy speculate on the relationship between specific therapeutic interventions and neuroplasticity, which underlies the changes in neural architecture. One such…
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
NASA Astrophysics Data System (ADS)
Wang, Liping; Jiang, Yao; Li, Tiemin
2014-09-01
Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with the consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with the consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful to increase the calculating speed. The accuracy of this machine when following a selected circle path is simulated. The influences of magnitude of the maximum acceleration and external loads on the running accuracy of the machine are investigated. The results show that the external loads will deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of the parallel kinematic machines and can also be used in their design optimization as well as selection of suitable running parameters.
NASA Technical Reports Server (NTRS)
Kerr, Andrew W.
1990-01-01
The utilization of advanced simulation technology in the development of the non-real-time MANPRINT design tools in the Army/NASA Aircrew-Aircraft Integration (A3I) program is described. A description is then given of the Crew Station Research and Development Facilities, the primary tool for the application of MANPRINT principles. The purpose of the A3I program is to develop a rational, predictive methodology for helicopter cockpit system design that integrates human factors engineering with other principles at an early stage in the development process, avoiding the high cost of previous system design methods. Enabling technologies such as the MIDAS work station are examined, and the potential of low-cost parallel-processing systems is indicated.
The Snow Data System at NASA JPL
NASA Astrophysics Data System (ADS)
Horn, J.; Painter, T. H.; Bormann, K. J.; Rittger, K.; Brodzik, M. J.; Skiles, M.; Burgess, A. B.; Mattmann, C. A.; Ramirez, P.; Joyce, M.; Goodale, C. E.; McGibbney, L. J.; Zimdars, P.; Yaghoobi, R.
2017-12-01
The Snow Data System at NASA JPL includes data processing pipelines built with open source software, Apache 'Object Oriented Data Technology' (OODT). Processing is carried out in parallel across a high-powered computing cluster. The pipelines use input data from satellites such as MODIS, VIIRS and Landsat. They apply algorithms to the input data to produce a variety of outputs in GeoTIFF format. These outputs include daily data for SCAG (Snow Cover And Grain size) and DRFS (Dust Radiative Forcing in Snow), along with 8-day composites and MODICE annual minimum snow and ice calculations. This poster will describe the Snow Data System, its outputs and their uses and applications. It will also highlight recent advancements to the system and plans for the future.
The Snow Data System at NASA JPL
NASA Astrophysics Data System (ADS)
Joyce, M.; Laidlaw, R.; Painter, T. H.; Bormann, K. J.; Rittger, K.; Brodzik, M. J.; Skiles, M.; Burgess, A. B.; Mattmann, C. A.; Ramirez, P.; Goodale, C. E.; McGibbney, L. J.; Zimdars, P.; Yaghoobi, R.
2016-12-01
The Snow Data System at NASA JPL includes data processing pipelines built with open source software, Apache 'Object Oriented Data Technology' (OODT). Processing is carried out in parallel across a high-powered computing cluster. The pipelines use input data from satellites such as MODIS, VIIRS and Landsat. They apply algorithms to the input data to produce a variety of outputs in GeoTIFF format. These outputs include daily data for SCAG (Snow Cover And Grain size) and DRFS (Dust Radiative Forcing in Snow), along with 8-day composites and MODICE annual minimum snow and ice calculations. This poster will describe the Snow Data System, its outputs and their uses and applications. It will also highlight recent advancements to the system and plans for the future.
“Scar-cinoma”: viewing the fibrotic lung mesenchymal cell in the context of cancer biology
Horowitz, Jeffrey C.; Osterholzer, John J.; Marazioti, Antonia; Stathopoulos, Georgios T.
2017-01-01
Lung cancer and pulmonary fibrosis are common, yet distinct, pathological processes that represent urgent unmet medical needs. Striking clinical and mechanistic parallels exist between these distinct disease entities. The goal of this article is to examine lung fibrosis from the perspective of cancer-associated phenotypic hallmarks, to discuss areas of mechanistic overlap and distinction, and to highlight profibrotic mechanisms that contribute to carcinogenesis. Ultimately, we speculate that such comparisons might identify opportunities to leverage our current understanding of the pathobiology of each disease process in order to advance novel therapeutic approaches for both. We anticipate that such “outside the box” concepts could be translated to a more precise and individualised approach to fibrotic diseases of the lung. PMID:27030681
Pulse-coupled neural network implementation in FPGA
NASA Astrophysics Data System (ADS)
Waldemark, Joakim T. A.; Lindblad, Thomas; Lindsey, Clark S.; Waldemark, Karina E.; Oberg, Johnny; Millberg, Mikael
1998-03-01
Pulse Coupled Neural Networks (PCNN) are biologically inspired neural networks, mainly based on studies of the visual cortex of small mammals. The PCNN is very well suited as a pre- processor for image processing, particularly in connection with object isolation, edge detection and segmentation. Several implementations of PCNN on von Neumann computers, as well as on special parallel processing hardware devices (e.g. SIMD), exist. However, these implementations are not as flexible as required for many applications. Here we present an implementation in Field Programmable Gate Arrays (FPGA) together with a performance analysis. The FPGA hardware implementation may be considered a platform for further, extended implementations and easily expanded into various applications. The latter may include advanced on-line image analysis with close to real-time performance.
Parallel ICA and its hardware implementation in hyperspectral image analysis
NASA Astrophysics Data System (ADS)
Du, Hongtao; Qi, Hairong; Peterson, Gregory D.
2004-04-01
Advances in hyperspectral images have dramatically boosted remote sensing applications by providing abundant information using hundreds of contiguous spectral bands. However, the high volume of information also results in an excessive computation burden. Since most materials have specific characteristics only at certain bands, much of this information is redundant. This property of hyperspectral images has motivated many researchers to study various dimensionality reduction algorithms, including Projection Pursuit (PP), Principal Component Analysis (PCA), wavelet transform, and Independent Component Analysis (ICA), where ICA is one of the most popular techniques. It searches for a linear or nonlinear transformation which minimizes the statistical dependence between spectral bands. Through this process, ICA can eliminate superfluous information while retaining useful information, given only the observations of hyperspectral images. One hurdle of applying ICA in hyperspectral image (HSI) analysis, however, is its long computation time, especially for high volume hyperspectral data sets. Even the most efficient method, FastICA, is a very time-consuming process. In this paper, we present a parallel ICA (pICA) algorithm derived from FastICA. During the unmixing process, pICA divides the estimation of the weight matrix into sub-processes which can be conducted in parallel on multiple processors. The decorrelation process is decomposed into the internal decorrelation and the external decorrelation, which perform weight vector decorrelations within individual processors and between cooperative processors, respectively. In order to further improve the performance of pICA, we seek hardware solutions in the implementation of pICA. Until now, there have been very few hardware designs for ICA-related processes due to the complicated and iterative computation. This paper discusses the capacity limitation of FPGA implementations for pICA in HSI analysis.
An Application-Specific Integrated Circuit (ASIC) synthesis is designed for pICA-based dimensionality reduction in HSI analysis. The pICA design is implemented using standard-height cells and targets the TSMC 0.18 micron process. During the synthesis procedure, three ICA-related reconfigurable components are developed for reuse and retargeting purposes. Preliminary results show that the standard-height cell based ASIC synthesis provides an effective solution for pICA and ICA-related processes in HSI analysis.
A high-speed linear algebra library with automatic parallelism
NASA Technical Reports Server (NTRS)
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2014-01-01
This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
Effects of ATC automation on precision approaches to closely spaced parallel runways
NASA Technical Reports Server (NTRS)
Slattery, R.; Lee, K.; Sanford, B.
1995-01-01
Improved navigational technology (such as the Microwave Landing System and the Global Positioning System) installed in modern aircraft will enable air traffic controllers to better utilize available airspace. Consequently, arrival traffic can fly approaches to parallel runways separated by smaller distances than are currently allowed. Previous simulation studies of advanced navigation approaches have found that controller workload is increased when there is a combination of aircraft that are capable of following advanced navigation routes and aircraft that are not. Research into Air Traffic Control automation at Ames Research Center has led to the development of the Center-TRACON Automation System (CTAS). The Final Approach Spacing Tool (FAST) is the component of the CTAS used in the TRACON area. The work in this paper examines, via simulation, the effects of FAST used for aircraft landing on closely spaced parallel runways. The simulation contained various combinations of aircraft, equipped and unequipped with advanced navigation systems. A set of simulations was run both manually and with an augmented set of FAST advisories to sequence aircraft, assign runways, and avoid conflicts. The results of the simulations are analyzed, measuring the airport throughput, aircraft delay, loss of separation, and controller workload.
A distributed pipeline for DIDSON data processing
Li, Liling; Danner, Tyler; Eickholt, Jesse; McCann, Erin L.; Pangle, Kevin; Johnson, Nicholas
2018-01-01
Technological advances in the field of ecology allow data on ecological systems to be collected at high resolution, both temporally and spatially. Devices such as Dual-frequency Identification Sonar (DIDSON) can be deployed in aquatic environments for extended periods and easily generate several terabytes of underwater surveillance data which may need to be processed multiple times. Due to the large amount of data generated and need for flexibility in processing, a distributed pipeline was constructed for DIDSON data making use of the Hadoop ecosystem. The pipeline is capable of ingesting raw DIDSON data, transforming the acoustic data to images, filtering the images, detecting and extracting motion, and generating feature data for machine learning and classification. All of the tasks in the pipeline can be run in parallel and the framework allows for custom processing. Applications of the pipeline include monitoring migration times, determining the presence of a particular species, estimating population size and other fishery management tasks.
Xi-cam: Flexible High Throughput Data Processing for GISAXS
NASA Astrophysics Data System (ADS)
Pandolfi, Ronald; Kumar, Dinesh; Venkatakrishnan, Singanallur; Sarje, Abinav; Krishnan, Hari; Pellouchoud, Lenson; Ren, Fang; Fournier, Amanda; Jiang, Zhang; Tassone, Christopher; Mehta, Apurva; Sethian, James; Hexemer, Alexander
With increasing capabilities and data demand for GISAXS beamlines, supporting software is under development to handle larger data rates, volumes, and processing needs. We aim to provide a flexible and extensible approach to GISAXS data treatment as a solution to these rising needs. Xi-cam is the CAMERA platform for data management, analysis, and visualization. The core of Xi-cam is an extensible plugin-based GUI platform which provides users an interactive interface to processing algorithms. Plugins are available for SAXS/GISAXS data and data series visualization, as well as forward modeling and simulation through HipGISAXS. With Xi-cam's advanced mode, data processing steps are designed as a graph-based workflow, which can be executed locally or remotely. Remote execution utilizes HPC or de-localized resources, allowing for effective reduction of high-throughput data. Xi-cam is open-source and cross-platform. The processing algorithms in Xi-cam include parallel cpu and gpu processing optimizations, also taking advantage of external processing packages such as pyFAI. Xi-cam is available for download online.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Planning and Resource Management in an Intelligent Automated Power Management System
NASA Technical Reports Server (NTRS)
Morris, Robert A.
1991-01-01
Power system management is a process of guiding a power system towards the objective of continuous supply of electrical power to a set of loads. Spacecraft power system management requires planning and scheduling, since electrical power is a scarce resource in space. The automation of power system management for future spacecraft has been recognized as an important R&D goal. Several automation technologies have emerged including the use of expert systems for automating human problem solving capabilities such as rule based expert system for fault diagnosis and load scheduling. It is questionable whether current generation expert system technology is applicable for power system management in space. The objective of the ADEPTS (ADvanced Electrical Power management Techniques for Space systems) is to study new techniques for power management automation. These techniques involve integrating current expert system technology with that of parallel and distributed computing, as well as a distributed, object-oriented approach to software design. The focus of the current study is the integration of new procedures for automatically planning and scheduling loads with procedures for performing fault diagnosis and control. The objective is the concurrent execution of both sets of tasks on separate transputer processors, thus adding parallelism to the overall management process.
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
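Domain decomposition as described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it computes a simple gradient magnitude, which is one stage of Canny rather than the full detector, it uses Python threads rather than Tile64 cores, and the function names and strip layout are hypothetical. The image is assumed to be a non-empty list of equal-length rows.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient_rows(image, row_start, row_stop):
    """Gradient magnitude |gx| + |gy| (central differences) for rows [row_start, row_stop)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(row_start, row_stop):
        row = []
        for x in range(w):
            # Clamp neighbour indices at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


def edges_domain_decomposition(image, n_strips=4):
    """Split the image into horizontal strips and process each strip in its own task."""
    h = len(image)
    step = max(1, -(-h // n_strips))  # ceiling division
    bounds = [(s, min(s + step, h)) for s in range(0, h, step)]
    with ThreadPoolExecutor(len(bounds)) as ex:
        strips = list(ex.map(lambda b: gradient_rows(image, *b), bounds))
    # Stitch the strips back together in their original order.
    return [row for strip in strips for row in strip]
```

Because the threads here share the whole image, each strip can read the neighbouring rows it needs at its boundary. On a distributed-memory layout each subimage would instead need a one-row halo copied from its neighbours.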
Evaluation of somatosensory cortical differences between flutter and vibration tactile stimuli.
Han, Sang Woo; Chung, Yoon Gi; Kim, Hyung-Sik; Chung, Soon-Cheol; Park, Jang-Yeon; Kim, Sung-Phil
2013-01-01
In parallel with advances in haptic-based mobile computing systems, understanding of the neural processing of vibrotactile information becomes of great importance. In the human nervous system, two types of vibrotactile information, flutter and vibration, are delivered from mechanoreceptors to the somatosensory cortex through segregated neural afferents. To investigate how the somatosensory cortex differentiates flutter and vibration, we analyzed the cortical responses to vibrotactile stimuli with a wide range of frequencies. Specifically, we examined whether cortical activity changed most around 50 Hz, which is known as a boundary between flutter and vibration. We explored various measures to evaluate separability of cortical activity across frequency and found that the hypothesis margin method resulted in the greatest separability between flutter and vibration. This result suggests that flutter and vibration information may be processed by different neural processes in the somatosensory cortex.
Cardiac imaging: working towards fully-automated machine analysis & interpretation.
Slomka, Piotr J; Dey, Damini; Sitek, Arkadiusz; Motwani, Manish; Berman, Daniel S; Germano, Guido
2017-03-01
Non-invasive imaging plays a critical role in managing patients with cardiovascular disease. Although subjective visual interpretation remains the clinical mainstay, quantitative analysis facilitates objective, evidence-based management, and advances in clinical research. This has driven developments in computing and software tools aimed at achieving fully automated image processing and quantitative analysis. In parallel, machine learning techniques have been used to rapidly integrate large amounts of clinical and quantitative imaging data to provide highly personalized individual patient-based conclusions. Areas covered: This review summarizes recent advances in automated quantitative imaging in cardiology and describes the latest techniques which incorporate machine learning principles. The review focuses on the cardiac imaging techniques which are in wide clinical use. It also discusses key issues and obstacles for these tools to become utilized in mainstream clinical practice. Expert commentary: Fully-automated processing and high-level computer interpretation of cardiac imaging are becoming a reality. Application of machine learning to the vast amounts of quantitative data generated per scan and integration with clinical data also facilitates a move to more patient-specific interpretation. These developments are unlikely to replace interpreting physicians but will provide them with highly accurate tools to detect disease, risk-stratify, and optimize patient-specific treatment. However, with each technological advance, we move further from human dependence and closer to fully-automated machine interpretation.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units
USDA-ARS's Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
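Each sweep of an ADI scheme leads to tridiagonal systems, which is where Parallel Cyclic Reduction comes in. The sketch below shows the general PCR idea in plain Python; it is not the CCHE2D or CUDA Fortran code, and the function name and argument layout are assumptions. The system is a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = 0 and c[n-1] = 0. In every pass, each row eliminates its two neighbours at the current stride, and all rows can be updated independently, which is what maps well onto GPU threads.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (no pivoting)."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        def reduce_row(i):
            lo, hi = i - stride, i + stride
            # Multipliers that cancel the couplings to rows lo and hi.
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            a_lo, c_lo, d_lo = (a[lo], c[lo], d[lo]) if lo >= 0 else (0.0, 0.0, 0.0)
            a_hi, c_hi, d_hi = (a[hi], c[hi], d[hi]) if hi < n else (0.0, 0.0, 0.0)
            return (alpha * a_lo,
                    b[i] + alpha * c_lo + gamma * a_hi,
                    gamma * c_hi,
                    d[i] + alpha * d_lo + gamma * d_hi)

        # Every row reads only the old arrays, so this loop is fully parallel.
        rows = [reduce_row(i) for i in range(n)]
        a, b, c, d = (list(col) for col in zip(*rows))
        stride *= 2
    # After about log2(n) passes every equation involves a single unknown.
    return [d[i] / b[i] for i in range(n)]
```

The sketch assumes a system for which elimination without pivoting is safe, for example a diagonally dominant matrix.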
Grider, Gary A.; Poole, Stephen W.
2015-09-01
Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
schwimmbad: A uniform interface to parallel processing pools in Python
NASA Astrophysics Data System (ADS)
Price-Whelan, Adrian M.; Foreman-Mackey, Daniel
2017-09-01
Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
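The pool paradigm described above can be sketched with the Python standard library alone. The example below does not use schwimmbad's own API: `SerialPool` here is a hypothetical stand-in defined for this sketch, and `ThreadPool` from `multiprocessing.pool` is used only because it exposes the same `map` method as a process pool. The point is that the calling code depends only on `pool.map`, so the pool can be swapped without touching the computation.

```python
from multiprocessing.pool import ThreadPool


class SerialPool:
    """Hypothetical stand-in: runs the tasks one after another."""

    def map(self, func, tasks):
        return [func(task) for task in tasks]

    def close(self):
        pass  # nothing to release in the serial case


def square(x):
    # Each task is independent of the others ("perfectly parallel").
    return x * x


def run_all(pool, tasks):
    # The caller relies only on pool.map, so the pool type can change freely.
    try:
        return list(pool.map(square, tasks))
    finally:
        pool.close()


serial_result = run_all(SerialPool(), range(5))
threaded_result = run_all(ThreadPool(processes=2), range(5))
print(serial_result, threaded_result)
```

Under these assumptions, moving from local development to a larger machine would mean passing a different pool object to `run_all`, which is the kind of switch the abstract describes.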
Common hyperspectral image database design
NASA Astrophysics Data System (ADS)
Tian, Lixun; Liao, Ningfang; Chai, Ali
2009-11-01
This paper introduces the Common Hyperspectral Image Database (CHIDB), built with a demand-oriented database design method, which brings together ground-based spectra, standardized hyperspectral cubes, and spectral analysis to serve a range of applications. The paper presents an integrated approach to retrieving spectral and spatial patterns from remotely sensed imagery using state-of-the-art data mining and advanced database technologies; data mining ideas and functions were incorporated into CHIDB to make it better suited to agricultural, geological, and environmental applications. A broad range of data from multiple regions of the electromagnetic spectrum is supported, including ultraviolet, visible, near-infrared, thermal infrared, and fluorescence. CHIDB is based on the .NET framework and follows an MVC architecture with five main functional modules: Data Importer/Exporter, Image/Spectrum Viewer, Data Processor, Parameter Extractor, and On-line Analyzer. The original data are all stored in SQL Server 2008 for efficient search, query, and update, and advanced spectral image processing techniques such as parallel processing in C# are used. Finally, an application case in agricultural disease detection is presented.
Collignon, Bertrand; Séguret, Axel; Halloy, José
2016-01-01
Collective motion is one of the most ubiquitous behaviours displayed by social organisms and has led to the development of numerous models. Recent advances in the understanding of sensory systems and information processing by animals impel one to revise classical assumptions made in decisional algorithms. In this context, we present a model describing the three-dimensional visual sensory system of fish that adjust their trajectory according to their perception field. Furthermore, we introduce a stochastic process based on a probability distribution function to move in targeted directions rather than on a summation of influential vectors as is classically assumed by most models. In parallel, we present experimental results of zebrafish (alone or in groups of 10) swimming in both homogeneous and heterogeneous environments. We use these experimental data to set the parameter values of our model and show that this perception-based approach can simulate the collective motion of species showing cohesive behaviour in heterogeneous environments. Finally, we discuss the advances of this multilayer model and its possible outcomes in biological, physical and robotic sciences. PMID:26909173
Boleda, M A Rosa; Galceran, M A Teresa; Ventura, Francesc
2011-06-01
The behavior along the potabilization process of 29 pharmaceuticals and 12 drugs of abuse, identified from a total of 81 compounds at the intake of a drinking water treatment plant (DWTP), has been studied. The DWTP has a common treatment consisting of dioxychlorination, coagulation/flocculation and sand filtration, after which the water is split into two parallel treatment lines, conventional (ozonation and carbon filtration) and advanced (ultrafiltration and reverse osmosis), to be further blended, chlorinated and distributed. Full removals were reached for most of the compounds. Iopromide (up to 17.2 ng/L), nicotine (13.7 ng/L), benzoylecgonine (1.9 ng/L), cotinine (3.6 ng/L), acetaminophen (15.6 ng/L), erythromycin (2.0 ng/L) and caffeine (6.0 ng/L), with elimination efficiencies ≥ 94%, were the sole compounds found in the treated water. The advanced treatment process showed a slightly better efficiency than the conventional treatment in eliminating pharmaceuticals and drugs of abuse. Copyright © 2011 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
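The nested distribution rule described in this abstract can be sketched in a few lines of Python. This is not PM code and not the PM runtime: the function names `split_evenly` and `assign`, and the choice to represent processors as plain integers, are assumptions made for illustration, and only the default even distribution is covered.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assign(num_tasks, processors):
    """Return a dict mapping each task index to the processors it may run on."""
    tasks = list(range(num_tasks))
    if num_tasks < len(processors):
        # Fewer tasks than processors: divide the processors among the tasks.
        return dict(zip(tasks, split_evenly(processors, num_tasks)))
    # Otherwise: divide the tasks among the processors, one owner each.
    groups = split_evenly(tasks, len(processors))
    return {t: [p] for p, group in zip(processors, groups) for t in group}


outer = assign(2, [0, 1, 2, 3, 4])   # two tasks share five processors
inner = assign(2, outer[0])          # nested statement inside task 0
many = assign(5, [0, 1])             # five tasks on two processors
print(outer, inner, many)
```

A nested parallel statement is modelled here by calling `assign` again on the processor subset that a task already owns, which mirrors the statement that nested statements may further subdivide a task's processor set.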
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Read, S J; Vanman, E J; Miller, L C
1997-01-01
We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
A review of digital microfluidics as portable platforms for lab-on a-chip applications.
Samiei, Ehsan; Tabrizian, Maryam; Hoorfar, Mina
2016-07-07
Following the development of microfluidic systems, there has been a high tendency towards developing lab-on-a-chip devices for biochemical applications. A great deal of effort has been devoted to improve and advance these devices with the goal of performing complete sets of biochemical assays on the device and possibly developing portable platforms for point of care applications. Among the different microfluidic systems used for such a purpose, digital microfluidics (DMF) shows high flexibility and capability of performing multiplex and parallel biochemical operations, and hence, has been considered as a suitable candidate for lab-on-a-chip applications. In this review, we discuss the most recent advances in the DMF platforms, and evaluate the feasibility of developing multifunctional packages for performing complete sets of processes of biochemical assays, particularly for point-of-care applications. The progress in the development of DMF systems is reviewed from eight different aspects, including device fabrication, basic fluidic operations, automation, manipulation of biological samples, advanced operations, detection, biological applications, and finally, packaging and portability of the DMF devices. Success in developing the lab-on-a-chip DMF devices will be concluded based on the advances achieved in each of these aspects.
Using Parallel Processing for Problem Solving.
1979-12-01
...are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities...Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context...
Implementing and analyzing the multi-threaded LP-inference
NASA Astrophysics Data System (ADS)
Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.
2018-03-01
The logical production equations provide new possibilities for backward inference optimization in intelligent production-type systems. The strategy of relevant backward inference aims to minimize the number of queries to an external information source (either a database or an interactive user). The idea of the method is based on computing the initial set of preimages and searching for the true preimage. The execution of each stage can be organized independently and in parallel, and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to the parallel algorithms of relevant inference based on the advanced "pipeline" scheme of parallel computations, which makes it possible to increase the degree of parallelism. The author also provides some details of the LP-structures implementation.
Parallel aeroelastic computations for wing and wing-body configurations
NASA Technical Reports Server (NTRS)
Byun, Chansup
1994-01-01
The objective of this research is to develop computationally efficient methods for solving fluid-structural interaction problems by directly coupling finite difference Euler/Navier-Stokes equations for fluids and finite element dynamics equations for structures on parallel computers. This capability will significantly impact many aerospace projects of national importance such as Advanced Subsonic Civil Transport (ASCT), where the structural stability margin becomes very critical at the transonic region. This research effort will have direct impact on the High Performance Computing and Communication (HPCC) Program of NASA in the area of parallel computing.
Guo, Fei; Li, Ning; Fecher, Frank W.; Gasparini, Nicola; Quiroz, Cesar Omar Ramirez; Bronnbauer, Carina; Hou, Yi; Radmilović, Vuk V.; Radmilović, Velimir R.; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J.
2015-01-01
The multi-junction concept is the most relevant approach to overcome the Shockley–Queisser limit for single-junction photovoltaic cells. The record efficiencies of several types of solar technologies are held by series-connected tandem configurations. However, the stringent current-matching criterion presents primarily a material challenge and permanently requires developing and processing novel semiconductors with desired bandgaps and thicknesses. Here we report a generic concept to alleviate this limitation. By integrating series- and parallel-interconnections into a triple-junction configuration, we find significantly relaxed material selection and current-matching constraints. To illustrate the versatile applicability of the proposed triple-junction concept, organic and organic-inorganic hybrid triple-junction solar cells are constructed by printing methods. High fill factors up to 68% without resistive losses are achieved for both organic and hybrid triple-junction devices. Series/parallel triple-junction cells with organic, as well as perovskite-based subcells may become a key technology to further advance the efficiency roadmap of the existing photovoltaic technologies. PMID:26177808
Guo, Fei; Li, Ning; Fecher, Frank W; Gasparini, Nicola; Ramirez Quiroz, Cesar Omar; Bronnbauer, Carina; Hou, Yi; Radmilović, Vuk V; Radmilović, Velimir R; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J
2015-07-16
The multi-junction concept is the most relevant approach to overcome the Shockley-Queisser limit for single-junction photovoltaic cells. The record efficiencies of several types of solar technologies are held by series-connected tandem configurations. However, the stringent current-matching criterion presents primarily a material challenge and permanently requires developing and processing novel semiconductors with desired bandgaps and thicknesses. Here we report a generic concept to alleviate this limitation. By integrating series- and parallel-interconnections into a triple-junction configuration, we find significantly relaxed material selection and current-matching constraints. To illustrate the versatile applicability of the proposed triple-junction concept, organic and organic-inorganic hybrid triple-junction solar cells are constructed by printing methods. High fill factors up to 68% without resistive losses are achieved for both organic and hybrid triple-junction devices. Series/parallel triple-junction cells with organic, as well as perovskite-based subcells may become a key technology to further advance the efficiency roadmap of the existing photovoltaic technologies.
Biosynthesis and genetic encoding of phosphothreonine through parallel selection and deep sequencing
Huguenin-Dezot, Nicolas; Liang, Alexandria D.; Schmied, Wolfgang H.; Rogerson, Daniel T.; Chin, Jason W.
2017-01-01
The phosphorylation of threonine residues in proteins regulates diverse processes in eukaryotic cells, and thousands of threonine phosphorylations have been identified. An understanding of how threonine phosphorylation regulates biological function will be accelerated by general methods to bio-synthesize defined phospho-proteins. Here we address limitations in current methods for discovering aminoacyl-tRNA synthetase/tRNA pairs for incorporating non-natural amino acids into proteins, by combining parallel positive selections with deep sequencing and statistical analysis, to create a rapid approach for directly discovering aminoacyl-tRNA synthetase/tRNA pairs that selectively incorporate non-natural substrates. Our approach is scalable and enables the direct discovery of aminoacyl-tRNA synthetase/tRNA pairs with mutually orthogonal substrate specificity. We biosynthesize phosphothreonine in cells, and use our new selection approach to discover a phosphothreonyl-tRNA synthetase/tRNACUA pair. By combining these advances we create an entirely biosynthetic route to incorporating phosphothreonine in proteins and biosynthesize several phosphoproteins; enabling phosphoprotein structure determination and synthetic protein kinase activation. PMID:28553966
Lattice Boltzmann computation of creeping fluid flow in roll-coating applications
NASA Astrophysics Data System (ADS)
Rajan, Isac; Kesana, Balashanker; Perumal, D. Arumuga
2018-04-01
The Lattice Boltzmann Method (LBM) has advanced as a class of Computational Fluid Dynamics (CFD) methods used to solve complex fluid systems and heat transfer problems. It has increasingly attracted the interest of researchers in computational physics for solving challenging problems of industrial and academic importance. In this study, LBM is applied to simulate the creeping fluid flow phenomena commonly encountered in manufacturing technologies. In particular, we apply this method to simulate the fluid flow phenomena associated with the "meniscus roll coating" application. This prevalent industrial problem, encountered in polymer processing and thin film coating applications, is modelled as a standard lid-driven cavity problem to which creeping flow analysis is applied. This incompressible viscous flow problem is studied at various speed ratios (the ratio of upper to lower lid speed) in two different configurations of lid movement: parallel and anti-parallel wall motion. The flow exhibits interesting patterns which will help in the design of roll coaters.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is becoming more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches for parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Distributed and parallel Ada and the Ada 9X recommendations
NASA Technical Reports Server (NTRS)
Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.
1992-01-01
Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are composed of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.
Wen, X.; Datta, A.; Traverso, L. M.; Pan, L.; Xu, X.; Moon, E. E.
2015-01-01
Optical lithography, the enabling process for defining features, has been widely used in semiconductor industry and many other nanotechnology applications. Advances of nanotechnology require developments of high-throughput optical lithography capabilities to overcome the optical diffraction limit and meet the ever-decreasing device dimensions. We report our recent experimental advancements to scale up diffraction unlimited optical lithography in a massive scale using the near field nanolithography capabilities of bowtie apertures. A record number of near-field optical elements, an array of 1,024 bowtie antenna apertures, are simultaneously employed to generate a large number of patterns by carefully controlling their working distances over the entire array using an optical gap metrology system. Our experimental results reiterated the ability of using massively-parallel near-field devices to achieve high-throughput optical nanolithography, which can be promising for many important nanotechnology applications such as computation, data storage, communication, and energy. PMID:26525906
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer
1997-01-01
A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Picosecond UV single photon detectors with lateral drift field: Concept and technologies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yakimov, M.; Oktyabrsky, S.; Murat, P.
2015-09-01
Group III–V semiconductor materials have been considered as a Si replacement for advanced logic devices for quite some time. Advances in III–V processing technologies, such as interface and surface passivation and large-area deep-submicron lithography with high-aspect-ratio etching, primarily driven by metal-oxide-semiconductor field-effect transistor development, can also be used for other applications. In this paper we focus on photodetectors with the drift field parallel to the surface. We compare the proposed concept to the state-of-the-art Si-based technology and discuss requirements which need to be satisfied for such detectors to be used in a single photon counting mode in the blue and ultraviolet spectral region with about 10 ps photon timing resolution, essential for numerous applications ranging from high-energy physics to medical imaging.
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25 Abstract This study developed a set of low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than
O'Donnell, Michael
2015-01-01
State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need for modeling larger and more complex landscapes increases, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks of computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments disseminate the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the top result (high-throughput computing versus serial computations) indicated an approximate 96.6% decrease in computing time. With a single, multicore compute node (bottom result), the computing time indicated an 81.8% decrease relative to using serial computations. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models.
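The percentage decreases reported above can be restated as speedup factors, which are often easier to compare. The short sketch below does only that arithmetic; the function name is hypothetical, and the only inputs are the two percentages quoted in the abstract.

```python
def speedup_from_time_decrease(percent_decrease):
    """Convert a percentage decrease in run time into a speedup factor.

    If the parallel run takes (1 - p) of the serial time, the speedup is 1 / (1 - p).
    """
    remaining_fraction = 1.0 - percent_decrease / 100.0
    return 1.0 / remaining_fraction


high_throughput = speedup_from_time_decrease(96.6)   # distributed, top result
single_node = speedup_from_time_decrease(81.8)       # one multicore node
print(round(high_throughput, 1), round(single_node, 1))
```

On this reading, the 96.6% decrease corresponds to roughly a 29-fold speedup over serial execution, and the 81.8% decrease to roughly a 5.5-fold speedup.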
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo
2012-02-01
We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of an Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on the Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with an individual timestep scheme (Makino and Aarseth, 1992), and achieved a performance of ~20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions (Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using the so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ~ 10^5 on massively parallel systems with at most 800 cores with the Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
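For orientation, the inner loop that such codes accelerate is the pairwise gravitational force sum. The sketch below is a plain, unvectorized Python version of that O(N^2) kernel in units where G = 1; it is not the authors' AVX code, it omits the Hermite predictor-corrector and the individual timesteps, and the function name and softening parameter `eps` are illustrative assumptions.

```python
def accelerations(pos, mass, eps=0.0):
    """Direct-summation gravitational accelerations (G = 1).

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : optional softening length (0.0 gives the unsoftened force)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            # Separation vector pointing from particle i to particle j.
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3
    return acc


# Two unit masses one unit apart attract each other with |a| = 1.
acc = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(acc)
```

Each (i, j) interaction is independent of the others, which is what makes the loop a natural target for SIMD instructions and for distribution over MPI processes.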
Giannakis, Stefanos; Jovic, Milica; Gasilova, Natalia; Pastor Gelabert, Miquel; Schindelholz, Simon; Furbringer, Jean-Marie; Girault, Hubert; Pulgarin, César
2017-06-15
In this work, an Iodinated Contrast Medium (ICM), Iohexol, was subjected to treatment by 3 Advanced Oxidation Processes (AOPs) (UV, UV/H2O2, UV/H2O2/Fe2+). Water, wastewater and urine were spiked with Iohexol, in order to investigate the treatment efficiency of AOPs. A tri-level approach has been deployed to assess the efficacy of the UV-based AOPs. The treatment was heavily influenced by the UV transmittance and the organics content of the matrix, as dilution and acidification improved the degradation, whereas increasing iron/H2O2 improved it only moderately. Furthermore, optimization of the treatment conditions, as well as modeling of the degradation, was performed by step-wise constructed quadratic or product models, and determination of the optimal operational regions was achieved through desirability functions. Finally, global chemical parameters (COD, TOC and UV-Vis absorbance) were followed in parallel with specific analyses to elucidate the degradation process of Iohexol by UV-based AOPs. Through HPLC/MS analysis the degradation pathway and the effects of the operational parameters were monitored, thus attributing to the pathways their respective modifications. The addition of iron in the UV/H2O2 process induced additional pathways beneficial for both Iohexol and organics removal from the matrix. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Collins, Timothy J.; Congdon, William M.; Smeltzer, Stanley S.; Whitley, Karen S.
2005-01-01
The next generation of planetary exploration vehicles will rely heavily on robust aero-assist technologies, especially those that include aerocapture. This paper provides an overview of an ongoing development program, led by NASA Langley Research Center (LaRC) and aimed at introducing high-temperature structures, adhesives, and advanced thermal protection system (TPS) materials into the aeroshell design process. The purpose of this work is to demonstrate TPS materials that can withstand the higher heating rates of NASA's next generation planetary missions, and to validate high-temperature structures and adhesives that can reduce required TPS thickness and total aeroshell mass, thus allowing for larger science payloads. The effort described consists of parallel work in several advanced aeroshell technology areas. The areas of work include high-temperature adhesives, high-temperature composite materials, advanced ablator (TPS) materials, sub-scale demonstration test articles, and aeroshell modeling and analysis. The status of screening test results for a broad selection of available higher-temperature adhesives is presented. It appears that at least one (and perhaps a few) adhesives have working temperatures ranging from 315-400 C (600-750 F), and are suitable for TPS-to-structure bondline temperatures that are significantly above the traditional allowable of 250 C (482 F). The status of mechanical testing of advanced high-temperature composite materials is also summarized. To date, these tests indicate the potential for good material performance at temperatures of at least 600 F. Application of these materials and adhesives to aeroshell systems that incorporate advanced TPS materials may reduce aeroshell TPS mass by 15% - 30%. 
A brief outline is given of work scheduled for completion in 2006 that will include fabrication and testing of large panels and subscale aeroshell test articles at the Solar-Tower Test Facility located at Kirtland AFB and operated by Sandia National Laboratories. These tests are designed to validate aeroshell manufacturability using advanced material systems, and to demonstrate the maintenance of bondline integrity at realistically high temperatures and heating rates. Finally, a status is given of ongoing aeroshell modeling and analysis efforts which will be used to correlate with experimental testing, and to provide a reliable means of extrapolating to performance under actual flight conditions. The modeling and analysis effort includes a parallel series of experimental tests to determine TPS thermal expansion and other mechanical properties which are required for input to the analysis models.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
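The abstract does not say which load balancing strategy paraBTM uses. As one example of a low-cost approach, the sketch below applies the classic greedy "longest processing time first" heuristic: documents are sorted by an estimated cost (for instance, their length) and each is handed to the currently least-loaded worker. The function name and the cost values are hypothetical and are not taken from the paper.

```python
import heapq


def balance(costs, n_workers):
    """Greedy LPT assignment of task costs to workers.

    Returns a list of per-worker task lists (tasks identified by index).
    """
    # Min-heap of (current load, worker id) so the lightest worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    # Place the most expensive tasks first.
    for task in sorted(range(len(costs)), key=lambda t: costs[t], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + costs[task], worker))
    return assignment


doc_costs = [5, 3, 3, 2, 2, 1]  # hypothetical per-document costs
plan = balance(doc_costs, 2)
loads = [sum(doc_costs[t] for t in tasks) for tasks in plan]
print(plan, loads)
```

The heuristic needs only a sort and a heap, so its overhead is negligible next to the text mining itself, which is the kind of trade-off the phrase "low-cost yet effective" suggests.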
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
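The link figures above imply a raw per-link data rate that can be worked out directly. The short calculation below assumes one 8-bit transfer per clock cycle with no protocol overhead, which the abstract does not state, so the results are upper bounds rather than measured throughput.

```python
def link_rate_mbit(width_bits, clock_hz):
    """Raw link rate in Mbit/s, assuming one transfer per clock cycle."""
    return width_bits * clock_hz / 1e6


per_link = link_rate_mbit(8, 50e6)  # one Dataway, one direction
aggregate = 6 * per_link            # all six links, one direction

print(f"per link:  {per_link:.0f} Mbit/s ({per_link / 8:.0f} MB/s) per direction")
print(f"six links: {aggregate:.0f} Mbit/s per direction")
```

Because the links are full duplex, the same raw rate is available in the opposite direction at the same time.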
Etchepareborde, S; Mills, J; Busoni, V; Brunel, L; Balligand, M
2011-01-01
To calculate the difference between the desired tibial tuberosity advancement (TTA) along the tibial plateau axis and the advancement truly achieved in that direction when cage size has been determined using the method of Montavon and colleagues. To measure the effect of this difference on the final patellar tendon-tibial plateau angle (PTA) in relation to the ideal 90°. Trigonometry was used to calculate the theoretical actual advancement of the tibial tuberosity in a direction parallel to the tibial plateau that would be achieved by the placement of a cage at the level of the tibial tuberosity in the osteotomy plane of the tibial crest. The same principle was used to calculate the size of the cage that would have been required to achieve the desired advancement. The effect of the difference between the desired advancement and the actual advancement achieved on the final PTA was calculated. For a given desired advancement, the greater the tibial plateau angle (TPA), the greater the difference between the desired advancement and the actual advancement achieved. The maximum discrepancy calculated was 5.8 mm for a 12 mm advancement in a case of extreme TPA (59°). When the TPA was less than 31°, the PTA was in the range of 90° to 95°. A discrepancy does exist between the desired tibial tuberosity advancement and the actual advancement in a direction parallel to the TPA, when the tibial tuberosity is not translated proximally. Although this has an influence on the final PTA, further studies are warranted to evaluate whether this is clinically significant.
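The abstract does not give the trigonometric relation itself. One simple reading, assumed here, is that the advancement achieved parallel to the tibial plateau is the projection of the desired advancement, i.e. desired × cos(TPA). This assumption reproduces the quoted maximum discrepancy of 5.8 mm for a 12 mm advancement at a TPA of 59°, but the function names below are illustrative and the full method in the paper may differ.

```python
import math


def actual_advancement(desired_mm, tpa_deg):
    """Advancement achieved parallel to the tibial plateau (assumed cosine projection)."""
    return desired_mm * math.cos(math.radians(tpa_deg))


def discrepancy(desired_mm, tpa_deg):
    """Shortfall between desired and achieved advancement, in mm."""
    return desired_mm - actual_advancement(desired_mm, tpa_deg)


for tpa in (20, 31, 45, 59):
    print(f"TPA {tpa:2d} deg: shortfall {discrepancy(12.0, tpa):.1f} mm of 12 mm")
```

Under this model the shortfall grows as 1 − cos(TPA), which matches the statement that a greater tibial plateau angle gives a greater difference for a given desired advancement.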
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2014-02-28
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performance and overall very competitive timings in an energy-force computation needed to perform an MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package, which is the first implementation for a polarizable model making large-scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of SPME and a noticeable improvement of the memory management, giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations, giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
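To make the iterative strategy concrete, the sketch below shows a plain Jacobi iteration for a small linear system A x = b. It is a generic textbook version: it does not include the DIIS acceleration or the Ewald machinery described above, it is unrelated to the Tinker-HP source, and the small matrix is a made-up, diagonally dominant example rather than a polarization matrix.

```python
def jacobi(A, b, tol=1e-12, max_iter=200):
    """Solve A x = b by Jacobi iteration (A given as a list of rows)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Every component is updated from the previous iterate only,
        # so the n updates are independent and can run in parallel.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


A = [[4.0, 1.0], [2.0, 3.0]]  # strictly diagonally dominant, so Jacobi converges
b = [1.0, 2.0]
print(jacobi(A, b))           # approaches x = 0.1, y = 0.6
```

The independence of the component updates within one sweep is what makes this family of solvers attractive on many cores; convergence accelerators such as DIIS reduce the number of sweeps needed.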
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2015-01-01
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performance and overall very competitive timings in an energy-force computation needed to perform an MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package, which is the first implementation for a polarizable model making large-scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of SPME and a noticeable improvement of the memory management, giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations, giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
NASA Technical Reports Server (NTRS)
1994-01-01
CESDIS, the Center of Excellence in Space Data and Information Sciences was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U. S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.
NASA Astrophysics Data System (ADS)
Ghaemi, Z.; Farnaghi, M.; Alimohammadi, A.
2015-12-01
The critical impact of air pollution on human health and the environment on the one hand, and the complexity of pollutant concentration behavior on the other, have led scientists to look for advanced techniques for monitoring and predicting urban air quality. Additionally, recent developments in data measurement techniques have led to the collection of various types of data about air quality. Such data is extremely voluminous, and to be useful it must be processed at high velocity. Due to the complexity of big data analysis, especially for dynamic applications, online forecasting of pollutant concentration trends within a reasonable processing time is still an open problem. The purpose of this paper is to present an online forecasting approach based on Support Vector Machine (SVM) to predict the air quality one day in advance. In order to overcome the computational requirements for large-scale data analysis, distributed computing based on the Hadoop platform has been employed to leverage the processing power of multiple processing units. The MapReduce programming model is adopted for massive parallel processing in this study. Based on the online algorithm and Hadoop framework, an online forecasting system is designed to predict the air pollution of Tehran for the next 24 hours. The results have been assessed on the basis of Processing Time and Efficiency. Quite accurate predictions of air pollutant indicator levels within an acceptable processing time show that the presented approach is well suited to tackling large-scale air pollution prediction problems.
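The MapReduce model mentioned above can be illustrated without Hadoop. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process to average pollutant readings per monitoring station. It is not the paper's pipeline and does not use the Hadoop API; the station names, readings, and function names are invented for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs: station -> (reading, count of 1)."""
    for station, reading in records:
        yield station, (reading, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a mean reading."""
    result = {}
    for key, values in groups.items():
        total = sum(reading for reading, _ in values)
        count = sum(n for _, n in values)
        result[key] = total / count
    return result


def mean_by_station(records):
    return reduce_phase(shuffle(map_phase(records)))


readings = [("A", 10.0), ("A", 30.0), ("B", 5.0)]  # hypothetical data
print(mean_by_station(readings))
```

In a real cluster the map calls run on the nodes holding each data block and the reduce calls run per key, so both phases scale with the number of processing units.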
NASA Technical Reports Server (NTRS)
Nagle, Gail; Masotto, Thomas; Alger, Linda
1990-01-01
The need to meet the stringent performance and reliability requirements of advanced avionics systems has frequently led to implementations which are tailored to a specific application and are therefore difficult to modify or extend. Furthermore, many integrated flight critical systems are input/output intensive. By using a design methodology which customizes the input/output mechanism for each new application, the cost of implementing new systems becomes prohibitively expensive. One solution to this dilemma is to design computer systems and input/output subsystems which are general purpose, but which can be easily configured to support the needs of a specific application. The Advanced Information Processing System (AIPS), currently under development, has these characteristics. The design and implementation of the prototype I/O communication system for AIPS is described. AIPS addresses reliability issues related to data communications by the use of reconfigurable I/O networks. When a fault or damage event occurs, communication is restored to functioning parts of the network and the failed or damaged components are isolated. Performance issues are addressed by using a parallelized computer architecture which decouples Input/Output (I/O) redundancy management and I/O processing from the computational stream of an application. The autonomous nature of the system derives from the highly automated and independent manner in which I/O transactions are conducted for the application as well as from the fact that the hardware redundancy management is entirely transparent to the application.
Shcherbakov, Alexandre S; Arellanes, Adan Omar
2017-04-20
We present a fundamentally new acousto-optical cell providing an advanced wideband spectrum analysis of ultra-high frequency radio-wave signals. For the first time, we apply a recently developed approach with the tilt angle to a one-phonon non-collinear anomalous light scattering. In contrast to earlier cases, now one can exploit a regime with the fixed optical wavelength for processing a great number of acoustic frequencies simultaneously in the linear regime. The chosen rutile crystal combines a moderate acoustic velocity with low acoustic attenuation and allows wide-band data processing with GHz-frequency acoustic waves. We have created and experimentally tested a 6-cm aperture rutile-made acousto-optical cell providing the central frequency 2.0 GHz, frequency bandwidth ∼0.52 GHz with the frequency resolution about 68.3 kHz, and ∼7620 resolvable spots. A similar cell permits designing an advanced ultra-high-frequency arm within a recently developed multi-band radio-wave acousto-optical spectrometer for astrophysical studies. This spectrometer is intended to operate with a few parallel optical arms for processing the multi-frequency data flows within astrophysical observations. Keeping all the instrument's advantages of the previous schematic arrangement, now one can create the highest-frequency arm using the developed rutile-based acousto-optical cell. It permits optimizing the performance inherent in that arm via regulation of both the central frequency and the frequency bandwidth for spectrum analysis.
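The number of resolvable spots of a spectrum analyzer is commonly estimated as the frequency bandwidth divided by the frequency resolution. The check below applies that standard estimate to the rounded figures quoted in the abstract; the function name is illustrative.

```python
def resolvable_spots(bandwidth_hz, resolution_hz):
    """Estimate of resolvable spots: bandwidth divided by frequency resolution."""
    return bandwidth_hz / resolution_hz


spots = resolvable_spots(0.52e9, 68.3e3)
print(f"about {spots:.0f} resolvable spots")
```

The result, a little over 7600, agrees with the reported ∼7620 to within the rounding of the quoted ∼0.52 GHz bandwidth.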
Search asymmetries: parallel processing of uncertain sensory information.
Vincent, Benjamin T
2011-08-01
What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-09
... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. This... of Contents I. What is parallel processing? II. Background III. What elements are required under... Executive Order Reviews I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...
Double Take: Parallel Processing by the Cerebral Hemispheres Reduces Attentional Blink
ERIC Educational Resources Information Center
Scalf, Paige E.; Banich, Marie T.; Kramer, Arthur F.; Narechania, Kunjan; Simon, Clarissa D.
2007-01-01
Recent data have shown that parallel processing by the cerebral hemispheres can expand the capacity of visual working memory for spatial locations (J. F. Delvenne, 2005) and attentional tracking (G. A. Alvarez & P. Cavanagh, 2005). Evidence that parallel processing by the cerebral hemispheres can improve item identification has remained elusive.…
Research in Structures and Dynamics, 1984
NASA Technical Reports Server (NTRS)
Hayduk, R. J. (Compiler); Noor, A. K. (Compiler)
1984-01-01
A symposium on advances and trends in structures and dynamics was held to communicate new insights into physical behavior and to identify trends in the solution procedures for structures and dynamics problems. Pertinent areas of concern were (1) multiprocessors, parallel computation, and database management systems, (2) advances in finite element technology, (3) interactive computing and optimization, (4) mechanics of materials, (5) structural stability, (6) dynamic response of structures, and (7) advanced computer applications.
Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D
2015-12-01
Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processes and prevent other tasks from being carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm by presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture-naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks but only at the expense of extraordinarily high costs. (c) 2015 APA, all rights reserved.
NASA Astrophysics Data System (ADS)
Narciso, Steven J.
2011-08-01
An emerging test and measurement standard called AXIe, AdvancedTCA extensions for Instrumentation, is expected to find wide acceptance within the Physics community as it offers many benefits to applications including shock, plasma, particle and nuclear physics. It is expected that many COTS (commercial off-the-shelf) signal conditioning, acquisition and processing modules will become available from a range of different suppliers. AXIe uses AdvancedTCA® as its basis, but then leverages test and measurement industry standards such as PXI, IVI, and LXI to facilitate cooperation and plug-and-play interoperability between COTS instrument suppliers. AXIe's large board footprint and power allow high density in a 19" rack, enabling the development of high-performance signal conditioning, analog-to-digital conversion, and data processing, while offering the channel count scalability inherent in modular systems. Synchronization between modules is flexible and provided by two triggering structures: a parallel trigger bus, and radially-distributed, time-matched point-to-point trigger lines. Inter-module communication is also provided with an adjacent module local bus allowing data transfer to 600 Gbits/s in each direction, for example between a front-end digitizer and DSP. AXIe allows embedding high performance computing, and a range of COTS AdvancedTCA® computer blades are currently available that provide low cost alternatives to the development of custom signal processing modules. The availability of both LAN and PCI Express allows interconnection between modules, as well as industry-standard high-performance data paths to external host computer systems. AXIe delivers a powerful environment for custom module development. As in the case of VXIbus and PXI before it, commercial development kits are expected to be available.
This paper will give an overview of the architectural elements of AXIe 1.0, the compatibility model with AdvancedTCA, and signal acquisition performance of many of the AXIe structures.
Graphical Representation of Parallel Algorithmic Processes
1990-12-01
interface with the AAARF main process. The source code for the AAARF class-common library is in the common subdirectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malony, Allen D; Shende, Sameer
The primary goal of the University of Oregon's DOE "competitiveness" project was to create performance technology that embodies and supports knowledge of performance data, analysis, and diagnosis in parallel performance problem solving. The target of our development activities was the TAU Performance System and the technology accomplishments reported in this and prior reports have all been incorporated in the TAU open software distribution. In addition, the project has been committed to maintaining strong interactions with the DOE SciDAC Performance Engineering Research Institute (PERI) and Center for Technology for Advanced Scientific Component Software (TASCS). This collaboration has proved valuable for translation of our knowledge-based performance techniques to parallel application development and performance engineering practice. Our outreach has also extended to the DOE Advanced CompuTational Software (ACTS) collection and project. Throughout the project we have participated in the PERI and TASCS meetings, as well as the ACTS annual workshops.
National Laboratory for Advanced Scientific Visualization at UNAM - Mexico
NASA Astrophysics Data System (ADS)
Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo
2016-04-01
In 2015, the National Autonomous University of Mexico (UNAM) joined the family of Universities and Research Centers where advanced visualization and computing play a key role in promoting and advancing missions in research, education, community outreach, as well as business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans a variety of areas related to scientific visualization, among which are: neuroanatomy, embryonic development, genome related studies, geosciences, geography, physics and mathematics related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the 3D fully immersive display system Cave, the high resolution parallel visualization system Powerwall, and the high resolution spherical display Earth Simulator. The entire visualization infrastructure is interconnected to a high-performance-computing-cluster (HPCC) called ADA in honor of Ada Lovelace, considered to be the first computer programmer. The Cave is an extra large 3.6 m wide room with images projected on the front, left and right walls, as well as the floor. Specialized crystal eyes LCD-shutter glasses provide a strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed of 24 (6x4) high-resolution ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization like geophysical, meteorological, climate and ecology data.
HPCC-ADA is a 1000+ computing core system that offers parallel computing resources to applications that require large quantities of memory as well as large and fast parallel storage systems. The entire system temperature is controlled by an energy and space efficient cooling solution, based on large rear door liquid cooled heat exchangers. This state-of-the-art infrastructure will boost research activities in the region, offer a powerful scientific tool for teaching at undergraduate and graduate levels, and enhance association and cooperation with business-oriented organizations.
NASA Astrophysics Data System (ADS)
Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.
2017-12-01
Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role in SAR data availability and dissemination all over the world. Indeed, the free and open access data policy adopted by the European Copernicus program, together with the global coverage acquisition strategy, makes the Sentinel constellation a game changer in the Earth Observation scenario. Now that SAR data have become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such a huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in an efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel-1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performance has been carried out to identify and overcome the major bottlenecks to scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits.
Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
Flores, Cintia; Ventura, Francesc; Martin-Alonso, Jordi; Caixach, Josep
2013-09-01
Perfluorooctane sulfonate (PFOS) and perfluorooctanoate (PFOA) are two emerging contaminants that have been detected in all environmental compartments. However, while most of the studies in the literature deal with their presence or removal in wastewater treatment, few of them are devoted to their detection in treated drinking water and fate during drinking water treatment. In this study, analyses of PFOS and PFOA have been carried out in river water samples and in the different stages of a drinking water treatment plant (DWTP) which has recently improved its conventional treatment process by adding ultrafiltration and reverse osmosis in a parallel treatment line. Conventional and advanced treatments have been studied in several pilot plants and in the DWTP, which offers the opportunity to compare both treatments operating simultaneously. From the results obtained, neither preoxidation, sand filtration, nor ozonation, removed both perfluorinated compounds. As advanced treatments, reverse osmosis has proved more effective than reverse electrodialysis to remove PFOA and PFOS in the different configurations of pilot plants assayed. Granular activated carbon with an average elimination efficiency of 64±11% and 45±19% for PFOS and PFOA, respectively and especially reverse osmosis, which was able to remove ≥99% of both compounds, were the sole effective treatment steps. Trace levels of PFOS (3.0-21 ng/L) and PFOA (<4.2-5.5 ng/L) detected in treated drinking water were significantly lowered in comparison to those measured in precedent years. These concentrations represent overall removal efficiencies of 89±22% for PFOA and 86±7% for PFOS. Copyright © 2013 Elsevier B.V. All rights reserved.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...
2017-01-28
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Compton Scattering Cross Sections in Strong Magnetic Fields: Advances for Neutron Star Applications
NASA Astrophysics Data System (ADS)
Eiles, Matthew; Gonthier, P. L.; Baring, M. G.; Wadiasingh, Z.
2013-04-01
Various telescopes including RXTE, INTEGRAL and Suzaku have detected non-thermal X-ray emission in the 10 - 200 keV band from strongly magnetic neutron stars. Inverse Compton scattering, a quantum-electrodynamical process, is believed to be a leading candidate for the production of this intense X-ray radiation. Magnetospheric conditions are such that electrons may well possess ultra-relativistic energies, which lead to attractive simplifications of the cross section. We have recently addressed such a case by developing compact analytic expressions using correct spin-dependent widths and Sokolov & Ternov (ST) basis states, focusing specifically on ground state-to-ground state scattering. However, inverse Compton scattering can cool electrons down to mildly-relativistic energies, necessitating the development of a more general case where the incoming photons acquire nonzero incident angles relative to the field in the rest frame of the electron, and the intermediate state can be excited to arbitrary Landau levels. In this paper, we develop results pertaining to this general case using ST formalism, and treating the plethora of harmonic resonances associated with various cyclotron transitions between Landau states. Four possible scattering modes (parallel-parallel, perpendicular-perpendicular, parallel-perpendicular, and perpendicular-parallel) encapsulate the polarization dependence of the cross section. We present preliminary analytic and numerical investigations of the magnitude of the extra Landau state contributions to obtain the full cross section, and compare these new analytic developments with the spin-averaged cross sections, which we develop in parallel. Results will find application to various neutron star problems, including computation of Eddington luminosities in the magnetospheres of magnetars. 
We express our gratitude for the generous support of the Michigan Space Grant Consortium, of the National Science Foundation (REU and RUI), and the NASA Astrophysics Theory and Fundamental Program.
Mobile Ultrasound Plane Wave Beamforming on iPhone or iPad using Metal-based GPU Processing
NASA Astrophysics Data System (ADS)
Hewener, Holger J.; Tretbar, Steffen H.
Mobile and cost-effective ultrasound devices are being used in point-of-care scenarios or the trauma room. To reduce the costs of such devices, we have already presented the possibilities of consumer devices like the Apple iPad for full signal processing of raw data for ultrasound image generation. Using technologies like plane wave imaging to generate a full image with only one excitation/reception event, the acquisition times and power consumption of ultrasound imaging can be reduced for low-power mobile devices based on consumer electronics, realizing the transition from FPGA- or ASIC-based beamforming to more flexible software beamforming. The massively parallel beamforming processing can be done with the Apple framework "Metal" for advanced graphics and general-purpose GPU processing on the iOS platform. We were able to integrate the beamforming reconstruction into our mobile ultrasound processing application with imaging rates up to 70 Hz on iPad Air 2 hardware.
Competitive Genomic Screens of Barcoded Yeast Libraries
Urbanus, Malene; Proctor, Michael; Heisler, Lawrence E.; Giaever, Guri; Nislow, Corey
2011-01-01
By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for functional characterization of gene function, but this approach is not scalable: current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes', into each mutant; the barcode serves as a proxy for the mutation, and barcode abundance can be measured to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced, but poorly characterized microbes. To illustrate this approach we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000 - 1,000,000 gene-gene and drug-gene interactions in a single experiment. PMID:21860376
A Review of Biorefinery Separations for Bioproduct Production via Thermocatalytic Processing.
Nguyen, Hannah; DeJaco, Robert F; Mittal, Nitish; Siepmann, J Ilja; Tsapatsis, Michael; Snyder, Mark A; Fan, Wei; Saha, Basudeb; Vlachos, Dionisios G
2017-06-07
With technological advancement of thermocatalytic processes for valorizing renewable biomass carbon, development of effective separation technologies for selective recovery of bioproducts from complex reaction media and their purification becomes essential. The high thermal sensitivity of biomass intermediates and their low volatility and high reactivity, along with the use of dilute solutions, make the bioproducts separations energy intensive and expensive. Novel separation techniques, including solvent extraction in biphasic systems and reactive adsorption using zeolite and carbon sorbents, membranes, and chromatography, have been developed. In parallel with experimental efforts, multiscale simulations have been reported for predicting solvent selection and adsorption separation. We discuss various separations that are potentially valuable to future biorefineries and the factors controlling separation performance. Particular emphasis is given to current gaps and opportunities for future development.
A status of the Turbine Technology Team activities
NASA Technical Reports Server (NTRS)
Griffin, Lisa W.
1992-01-01
The recent activities of the Turbine Technology Team of the Consortium for Computational Fluid Dynamics (CFD) Application in Propulsion Technology are presented. The team consists of members from the government, industry, and universities. The goal of this team is to demonstrate the benefits to the turbine design process attainable through the application of CFD. This goal is to be achieved by enhancing and validating turbine design tools for improved loading and flowfield definition and loss prediction, and transferring the advanced technology to the turbine design process. In order to demonstrate the advantages of using CFD early in the design phase, the Space Transportation Main Engine (STME) turbines for the National Launch System (NLS) were chosen as the focus of the team's efforts. The Turbine Team activities run parallel to the STME design work.
Fast parallel algorithm for slicing STL based on pipeline
NASA Astrophysics Data System (ADS)
Ma, Xulong; Lin, Feng; Yao, Bo
2016-05-01
In the Additive Manufacturing field, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. A parallel algorithm offers great advantages for improving efficiency and reducing slicing time. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated by a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors in the speedup ratio. Speedup increases with the number of threads in a way that agrees well with Amdahl's law, and speedup also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. Another parallel algorithm, based on data parallelism, is used in the experiments to show that the pipeline parallel mode is more efficient. Finally, a case study shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
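The two scaling laws named in the abstract above can be written down in a few lines. The sketch below is illustrative only and is not taken from the paper: the parallel fraction `p = 0.9` is an assumed value, and the function names are hypothetical. Amdahl's law models a fixed problem size (here, a fixed number of layers with more threads), while Gustafson's law models a workload that grows with the number of workers.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Fixed-size speedup: S = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled-size speedup: S = (1 - p) + p * n."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, not a value reported in the paper
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

Under Amdahl's law the speedup saturates at 1 / (1 - p) as threads are added, whereas the Gustafson form keeps growing linearly with the worker count when the workload is scaled up.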
Mobile Devices and GPU Parallelism in Ionospheric Data Processing
NASA Astrophysics Data System (ADS)
Mascharka, D.; Pankratius, V.
2015-12-01
Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). 
We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of smartphones.
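As an illustration of the kind of per-observation computation described above, the following sketch derives slant TEC from dual-frequency GPS pseudoranges and maps it to vertical TEC with a single-layer (thin-shell) ionosphere model. It is not the Mahali app's code: the abstract does not give its formulas, the 350 km shell height is an assumed value, the function names are hypothetical, and receiver and satellite biases are ignored here.

```python
import math

F1_HZ = 1575.42e6          # GPS L1 carrier frequency
F2_HZ = 1227.60e6          # GPS L2 carrier frequency
K_IONO = 40.3              # ionospheric refraction constant, m^3 / s^2
TECU = 1.0e16              # electrons per m^2 in one TEC unit
EARTH_RADIUS_M = 6371.0e3
SHELL_HEIGHT_M = 350.0e3   # assumed thin-shell height, not from the abstract


def slant_tec_tecu(p1_m: float, p2_m: float) -> float:
    """Slant TEC (TECU) from L1/L2 pseudoranges in meters, biases ignored."""
    factor = (F1_HZ**2 * F2_HZ**2) / (K_IONO * (F1_HZ**2 - F2_HZ**2))
    return factor * (p2_m - p1_m) / TECU


def vertical_tec_tecu(stec_tecu: float, elevation_deg: float,
                      shell_height_m: float = SHELL_HEIGHT_M) -> float:
    """Map slant TEC to vertical TEC using the thin-shell mapping function."""
    sin_zenith_at_shell = (EARTH_RADIUS_M * math.cos(math.radians(elevation_deg))
                           / (EARTH_RADIUS_M + shell_height_m))
    return stec_tecu * math.sqrt(1.0 - sin_zenith_at_shell**2)


if __name__ == "__main__":
    stec = slant_tec_tecu(p1_m=20_000_000.0, p2_m=20_000_003.0)
    print(stec, vertical_tec_tecu(stec, elevation_deg=30.0))
```

Each observation is independent of the others, which is what makes this step a natural fit for the GPU parallelism the abstract describes.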
Vigmond, Edward J.; Boyle, Patrick M.; Leon, L. Joshua; Plank, Gernot
2014-01-01
Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on traditional CPU and compared to GPU performance, for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time. PMID:19964295
Packet telemetry and packet telecommand - The new generation of spacecraft data handling techniques
NASA Technical Reports Server (NTRS)
Hooke, A. J.
1983-01-01
Because of the rising costs and reduced reliability of customized spacecraft and ground network hardware and software, the standardized Packet Telemetry and Packet Telecommand concepts are emerging as viable alternatives. Within each concept, autonomous packets of data are created within ground and space application processes through the use of formatting techniques and are switched end-to-end through the space data network to their destination application processes through the use of standard transfer protocols. Because the intermediate data networks are designed to be completely mission-independent, this process may facilitate a high degree of automation and interoperability. The goals of the Consultative Committee for Space Data Systems are the adoption of the Packet Telemetry concept as an international guideline for future space telemetry formatting, and the advancement of the NASA-ESA Working Group's Packet Telecommand concept to a level of maturity parallel to that of Packet Telemetry. Both the Packet Telemetry and Packet Telecommand concepts are reviewed.
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Distributed computing feasibility in a non-dedicated homogeneous distributed system
NASA Technical Reports Server (NTRS)
Leutenegger, Scott T.; Sun, Xian-He
1993-01-01
The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.
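The task ratio described above can be illustrated with a short sketch. The abstract only says that the metric "relates" the two demands; reading it as a simple quotient is an assumption made here, and the function name and sample values are hypothetical.

```python
def task_ratio(parallel_task_demand_s: float,
               mean_owner_service_demand_s: float) -> float:
    """Assumed form: parallel task demand divided by the mean service demand
    of the (nonparallel) workstation owner processes."""
    if mean_owner_service_demand_s <= 0:
        raise ValueError("mean owner service demand must be positive")
    return parallel_task_demand_s / mean_owner_service_demand_s


if __name__ == "__main__":
    # Hypothetical numbers: 60 s of work per parallel task, and owner
    # processes that need 0.5 s of CPU on average.
    print(task_ratio(60.0, 0.5))
```

A larger ratio means each parallel task is long relative to a typical owner interruption, which is the regime in which the abstract suggests a non-dedicated system can be used efficiently.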
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Simplified Parallel Domain Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
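Two of the techniques listed above, full replication and full locking, can be contrasted on a toy reduction: counting item frequencies. The sketch below is a rough illustration, not the authors' runtime system. It assumes that full replication means each thread updates a private copy of the reduction object that is merged at the end, and that full locking means a single shared object with one lock per reduction element; all function names are hypothetical.

```python
import threading
from collections import Counter


def count_full_replication(chunks):
    """Each thread updates its own private replica; replicas are merged once
    all threads have finished, so no locks are needed during counting."""
    replicas = [Counter() for _ in chunks]

    def work(index):
        for item in chunks[index]:
            replicas[index][item] += 1

    threads = [threading.Thread(target=work, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = Counter()
    for replica in replicas:
        merged.update(replica)
    return merged


def count_full_locking(chunks, keys):
    """All threads update one shared object, guarded by one lock per element."""
    shared = {key: 0 for key in keys}
    locks = {key: threading.Lock() for key in keys}

    def work(chunk):
        for item in chunk:
            with locks[item]:
                shared[item] += 1

    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    data = [["a", "b", "a"], ["b", "c"], ["a"]]
    print(count_full_replication(data))
    print(count_full_locking(data, keys=["a", "b", "c"]))
```

The trade-off the sketch exposes is memory versus synchronization: replication costs one copy of the reduction object per thread plus a merge step, while locking keeps a single copy but pays for lock acquisition on every update.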
Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers
ERIC Educational Resources Information Center
Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph
2015-01-01
In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…
Prostate Cancer Detection by Molecular Urinalysis
2011-04-01
12) and in part by the NIH COBRE award 1 P20 RP15563, and matching support from the State of Kansas. We thank David Matthews, M.D., of Charlotte, NC...focused on detecting such molecular changes in the urine or EPF [7-12,15]. Paralleling the advances in biomarker discovery, significant advances in
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
This is a fact sheet on the U.S. Department of Energy's (DOE) Advanced Reciprocating Engine Systems program (ARES), which is designed to promote separate, but parallel engine development between the major stationary, gaseous fueled engine manufacturers in the United States.
ERIC Educational Resources Information Center
Abuzaghleh, Omar; Goldschmidt, Kathleen; Elleithy, Yasser; Lee, Jeongkyu
2013-01-01
With the advances in computing power, high-performance computing (HPC) platforms have had an impact on not only scientific research in advanced organizations but also computer science curriculum in the educational community. For example, multicore programming and parallel systems are highly desired courses in the computer science major. However,…
Science and Mathematics Advanced Placement Exams: Growth and Achievement over Time
ERIC Educational Resources Information Center
Judson, Eugene
2017-01-01
Rapid growth of Advanced Placement (AP) exams in the last 2 decades has been paralleled by national enthusiasm to promote availability and rigor of science, technology, engineering, and mathematics (STEM). Trends were examined in STEM AP to evaluate and compare growth and achievement. Analysis included individual STEM subjects and disaggregation…
Prototype architecture for a VLSI level zero processing system. [Space Station Freedom
NASA Technical Reports Server (NTRS)
Shi, Jianfei; Grebowsky, Gerald J.; Horner, Ward P.; Chesney, James R.
1989-01-01
The prototype architecture and implementation of a high-speed level zero processing (LZP) system are discussed. Owing to the new processing algorithm and VLSI technology, the prototype LZP system features compact size, low cost, high processing throughput, easy maintainability, and increased reliability. Although extensive control functions are implemented in hardware, the programmability of processing tasks makes it possible to adapt the system to different data formats and processing requirements. It is noted that the LZP system can handle up to 8 virtual channels and 24 sources with a combined data volume of 15 Gbytes per orbit. For greater demands, multiple LZP systems can be configured in parallel, each called a processing channel and assigned a subset of virtual channels. The telemetry data stream will be steered into different processing channels in accordance with their virtual channel IDs. This super system can cope with a virtually unlimited number of virtual channels and sources. In the near future, it is expected that new disk farms with data rates exceeding 150 Mbps will be available from commercial vendors owing to advances in disk drive technology.
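The steering step described above, in which each processing channel owns a subset of virtual channels and incoming data are routed by virtual channel ID, can be sketched as a table lookup. This is an illustration under assumptions, not the LZP hardware design: a frame is modeled as a hypothetical `(vcid, payload)` tuple, and the channel names and function names are invented for the example.

```python
from collections import defaultdict


def build_routing_table(assignments):
    """Invert {processing_channel: [vcid, ...]} into {vcid: processing_channel},
    rejecting any virtual channel assigned to more than one processing channel."""
    table = {}
    for channel, vcids in assignments.items():
        for vcid in vcids:
            if vcid in table:
                raise ValueError(f"virtual channel {vcid} assigned twice")
            table[vcid] = channel
    return table


def steer(frames, table):
    """Group (vcid, payload) frames by the processing channel that owns the vcid."""
    routed = defaultdict(list)
    for vcid, payload in frames:
        routed[table[vcid]].append((vcid, payload))
    return dict(routed)


if __name__ == "__main__":
    table = build_routing_table({"lzp_0": [0, 1, 2], "lzp_1": [3, 4]})
    stream = [(0, b"a"), (3, b"b"), (1, b"c"), (4, b"d")]
    print(steer(stream, table))
```

Because the table is keyed only by virtual channel ID, adding capacity amounts to adding another processing channel and reassigning IDs, which matches the abstract's claim that the parallel configuration scales to many virtual channels.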
Comparing the Performance of Two Dynamic Load Distribution Methods
NASA Technical Reports Server (NTRS)
Kale, L. V.
1987-01-01
Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: to effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance. So, static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are: the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, the CWN is significantly more effective at distributing the work than the Gradient Model.
Liquid-phase deposition of thin Si films by ballistic electro-reduction
NASA Astrophysics Data System (ADS)
Ohta, T.; Gelloz, B.; Kojima, A.; Koshida, N.
2013-01-01
It is shown that the nanocrystalline silicon ballistic electron emitter operates in a SiCl4 solution without using any counter electrodes and that thin amorphous Si films are efficiently deposited on the emitting surface with no contamination or by-products. Despite the large electrochemical window of the SiCl4 solution, electrons injected with sufficiently high energies preferentially reduce Si4+ ions at the interface. Using an emitter with patterned line emission windows, an array of Si wires can be formed in parallel. This low-temperature liquid-phase deposition technique provides an alternative clean process for power-effective fabrication of advanced thin Si film structures and devices.
MODEST: A Tool for Geodesy and Astronomy
NASA Technical Reports Server (NTRS)
Sovers, Ojars J.; Jacobs, Christopher S.; Lanyi, Gabor E.
2004-01-01
Features of the JPL VLBI modeling and estimation software "MODEST" are reviewed. Its main advantages include thoroughly documented model physics, portability, and detailed error modeling. Two unique models are included: modeling of source structure and modeling of both spatial and temporal correlations in tropospheric delay noise. History of the code parallels the development of the astrometric and geodetic VLBI technique and the software retains many of the models implemented during its advancement. The code has been traceably maintained since the early 1980s, and will continue to be updated with recent IERS standards. Scripts are being developed to facilitate user-friendly data processing in the era of e-VLBI.
NASA Astrophysics Data System (ADS)
Czermak, A.; Zalewska, A.; Dulny, B.; Sowicki, B.; Jastrząb, M.; Nowak, L.
2004-07-01
The need for real-time monitoring of the hadrontherapy beam intensity and profile, as well as the requirements for fast dosimetry using Monolithic Active Pixel Sensors (MAPS), led the SUCIMA collaboration to design a unique Data Acquisition System (DAQ SUCIMA Imager). The DAQ system has been developed on one of the most advanced XILINX Field Programmable Gate Array chips, the Virtex II. A dedicated multifunctional electronic board that captures the detector's analogue signals, processes them digitally in parallel, compresses the resulting data and transmits it through a high-speed USB 2.0 port has been prototyped and tested.
Research Studies on Advanced Optical Module/Head Designs for Optical Data Storage
NASA Technical Reports Server (NTRS)
1992-01-01
Preprints are presented from the recent 1992 Optical Data Storage meeting in San Jose. The papers are divided into the following topical areas: Magneto-optical media (Modeling/design and fabrication/characterization/testing); Optical heads (holographic optical elements); and Optical heads (integrated optics). Some representative titles are as follows: Diffraction analysis and evaluation of several focus and track error detection schemes for magneto-optical disk systems; Proposal for massively parallel data storage system; Transfer function characteristics of super resolving systems; Modeling and measurement of a micro-optic beam deflector; Oxidation processes in magneto-optic and related materials; and A modal analysis of lamellar diffraction gratings in conical mountings.
NASA Astrophysics Data System (ADS)
Efron, Uzi
Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
NASA Technical Reports Server (NTRS)
Efron, Uzi (Editor)
1990-01-01
Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
Parallel processing via a dual olfactory pathway in the honeybee.
Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang
2013-02-06
In their natural environment, animals face complex and highly dynamic olfactory input. Thus vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a high number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and olfactory demands associated with a complex olfactory world of this social insect.
Advances in High-Fidelity Multi-Physics Simulation Techniques
2008-01-01
...predictor-corrector method is used to advance the solution in time. [Figure 17: Typical... (axes x (m), y (m); 40 x 50 grid)] ...Datta Gaitonde...advanced parallel computing platforms. The motivation to develop high-fidelity algorithms derives from considerations in various areas of current
Reliability of a Parallel Pipe Network
NASA Technical Reports Server (NTRS)
Herrera, Edgar; Chamis, Christopher (Technical Monitor)
2001-01-01
The goal of this NASA-funded research is to advance research and education objectives in theoretical and computational probabilistic structural analysis, reliability, and life prediction methods for improved aerospace and aircraft propulsion system components. Reliability methods are used to quantify response uncertainties due to inherent uncertainties in design variables. In this report, several reliability methods are applied to a parallel pipe network. The observed responses are the head delivered by a main pump and the head values of two parallel lines at certain flow rates. The probability that the flow rates in the lines will be less than their specified minimums will be discussed.
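The probability described in the abstract above can be illustrated with a small Monte Carlo sketch. This is not the report's method or data: the quadratic head-loss law h = k * Q**2, the normal distribution for the resistance coefficients, and the names `split_flow` and `failure_probability` are all assumptions made for illustration.

```python
import random

def split_flow(total_flow, k1, k2):
    """Split total_flow between two parallel lines with head loss h = k * Q**2.

    Both lines see the same head loss, so Q_i is proportional to 1/sqrt(k_i).
    """
    c1, c2 = k1 ** -0.5, k2 ** -0.5
    q1 = total_flow * c1 / (c1 + c2)
    return q1, total_flow - q1

def failure_probability(total_flow, k_mean, k_cov, q_min, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(flow in either line < q_min).

    Resistance coefficients are drawn independently from a normal
    distribution (an illustrative assumption) and clamped to stay positive.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        k1 = max(1e-9, rng.gauss(k_mean, k_cov * k_mean))
        k2 = max(1e-9, rng.gauss(k_mean, k_cov * k_mean))
        q1, q2 = split_flow(total_flow, k1, k2)
        if q1 < q_min or q2 < q_min:
            failures += 1
    return failures / n_samples
```

Calling `failure_probability(10.0, 4.0, 0.2, 4.5)` would estimate the chance that either line carries less than 4.5 flow units when the coefficients vary by 20%. Analytical reliability methods aim at the same probability with far fewer model evaluations.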
Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.
Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul
2009-02-15
The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
Visual analysis of inter-process communication for large-scale parallel computing.
Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu
2009-01-01
In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Recent progress in the imaging of soil processes at the microscopic scale, and a look ahead
NASA Astrophysics Data System (ADS)
Garnier, Patricia; Baveye, Philippe C.; Pot, Valérie; Monga, Olivier; Portell, Xavier
2016-04-01
Over the last few years, tremendous progress has been achieved in the visualization of soil structures at the microscopic scale. Computed tomography, based on synchrotron X-ray beams or table-top equipment, allows the visualization of pore geometry at micrometric resolution. Chemical and microbiological information obtainable in 2D cuts through soils can now be interpolated, with the support of CT-data, to produce 3-dimensional maps. In parallel with these analytical advances, significant progress has also been achieved in the computer simulation and visualization of a range of physical, chemical, and microbiological processes taking place in soil pores. In terms of water distribution and transport in soils, for example, the use of Lattice-Boltzmann models as well as models based on geometric primitives has been shown recently to reproduce very faithfully observations made with synchrotron X-ray tomography. Coupling of these models with fungal and bacterial growth models allows the description of a range of microbiologically-mediated processes of great importance at the moment, for example in terms of carbon sequestration. In this talk, we shall review progress achieved to date in this field, indicate where questions remain unanswered, and point out areas where further advances are expected in the next few years.
NASA Astrophysics Data System (ADS)
Canavesi, Cristina; Cogliati, Andrea; Hayes, Adam; Tankam, Patrice; Santhanam, Anand; Rolland, Jannick P.
2017-02-01
Real-time volumetric high-definition wide-field-of-view in-vivo cellular imaging requires micron-scale resolution in 3D. Compactness of the handheld device and distortion-free images with cellular resolution are also critically required for onsite use in clinical applications. By integrating a custom liquid lens-based microscope and a dual-axis MEMS scanner in a compact handheld probe, Gabor-domain optical coherence microscopy (GD-OCM) breaks the lateral resolution limit of optical coherence tomography through depth, overcoming the tradeoff between numerical aperture and depth of focus, enabling advances in biotechnology. Furthermore, distortion-free imaging with no post-processing is achieved with a compact, lightweight handheld MEMS scanner that obtained a 12-fold reduction in volume and 17-fold reduction in weight over a previous dual-mirror galvanometer-based scanner. Approaching the holy grail of medical imaging - noninvasive real-time imaging with histologic resolution - GD-OCM demonstrates invariant resolution of 2 μm throughout a volume of 1 x 1 x 0.6 mm3, acquired and visualized in less than 2 minutes with parallel processing on graphics processing units. Results on the metrology of manufactured materials and imaging of human tissue with GD-OCM are presented.
Cardiac imaging: working towards fully-automated machine analysis & interpretation
Slomka, Piotr J; Dey, Damini; Sitek, Arkadiusz; Motwani, Manish; Berman, Daniel S; Germano, Guido
2017-01-01
Introduction: Non-invasive imaging plays a critical role in managing patients with cardiovascular disease. Although subjective visual interpretation remains the clinical mainstay, quantitative analysis facilitates objective, evidence-based management, and advances in clinical research. This has driven developments in computing and software tools aimed at achieving fully automated image processing and quantitative analysis. In parallel, machine learning techniques have been used to rapidly integrate large amounts of clinical and quantitative imaging data to provide highly personalized individual patient-based conclusions. Areas covered: This review summarizes recent advances in automated quantitative imaging in cardiology and describes the latest techniques which incorporate machine learning principles. The review focuses on the cardiac imaging techniques which are in wide clinical use. It also discusses key issues and obstacles for these tools to become utilized in mainstream clinical practice. Expert commentary: Fully-automated processing and high-level computer interpretation of cardiac imaging are becoming a reality. Application of machine learning to the vast amounts of quantitative data generated per scan and integration with clinical data also facilitates a move to more patient-specific interpretation. These developments are unlikely to replace interpreting physicians but will provide them with highly accurate tools to detect disease, risk-stratify, and optimize patient-specific treatment. However, with each technological advance, we move further from human dependence and closer to fully-automated machine interpretation. PMID:28277804
Indicator system for advanced nuclear plant control complex
Scarola, Kenneth; Jamison, David S.; Manazir, Richard M.; Rescorl, Robert L.; Harmon, Daryl L.
1993-01-01
An advanced control room complex for a nuclear power plant, including a discrete indicator and alarm system (72) which is nuclear qualified for rapid response to changes in plant parameters and a component control system (64) which together provide a discrete monitoring and control capability at a panel (14-22, 26, 28) in the control room (10). A separate data processing system (70), which need not be nuclear qualified, provides integrated and overview information to the control room and to each panel, through CRTs (84) and a large, overhead integrated process status overview board (24). The discrete indicator and alarm system (72) and the data processing system (70) receive inputs from common plant sensors and validate the sensor outputs to arrive at a representative value of the parameter for use by the operator during both normal and accident conditions, thereby avoiding the need for him to assimilate data from each sensor individually. The integrated process status board (24) is at the apex of an information hierarchy that extends through four levels and provides access at each panel to the full display hierarchy. The control room panels are preferably of a modular construction, permitting the definition of inputs and outputs, the man machine interface, and the plant specific algorithms, to proceed in parallel with the fabrication of the panels, the installation of the equipment and the generic testing thereof.
Advanced nuclear plant control complex
Scarola, Kenneth; Jamison, David S.; Manazir, Richard M.; Rescorl, Robert L.; Harmon, Daryl L.
1993-01-01
An advanced control room complex for a nuclear power plant, including a discrete indicator and alarm system (72) which is nuclear qualified for rapid response to changes in plant parameters and a component control system (64) which together provide a discrete monitoring and control capability at a panel (14-22, 26, 28) in the control room (10). A separate data processing system (70), which need not be nuclear qualified, provides integrated and overview information to the control room and to each panel, through CRTs (84) and a large, overhead integrated process status overview board (24). The discrete indicator and alarm system (72) and the data processing system (70) receive inputs from common plant sensors and validate the sensor outputs to arrive at a representative value of the parameter for use by the operator during both normal and accident conditions, thereby avoiding the need for him to assimilate data from each sensor individually. The integrated process status board (24) is at the apex of an information hierarchy that extends through four levels and provides access at each panel to the full display hierarchy. The control room panels are preferably of a modular construction, permitting the definition of inputs and outputs, the man machine interface, and the plant specific algorithms, to proceed in parallel with the fabrication of the panels, the installation of the equipment and the generic testing thereof.
Advanced nuclear plant control room complex
Scarola, Kenneth; Jamison, David S.; Manazir, Richard M.; Rescorl, Robert L.; Harmon, Daryl L.
1993-01-01
An advanced control room complex for a nuclear power plant, including a discrete indicator and alarm system (72) which is nuclear qualified for rapid response to changes in plant parameters and a component control system (64) which together provide a discrete monitoring and control capability at a panel (14-22, 26, 28) in the control room (10). A separate data processing system (70), which need not be nuclear qualified, provides integrated and overview information to the control room and to each panel, through CRTs (84) and a large, overhead integrated process status overview board (24). The discrete indicator and alarm system (72) and the data processing system (70) receive inputs from common plant sensors and validate the sensor outputs to arrive at a representative value of the parameter for use by the operator during both normal and accident conditions, thereby avoiding the need for him to assimilate data from each sensor individually. The integrated process status board (24) is at the apex of an information hierarchy that extends through four levels and provides access at each panel to the full display hierarchy. The control room panels are preferably of a modular construction, permitting the definition of inputs and outputs, the man machine interface, and the plant specific algorithms, to proceed in parallel with the fabrication of the panels, the installation of the equipment and the generic testing thereof.
Giannakis, Stefanos; Voumard, Margaux; Grandjean, Dominique; Magnet, Anoys; De Alencastro, Luiz Felippe; Pulgarin, César
2016-10-01
In this work, disinfection by 5 Advanced Oxidation Processes was preceded by 3 different secondary treatment systems present in the wastewater treatment plant of Vidy, Lausanne (Switzerland). 5 AOPs after two biological treatment methods (conventional activated sludge and moving bed bioreactor) and a physicochemical process (coagulation-flocculation) were tested at laboratory scale. The dependence between AOP efficiency and secondary (pre)treatment was estimated by following the bacterial concentration i) before secondary treatment, ii) after the different secondary treatment methods and iii) after the various AOPs. Disinfection and post-treatment bacterial regrowth were the evaluation indicators. The order of efficiency was Moving Bed Bioreactor > Activated Sludge > Coagulation-Flocculation > Primary Treatment. As far as the different AOPs are concerned, the disinfection kinetics were: UVC/H2O2 > UVC and solar photo-Fenton > Fenton or solar light. The contextualization and parallel study of microorganisms with the micropollutants of the effluents revealed that complete degradation of micropollutants required longer exposure times than elimination of microorganisms for the UV-based processes, and the reverse for the Fenton-related ones. Nevertheless, in the Fenton-related systems, the nominal 80% removal of micropollutants deriving from the Swiss legislation often took place before the elimination of bacterial regrowth risk. Copyright © 2016 Elsevier Ltd. All rights reserved.
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Advanced Numerical Techniques of Performance Evaluation. Volume 1
1990-06-01
...system scheduling thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself...[Polychronopoulos and Kuck 1987] C.D. Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Trans. on Comp...
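The guided self-scheduling scheme cited in the excerpt above is commonly described as giving each idle processor ceil(R/P) of the R loop iterations still unassigned, where P is the number of processors. A minimal sketch of that chunk-size rule follows; it is a serial illustration rather than the report's code, and the name `gss_chunks` is hypothetical.

```python
def gss_chunks(n_iterations, n_processors):
    """Return the chunk sizes handed out by guided self-scheduling.

    Each request receives ceil(remaining / P) iterations, so chunks start
    large (low scheduling overhead) and shrink to 1 (good load balance).
    """
    if n_iterations < 0 or n_processors < 1:
        raise ValueError("need n_iterations >= 0 and n_processors >= 1")
    chunks = []
    remaining = n_iterations
    while remaining > 0:
        # Integer ceiling division avoids floating-point rounding.
        size = (remaining + n_processors - 1) // n_processors
        chunks.append(size)
        remaining -= size
    return chunks
```

For 100 iterations on 4 processors the first chunks are 25, 19, 14 and 11, and the last few are single iterations.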
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption.
The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W
2016-01-01
The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.
Klingner, Carsten M.; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W.
2016-01-01
The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI. PMID:28066197
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the growth of remote sensing image data, traditional remote sensing image segmentation technology cannot meet the processing and storage requirements of massive remote sensing imagery. This article applies cloud computing and parallel computing technology to remote sensing image segmentation, building a cheap and efficient computer cluster system that uses parallel processing to implement the MeanShift segmentation algorithm on the MapReduce model. The approach preserves segmentation quality while improving segmentation speed and better meeting real-time requirements. The MapReduce-based parallel MeanShift algorithm for remote sensing image segmentation is therefore of practical significance and value.
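One way to read the combination described above is that each mean shift iteration becomes a map step (pixels emit their contribution to nearby modes) followed by a reduce step (contributions are averaged per mode). The sketch below shows that structure in plain Python on one-dimensional pixel values with a flat kernel. It is not the authors' implementation and uses no MapReduce framework; the function names and the 1D feature space are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(pixels, modes, bandwidth):
    """Map: emit (mode_id, (value, 1)) for each pixel within bandwidth of a mode."""
    for value in pixels:
        for mode_id, mode in enumerate(modes):
            if abs(value - mode) <= bandwidth:
                yield mode_id, (value, 1)

def reduce_phase(pairs, modes):
    """Reduce: average the values emitted for each mode to get its new position."""
    sums = defaultdict(lambda: [0.0, 0])
    for mode_id, (value, count) in pairs:
        sums[mode_id][0] += value
        sums[mode_id][1] += count
    # A mode with no pixels in its window stays where it is.
    return [sums[i][0] / sums[i][1] if sums[i][1] else m
            for i, m in enumerate(modes)]

def mean_shift(pixels, bandwidth, max_iter=50, tol=1e-6):
    """Run map/reduce rounds until every mode moves less than tol."""
    if not pixels:
        return []
    modes = [float(p) for p in pixels]  # start one mode at every pixel value
    for _ in range(max_iter):
        new_modes = reduce_phase(map_phase(pixels, modes, bandwidth), modes)
        shift = max(abs(a - b) for a, b in zip(new_modes, modes))
        modes = new_modes
        if shift < tol:
            break
    return modes
```

Pixels whose modes converge to the same value would be assigned to the same segment. On a cluster, the map step could run independently on separate image tiles, with the reduce step keyed by mode.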
Opus: A Coordination Language for Multidisciplinary Applications
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Haines, Matthew; Mehrotra, Piyush; Zima, Hans; vanRosendale, John
1997-01-01
Data parallel languages, such as High Performance Fortran, can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are multidisciplinary and heterogeneous in nature, and thus do not fit well into the data parallel paradigm. In this paper we present Opus, a language designed to fill this gap. The central concept of Opus is a mechanism called ShareD Abstractions (SDA). An SDA can be used as a computation server, i.e., a locus of computational activity, or as a data repository for sharing data between asynchronous tasks. SDAs can be internally data parallel, providing support for the integration of data and task parallelism as well as nested task parallelism. They can thus be used to express multidisciplinary applications in a natural and efficient way. In this paper we describe the features of the language through a series of examples and give an overview of the runtime support required to implement these concepts in parallel and distributed environments.
GPU based framework for geospatial analyses
NASA Astrophysics Data System (ADS)
Cosmin Sandric, Ionut; Ionita, Cristian; Dardala, Marian; Furtuna, Titus
2017-04-01
Parallel processing on multiple CPU cores is already used at large scale in geocomputing, but parallel processing on graphics cards is just beginning. Being able to use a simple laptop with a dedicated graphics card for advanced and very fast geocomputation is an advantage that every scientist wants to have. The need for high-speed computation in the geosciences has increased in the last 10 years, mostly due to the increase in available datasets. These datasets are becoming more and more detailed, and hence they require more space to store and more time to process. Distributed computation on multicore CPUs and GPUs plays an important role by processing small parts of these big datasets one by one. This way of computing speeds up processing because, instead of using just one process for each dataset, the user can use all the cores of a CPU or up to hundreds of cores of a GPU. The framework provides the end user with standalone tools for morphometry analyses at multiple scales. An important part of the framework is dedicated to uncertainty propagation in geospatial analyses. The uncertainty may come from the data collection, may be induced by the model, or may have an infinite number of sources. These uncertainties play important roles when a spatial delineation of the phenomena is modelled. Uncertainty propagation is implemented inside the GPU framework using Monte Carlo simulations. The GPU framework with the standalone tools proved to be a reliable tool for modelling complex natural phenomena. The framework is based on NVIDIA CUDA technology and is written in the C++ programming language. The source code will be available on GitHub at https://github.com/sandricionut/GeoRsGPU Acknowledgement: GPU framework for geospatial analysis, Young Researchers Grant (ICUB-University of Bucharest) 2016, director Ionut Sandric
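Monte Carlo uncertainty propagation of the kind mentioned in this abstract can be sketched on the CPU in a few lines. The example below is only an illustration and does not reproduce the GeoRsGPU code: the function name `slope_monte_carlo`, the Gaussian error model, and the two-point slope formula are assumptions. Two elevations are perturbed with random error many times, and the spread of the resulting slope values estimates the propagated uncertainty.

```python
import random
import statistics


def slope_monte_carlo(z_low, z_high, distance, sigma, n_runs=10000, seed=0):
    """Propagate elevation error (std. dev. sigma) into a two-point slope.

    Returns the mean and sample standard deviation of the simulated slopes.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    slopes = []
    for _ in range(n_runs):
        # Each run draws one noisy realisation of both elevations.
        z1 = rng.gauss(z_low, sigma)
        z2 = rng.gauss(z_high, sigma)
        slopes.append((z2 - z1) / distance)
    return statistics.mean(slopes), statistics.stdev(slopes)


mean_slope, slope_spread = slope_monte_carlo(100.0, 105.0, 10.0, sigma=0.5)
```

Every run is independent of the others, which is what makes this kind of simulation a natural fit for a GPU: each thread can evaluate one realisation.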
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.
A software architecture for multidisciplinary applications: Integrating task and data parallelism
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans
1994-01-01
Data parallel languages such as Vienna Fortran and HPF can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are of a multidisciplinary and heterogeneous nature and thus do not fit well into the data parallel paradigm. In this paper we present new Fortran 90 language extensions to fill this gap. Tasks can be spawned as asynchronous activities in a homogeneous or heterogeneous computing environment; they interact by sharing access to Shared Data Abstractions (SDA's). SDA's are an extension of Fortran 90 modules, representing a pool of common data, together with a set of Methods for controlled access to these data and a mechanism for providing persistent storage. Our language supports the integration of data and task parallelism as well as nested task parallelism and thus can be used to express multidisciplinary applications in a natural and efficient way.
A survey of GPU-based medical image computing techniques
Shi, Lin; Liu, Wen; Zhang, Heye; Xie, Yongming
2012-01-01
Medical imaging currently plays a crucial role across the full range of clinical applications, from medical scientific research to diagnostics and treatment planning. However, medical imaging procedures are often computationally demanding due to the large three-dimensional (3D) medical datasets to process in practical clinical applications. With the rapidly improving performance of graphics processors, improved programming support, and excellent price-to-performance ratio, the graphics processing unit (GPU) has emerged as a competitive parallel computing platform for computationally expensive and demanding tasks in a wide range of medical image applications. The major purpose of this survey is to provide a comprehensive reference source for newcomers and researchers involved in GPU-based medical image processing. Within this survey, the continuous advancement of GPU computing is reviewed and the existing traditional applications in three areas of medical image processing, namely, segmentation, registration and visualization, are surveyed. The potential advantages and associated challenges of current GPU-based medical imaging are also discussed to inspire future applications in medicine. PMID:23256080
Sung, Kyongje
2008-12-01
Participants searched a visual display for a target among distractors. Each of 3 experiments tested a condition proposed to require attention and for which certain models propose a serial search. Serial versus parallel processing was tested by examining effects on response time means and cumulative distribution functions. In 2 conditions, the results suggested parallel rather than serial processing, even though the tasks produced significant set-size effects. Serial processing was produced only in a condition with a difficult discrimination and a very large set-size effect. The results support C. Bundesen's (1990) claim that an extreme set-size effect leads to serial processing. Implications for parallel models of visual selection are discussed.
FBIS report. Science and technology: Europe/International, March 29, 1996
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1996-03-29
Partial Contents: Advanced Materials (EU Project to Improve Production in Metal Matrix Compounds Noted, Germany: Extremely Hard Carbon Coating Development, Italy: Director of CNR Metallic Materials Institute Interviewed); Aerospace (ESA Considers Delays, Reductions as Result of Budget Cuts, Italy: Space Agency's Director on Restructuring, Future Plans); Automotive, Transportation (EU: Clean Diesel Engine Technology Research Reviewed); Biotechnology (Germany's Problems, Successes in Biotechnology Discussed); Computers (EU Europort Parallel Computing Project Concluded, Italy: PQE 2000 Project on Massively Parallel Systems Viewed); Defense R&D (France: Future Tasks of 'Brevel' Military Intelligence Drone Noted); Energy, Environment (German Scientist Tests Elimination of Phosphates); Advanced Manufacturing (France: Advanced Rapid Prototyping System Presented); Lasers, Sensors, Optics (France: Strategy of Cilas Laser Company Detailed); Microelectronics (France: Simulation Company to Develop Microelectronic Manufacturing Application); Nuclear R&D (France: Megajoule Laser Plan, Cooperation with Livermore Lab Noted); S&T Policy (EU Efforts to Aid Small Companies' Research Viewed); Telecommunications (France Telecom's Way to Internet).
How challenges in auditory fMRI led to general advancements for the field.
Talavage, Thomas M; Hall, Deborah A
2012-08-15
In the early years of fMRI research, the auditory neuroscience community sought to expand its knowledge of the underlying physiology of hearing, while also seeking to come to grips with the inherent acoustic disadvantages of working in the fMRI environment. Early collaborative efforts between prominent auditory research laboratories and prominent fMRI centers led to development of a number of key technical advances that have subsequently been widely used to elucidate principles of auditory neurophysiology. Perhaps the key imaging advance was the simultaneous and parallel development of strategies to use pulse sequences in which the volume acquisitions were "clustered," providing gaps in which stimuli could be presented without direct masking. Such sequences have become widespread in fMRI studies using auditory stimuli and also in a range of translational research domains. This review presents the parallel stories of the people and the auditory neurophysiology research that led to these sequences. Copyright © 2011 Elsevier Inc. All rights reserved.
Fujita, Miki; Wasteneys, Geoffrey O
2014-05-01
Cellulose microfibrils are critical for plant cell specialization and function. Recent advances in live cell imaging of fluorescently tagged cellulose synthases to track cellulose synthesis have greatly advanced our understanding of cellulose biosynthesis. Nevertheless, cellulose deposition patterns remain poorly described in many cell types, including those in the process of division or differentiation. In this study, we used field emission scanning electron microscopy analysis of cryo-planed tissues to determine the arrangement of cellulose microfibrils in various faces of cells undergoing cytokinesis or specialized development, including cell types in which cellulose cannot be imaged by conventional approaches. In dividing cells, we detected microfibrillar meshworks in the cell plates, consistent with the concentration at the cell plate of cellulose synthase complexes, as detected by fluorescently tagged CesA6. We also observed a loss of parallel cellulose microfibril orientation in walls of the mother cell during cytokinesis, which corresponded with the loss of fluorescently tagged cellulose synthase complexes from these surfaces. In recently formed guard cells, microfibrils were randomly organized and only formed a highly ordered circumferential pattern after pore formation. In pit fields, cellulose microfibrils were arranged in circular patterns around plasmodesmata. Microfibrils were random in most cotyledon cells except the epidermis and were parallel to the growth axis in trichomes. Deposition of cellulose microfibrils was spatially delineated in metaxylem and protoxylem cells of the inflorescence stem, supporting recent studies on microtubule exclusion mechanisms.
Petrica, Ligia; Vlad, Mihaela; Vlad, Adrian; Gluhovschi, Gheorghe; Gadalean, Florica; Dumitrascu, Victor; Popescu, Roxana; Gluhovschi, Cristina; Matusz, Petru; Velciov, Silvia; Bob, Flaviu; Ursoniu, Sorin; Vlad, Daliborca
2017-09-01
Detection of podocytes in the urine of patients with type 2 diabetes may indicate severe injury to the podocytes. In the course of type 2 diabetes the proximal tubule is involved in urinary albumin processing. We studied the significance of podocyturia in relation with proximal tubule dysfunction in type 2 diabetes. A total of 86 patients with type 2 diabetes (34-normoalbuminuria; 30-microalbuminuria; 22-macroalbuminuria) and 28 healthy subjects were enrolled in the study and assessed concerning urinary podocytes, podocyte-associated molecules, and biomarkers of proximal tubule dysfunction. Urinary podocytes were examined in cell cultures by utilizing monoclonal antibodies against podocalyxin and synaptopodin. Podocytes were detected in the urine of 10% of the healthy controls, 24% of the normoalbuminuric, 40% of the microalbuminuric, and 82% of the macroalbuminuric patients. In multivariate logistic regression analysis, urinary podocytes correlated with urinary albumin:creatinine ratio (p=0.006), urinary nephrin/creat (p=0.001), urinary vascular endothelial growth factor/creat (p=0.001), urinary kidney injury molecule-1/creat (p=0.003), cystatin C (p=0.001), urinary advanced glycation end-products (p=0.002), eGFR (p=0.001). In patients with type 2 diabetes podocyturia parallels proximal tubule dysfunction independently of albuminuria and renal function decline. Advanced glycation end-products may impact the podocytes and the proximal tubule. Copyright © 2017 Elsevier Inc. All rights reserved.
Methods for design and evaluation of parallel computing systems (The PISCES project)
NASA Technical Reports Server (NTRS)
Pratt, Terrence W.; Wise, Robert; Haught, Mary JO
1989-01-01
The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
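The bit-slice style of computation described in this abstract can be imitated in software. The sketch below is a toy model, not the patented hardware: the helper names (`to_planes`, `from_planes`, `bit_serial_add`) and the list-of-lists layout are assumptions. Every "processing element" holds one bit of each operand per step, and a single instruction stream adds the operands one bit slice at a time.

```python
def to_planes(values, width):
    """Split integers into bit planes: planes[k][pe] is bit k (LSB first) of PE pe."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_planes(planes):
    """Reassemble one integer per processing element from its bit planes."""
    n_pe = len(planes[0])
    return [
        sum(planes[k][pe] << k for k in range(len(planes)))
        for pe in range(n_pe)
    ]


def bit_serial_add(a_planes, b_planes):
    """Add two operands per PE, one bit slice per step, under one instruction stream."""
    n_pe = len(a_planes[0])
    carry = [0] * n_pe
    out = []
    for a_bits, b_bits in zip(a_planes, b_planes):
        # The same full-adder logic is applied to every PE at once.
        out.append([a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry)])
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_bits, b_bits, carry)]
    out.append(carry)  # final carry becomes the top bit plane
    return out


sums = from_planes(bit_serial_add(to_planes([3, 5, 7, 0], 4),
                                  to_planes([1, 10, 9, 0], 4)))
```

The number of steps depends only on the operand width, not on the number of processing elements, which is the property that makes a large array of very simple elements attractive for image data.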
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Implementation and Testing of VLBI Software Correlation at the USNO
NASA Technical Reports Server (NTRS)
Fey, Alan; Ojha, Roopesh; Boboltz, Dave; Geiger, Nicole; Kingham, Kerry; Hall, David; Gaume, Ralph; Johnston, Ken
2010-01-01
The Washington Correlator (WACO) at the U.S. Naval Observatory (USNO) is a dedicated VLBI processor based on dedicated hardware of ASIC design. The WACO is currently over 10 years old and is nearing the end of its expected lifetime. Plans for implementation and testing of software correlation at the USNO are currently being considered. The VLBI correlation process is, by its very nature, well suited to a parallelized computing environment. Commercial off-the-shelf computer hardware has advanced in processing power to the point where software correlation is now both economically and technologically feasible. The advantages of software correlation are manifold but include flexibility, scalability, and easy adaptability to changing environments and requirements. We discuss our experience with and plans for use of software correlation at USNO with emphasis on the use of the DiFX software correlator.
National Combustion Code: A Multidisciplinary Combustor Design System
NASA Technical Reports Server (NTRS)
Stubbs, Robert M.; Liu, Nan-Suey
1997-01-01
The Internal Fluid Mechanics Division conducts both basic research and technology, and system technology research for aerospace propulsion systems components. The research within the division, which is both computational and experimental, is aimed at improving fundamental understanding of flow physics in inlets, ducts, nozzles, turbomachinery, and combustors. This article and the following three articles highlight some of the work accomplished in 1996. A multidisciplinary combustor design system is critical for optimizing the combustor design process. Such a system should include sophisticated computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. The goal of the present effort is to develop some of the enabling technologies and to demonstrate their overall performance in an integrated system called the National Combustion Code.
The future of small molecule inhibitors in lymphoma.
Gerecitano, John
2009-09-01
For the many patients with lymphoma that has relapsed after and/or has become refractory to existing treatments, the development of novel therapeutics is imperative. Investigation into intracellular processes that are dysregulated during lymphomagenesis has uncovered several new potential targets for anticancer agents. Although monoclonal antibodies and other immunotherapeutics have led to dramatic advances in the treatment of patients with lymphoma, the parallel development of small molecule inhibitors has been equally exciting. These agents, whose small size allows direct entry into tumor cells, can target distinct proteins or complexes, thereby disrupting molecular processes on which neoplastic cells depend for survival and growth. This review surveys the published literature on many of these new targeted molecules, focusing on some of the most promising agents for which phase 2 data currently exist. It also explores the potential for incorporating these agents into broader multidrug regimens.
Lugo, Ricardo G; Helkala, Kirsi; Knox, Benjamin J; Jøsok, Øyvind; Lande, Natalie M; Sütterlin, Stefan
2018-01-01
Background Technical advancement in military cyber defense poses increased cognitive demands on cyber officers. In the cyber domain, the influence of emotion on decision-making is rarely investigated. The purpose of this study was to assess psychophysiological correlation with perseverative cognitions during emotionally intensive/stressful situations in cyber military personnel. In line with parallel research on clinical samples high on perseverative cognition, we expected a decreased interoceptive sensitivity in officers with high levels of perseverative cognition. Method We investigated this association in a sample of 27 cyber officer cadets. Results Contrary to our hypothesis, there was no relationship between the factors. Discussion Cyber officers might display characteristics not otherwise found in general populations. The cyber domain may lead to a selection process that attracts different profiles of cognitive and emotional processing. PMID:29296103
Pathways of the Maillard reaction under physiological conditions.
Henning, Christian; Glomb, Marcus A
2016-08-01
Initially investigated as a color formation process in thermally treated foods, the Maillard reaction is now generally accepted as relevant in vivo. Many chronic and age-related diseases such as diabetes, uremia, atherosclerosis, cataractogenesis and Alzheimer's disease are associated with Maillard derived advanced glycation endproducts (AGEs) and α-dicarbonyl compounds as their most important precursors in terms of reactivity and abundance. However, the situation in vivo is very challenging, because Maillard chemistry is paralleled by enzymatic reactions which can lead to both increases and decreases in certain AGEs. In addition, mechanistic findings established under the harsh conditions of food processing might not be valid under physiological conditions. The present review critically discusses the relevant α-dicarbonyl compounds as central intermediates of AGE formation in vivo with a special focus on fragmentation pathways leading to formation of amide-AGEs.
High-contrast imaging in the cloud with klipReduce and Findr
NASA Astrophysics Data System (ADS)
Haug-Baltzell, Asher; Males, Jared R.; Morzinski, Katie M.; Wu, Ya-Lin; Merchant, Nirav; Lyons, Eric; Close, Laird M.
2016-08-01
Astronomical data sets are growing ever larger, and the area of high contrast imaging of exoplanets is no exception. With the advent of fast, low-noise detectors operating at 10 to 1000 Hz, huge numbers of images can be taken during a single hours-long observation. High frame rates offer several advantages, such as improved registration, frame selection, and improved speckle calibration. However, advanced image processing algorithms are computationally challenging to apply. Here we describe a parallelized, cloud-based data reduction system developed for the Magellan Adaptive Optics VisAO camera, which is capable of rapidly exploring tens of thousands of parameter sets affecting the Karhunen-Loève image processing (KLIP) algorithm to produce high-quality direct images of exoplanets. We demonstrate these capabilities with a visible wavelength high contrast data set of a hydrogen-accreting brown dwarf companion.
Willardson, Jeffrey M; Bressel, Eadric
2004-08-01
The purpose of this research was to devise prediction equations whereby a 10 repetition maximum (10RM) for the free weight parallel squat could be predicted using the following predictor variables: 10RM for the 45 degrees angled leg press, body mass, and limb length. Sixty men were tested over a 3-week period, with 1 testing session each week. During each testing session, subjects performed a 10RM for the free weight parallel squat and 45 degrees angled leg press. Stepwise multiple regression analysis showed leg press mass lifted to be a significant predictor of squat mass lifted for both the advanced and the novice groups (p < 0.05). Leg press mass lifted accounted for approximately 25% of the variance in squat mass lifted for the novice group and 55% of the variance in squat mass lifted for the advanced group. Limb length and body mass were not significant predictors of squat mass lifted for either group. The following prediction equations were devised: (a) novice group squat mass = leg press mass (0.210) + 36.244 kg, (b) advanced group squat mass = leg press mass (0.310) + 19.438 kg, and (c) subject pool squat mass = leg press mass (0.354) + 2.235 kg. These prediction equations may save time and reduce the risk of injury when switching from the leg press to the squat exercise.
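The three prediction equations reported above can be applied directly. The following minimal sketch encodes them as a lookup table; the coefficients come from the abstract, while the function name `predict_squat_10rm`, the group labels, and the assumption that leg press mass is entered in kilograms are illustrative choices.

```python
# (slope, intercept in kg) for each group, as reported in the abstract.
SQUAT_EQUATIONS = {
    "novice": (0.210, 36.244),
    "advanced": (0.310, 19.438),
    "pooled": (0.354, 2.235),
}


def predict_squat_10rm(leg_press_kg, group="pooled"):
    """Predict the 10RM parallel squat mass (kg) from the 10RM leg press mass (kg)."""
    slope, intercept = SQUAT_EQUATIONS[group]
    return slope * leg_press_kg + intercept


# Example: a 200 kg leg press 10RM under each equation.
predictions = {group: predict_squat_10rm(200.0, group) for group in SQUAT_EQUATIONS}
```

Since leg press mass explained only about 25% (novice) and 55% (advanced) of the variance in squat mass, any single prediction should be read as a starting estimate rather than a precise target.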
Collaborative visual analytics of radio surveys in the Big Data era
NASA Astrophysics Data System (ADS)
Vohl, Dany; Fluke, Christopher J.; Hassan, Amr H.; Barnes, David G.; Kilborn, Virginia A.
2017-06-01
Radio survey datasets comprise an increasing number of individual observations stored as sets of multidimensional data. In large survey projects, astronomers commonly face limitations regarding: 1) interactive visual analytics of sufficiently large subsets of data; 2) synchronous and asynchronous collaboration; and 3) documentation of the discovery workflow. To support collaborative data inquiry, we present encube, a large-scale comparative visual analytics framework. encube can utilise advanced visualization environments such as the CAVE2 (a hybrid 2D and 3D virtual reality environment powered with a 100 Tflop/s GPU-based supercomputer and 84 million pixels) for collaborative analysis of large subsets of data from radio surveys. It can also run on standard desktops, providing a capable visual analytics experience across the display ecology. encube is composed of four primary units enabling compute-intensive processing, advanced visualisation, dynamic interaction, parallel data query, along with data management. Its modularity will make it simple to incorporate astronomical analysis packages and Virtual Observatory capabilities developed within our community. We discuss how encube builds a bridge between high-end display systems (such as CAVE2) and the classical desktop, preserving all traces of the work completed on either platform - allowing the research process to continue wherever you are.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Zhou, Xian; Chen, Xue
2011-05-09
Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments, and are therefore a promising candidate for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. To cope with this difficulty, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of receivers, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can be used to complete synchronization, equalization and polarization de-multiplexing simultaneously. Finally, we demonstrate that PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal at clock rates in the hundreds-of-MHz range. © 2011 Optical Society of America
Spatially parallel processing of within-dimension conjunctions.
Linnell, K J; Humphreys, G W
2001-01-01
Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Adaptive MCMC in Bayesian phylogenetics: an application to analyzing partitioned data in BEAST.
Baele, Guy; Lemey, Philippe; Rambaut, Andrew; Suchard, Marc A
2017-06-15
Advances in sequencing technology continue to deliver increasingly large molecular sequence datasets that are often heavily partitioned in order to accurately model the underlying evolutionary processes. In phylogenetic analyses, partitioning strategies involve estimating conditionally independent models of molecular evolution for different genes and different positions within those genes, requiring a large number of evolutionary parameters that have to be estimated, leading to an increased computational burden for such analyses. The past two decades have also seen the rise of multi-core processors, in both the central processing unit (CPU) and graphics processing unit (GPU) processor markets, enabling massively parallel computations that are not yet fully exploited by many software packages for multipartite analyses. We here propose a Markov chain Monte Carlo (MCMC) approach using an adaptive multivariate transition kernel to estimate in parallel a large number of parameters, split across partitioned data, by exploiting multi-core processing. Across several real-world examples, we demonstrate that our approach enables the estimation of these multipartite parameters more efficiently than standard approaches that typically use a mixture of univariate transition kernels. In one case, when estimating the relative rate parameter of the non-coding partition in a heterochronous dataset, MCMC integration efficiency improves by > 14-fold. Our implementation is part of the BEAST code base, a widely used open source software package to perform Bayesian phylogenetic inference. Contact: guy.baele@kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Designing for Peta-Scale in the LSST Database
NASA Astrophysics Data System (ADS)
Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.
2007-10-01
The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.
Remote Internet access to advanced analytical facilities: a new approach with Web-based services.
Sherry, N; Qin, J; Fuller, M Suominen; Xie, Y; Mola, O; Bauer, M; McIntyre, N S; Maxwell, D; Liu, D; Matias, E; Armstrong, C
2012-09-04
Over the past decade, the increasing availability of the World Wide Web has held out the possibility that the efficiency of scientific measurements could be enhanced in cases where experiments were being conducted at distant facilities. Examples of early successes have included X-ray diffraction (XRD) experimental measurements of protein crystal structures at synchrotrons and access to scanning electron microscopy (SEM) and NMR facilities by users from institutions that do not possess such advanced capabilities. Experimental control, visual contact, and receipt of results have used some form of X forwarding and/or VNC (virtual network computing) software that transfers the screen image of a server at the experimental site to that of the users' home site. A more recent development is a web services platform called Science Studio that provides teams of scientists with secure links to experiments at one or more advanced research facilities. The software provides a widely distributed team with a set of controls and screens to operate, observe, and record essential parts of the experiment. As well, Science Studio provides high speed network access to computing resources to process the large data sets that are often involved in complex experiments. The simple web browser and the rapid transfer of experimental data to a processing site allow efficient use of the facility and assist decision making during the acquisition of the experimental results. The software provides users with a comprehensive overview and record of all parts of the experimental process. A prototype network is described involving X-ray beamlines at two different synchrotrons and an SEM facility. An online parallel processing facility has been developed that analyzes the data in near-real time using stream processing. Science Studio can be expanded to include many other analytical applications, providing teams of users with rapid access to processed results along with the means for detailed discussion of their significance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
A review on battery thermal management in electric vehicle application
NASA Astrophysics Data System (ADS)
Xia, Guodong; Cao, Lei; Bi, Guanglong
2017-11-01
The global issues of energy crisis and air pollution have offered a great opportunity to develop electric vehicles. However, so far, the cycle life of power batteries, environmental adaptability, driving range, and charging time still fall far short of the level of traditional vehicles with internal combustion engines. Effective battery thermal management (BTM) is essential to improve this situation. This paper reviews the existing literature at two levels: the cell level and the battery module level. For a single battery, specific attention is paid to three important processes: heat generation, heat transport, and heat dissipation. For large-format cells, multi-scale multi-dimensional coupled models have been developed. These facilitate the investigation of factors, such as local irreversible heat generation, thermal resistance, and current distribution, that account for the intrinsic temperature gradients existing in a cell. For battery modules based on air and liquid cooling, series, series-parallel, and parallel cooling configurations are discussed. Liquid cooling strategies, especially direct liquid cooling strategies, are reviewed, and they may advance the battery thermal management system to a new generation.
NASA Astrophysics Data System (ADS)
Machalek, P.; Kim, S. M.; Berry, R. D.; Liang, A.; Small, T.; Brevdo, E.; Kuznetsova, A.
2012-12-01
We describe how the Climate Corporation uses Python and Clojure, a language implemented on top of Java, to generate climatological forecasts for precipitation based on the Advanced Hydrologic Prediction Service (AHPS) radar based daily precipitation measurements. A 2-year-long forecast is generated on each of the ~650,000 CONUS land based 4-km AHPS grids by constructing 10,000 ensembles sampled from a 30-year reconstructed AHPS history for each grid. The spatial and temporal correlations between neighboring AHPS grids and the sampling of the analogues are handled by Python. The parallelization for all the 650,000 CONUS stations is further achieved by utilizing the MAP-REDUCE framework (http://code.google.com/edu/parallel/mapreduce-tutorial.html). Each full scale computational run requires hundreds of nodes with up to 8 processors each on the Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/) distributed computing service, resulting in 3 terabyte datasets. We further describe how we have productionalized a monthly run of the simulation process at full scale of the 4-km AHPS grids and how the resultant terabyte sized datasets are handled.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria
Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective to maximize the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal or annual) are usually coupled and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed to solve the optimization problem. The new formulation and performance of the NDS solver is benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS based solver can be easily parallelized which results in significant computational improvement.
Computer vision for driver assistance systems
NASA Astrophysics Data System (ADS)
Handmann, Uwe; Kalinke, Thomas; Tzomakas, Christos; Werner, Martin; von Seelen, Werner
1998-07-01
Systems for automated image analysis are useful for a variety of tasks and their importance is still increasing due to technological advances and an increase of social acceptance. Especially in the field of driver assistance systems the progress in science has reached a level of high performance. Fully or partly autonomously guided vehicles, particularly for road-based traffic, pose high demands on the development of reliable algorithms due to the conditions imposed by natural environments. At the Institut für Neuroinformatik, methods for analyzing driving relevant scenes by computer vision are developed in cooperation with several partners from the automobile industry. We introduce a system which extracts the important information from an image taken by a CCD camera installed at the rear view mirror in a car. The approach consists of a sequential and a parallel sensor and information processing. Three main tasks, namely the initial segmentation (object detection), the object tracking, and the object classification, are realized by integration in the sequential branch and by fusion in the parallel branch. The main gain of this approach is given by the integrative coupling of different algorithms providing partly redundant information.
Panoramic attitude sensor for Radio Astronomy Explorer B
NASA Technical Reports Server (NTRS)
Thomsen, R.
1973-01-01
An instrument system to acquire attitude determination data for the RAE-B spacecraft was designed and built. The system consists of an electronics module and two optical scanner heads. Each scanner head has an optical scanner with a field of view of 0.7 degrees diameter which scans the sky and measures the position of the moon, earth and sun relative to the spacecraft. This scanning is accomplished in either of two modes. When the spacecraft is spinning, the scanner operates in spherical mode, with the spacecraft spin providing the slow sweep of latitude to scan the entire sky. After the spacecraft is placed in lunar orbit and despun, the scanner will operate in planar mode, advancing at a rate of 5.12 seconds per revolution in a fixed plane parallel to the spacecraft Z axis. This scan will cross and measure the moon horizons with every revolution. Each scanner head also has a sun slit which is aligned parallel to the spin axis of the spacecraft and which provides a sun pulse each revolution of the spacecraft. The electronics module provides the command and control, data processing and housekeeping functions.
NASA Astrophysics Data System (ADS)
Li, Gaohua; Fu, Xiang; Wang, Fuxin
2017-10-01
The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collection detection theory and implicit hole-cutting algorithm achieves an automatic coupling for the near-body and off-body solvers, and the error-and-try method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results of flows over high Reynolds cylinder and two-bladed helicopter rotor show that the combination of high-order hybrid scheme, advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.
NASA Technical Reports Server (NTRS)
Bekey, I.; Mayer, H. L.; Wolfe, M. G.
1976-01-01
The likely system concepts which might be representative of NASA and DoD space programs in the 1980-2000 time period were studied along with the programs' likely needs for major space transportation vehicles, orbital support vehicles, and technology developments which could be shared by the military and civilian space establishments in that time period. Such needs could then be used by NASA as an input in determining the nature of its long-range development plan. The approach used was to develop a list of possible space system concepts (initiatives) in parallel with a list of needs based on consideration of the likely environments and goals of the future. The two lists thus obtained represented what could be done, regardless of need; and what should be done, regardless of capability, respectively. A set of development program plans for space application concepts was then assembled, matching needs against capabilities, and the requirements of the space concepts for support vehicles, transportation, and technology were extracted. The process was pursued in parallel for likely military and civilian programs, and the common support needs thus identified.
NASA Technical Reports Server (NTRS)
Raju, Manthena S.
1998-01-01
Sprays occur in a wide variety of industrial and power applications and in the processing of materials. A liquid spray is a phase flow with a gas as the continuous phase and a liquid as the dispersed phase (in the form of droplets or ligaments). Interactions between the two phases, which are coupled through exchanges of mass, momentum, and energy, can occur in different ways at different times and locations involving various thermal, mass, and fluid dynamic factors. An understanding of the flow, combustion, and thermal properties of a rapidly vaporizing spray requires careful modeling of the rate-controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as other phenomena. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on sprays to unstructured grids and parallel computing. LSPRAY, which was developed by M.S. Raju of Nyma, Inc., is designed to be massively parallel and could easily be coupled with any existing gas-phase flow and/or Monte Carlo probability density function (PDF) solver. The LSPRAY solver accommodates the use of an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements in the gas-phase solvers. It is used specifically for fuel sprays within gas turbine combustors, but it has many other uses. The spray model used in LSPRAY provided favorable results when applied to stratified-charge rotary combustion (Wankel) engines and several other confined and unconfined spray flames. The source code will be available with the National Combustion Code (NCC) as a complete package.
Accelerating Computation of DCM for ERP in MATLAB by External Function Calls to the GPU
Wang, Wei-Jen; Hsieh, I-Fan; Chen, Chun-Chuan
2013-01-01
This study aims to improve the performance of Dynamic Causal Modelling for Event Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM utilizes an iterative procedure, the expectation maximization (EM) algorithm, to find the optimal parameters given a set of observations and the underlying probability model. As the EM algorithm is computationally demanding and the analysis faces possible combinatorial explosion of models to be tested, we propose a parallel computing scheme using the GPU to achieve a fast estimation of DCM for ERP. The computation of DCM for ERP is dynamically partitioned and distributed to threads for parallel processing, according to the DCM model complexity and the hardware constraints. The performance efficiency of this hardware-dependent thread arrangement strategy was evaluated using the synthetic data. The experimental data were used to validate the accuracy of the proposed computing scheme and quantify the time saving in practice. The simulation results show that the proposed scheme can accelerate the computation by a factor of 155 for the parallel part. For experimental data, the speedup factor is about 7 per model on average, depending on the model complexity and the data. This GPU-based implementation of DCM for ERP gives qualitatively the same results as the original MATLAB implementation does at the group level analysis. In conclusion, we believe that the proposed GPU-based implementation is very useful for users as a fast screen tool to select the most likely model and may provide implementation guidance for possible future clinical applications such as online diagnosis. PMID:23840507
[CMACPAR, a modified parallel neuro-controller for control processes].
Ramos, E; Surós, R
1999-01-01
CMACPAR is a parallel neurocontroller oriented to real-time systems such as control processes. Its main characteristics are a fast learning algorithm, a reduced number of calculations, great generalization capacity, local learning, and intrinsic parallelism. This type of neurocontroller is used in real-time applications required by refineries, hydroelectric centers, factories, etc. In this work we present the analysis and the parallel implementation of a modified scheme of the Cerebellar Model CMAC for the n-dimensional space projection using a mean granularity parallel neurocontroller. The proposed memory management allows for a significant reduction in training time and required memory size.
The history of MR imaging as seen through the pages of radiology.
Edelman, Robert R
2014-11-01
The first reports in Radiology pertaining to magnetic resonance (MR) imaging were published in 1980, 7 years after Paul Lauterbur pioneered the first MR images and 9 years after the first human computed tomographic images were obtained. Historical advances in the research and clinical applications of MR imaging very much parallel the remarkable advances in MR imaging technology. These advances can be roughly classified into hardware (eg, magnets, gradients, radiofrequency [RF] coils, RF transmitter and receiver, MR imaging-compatible biopsy devices) and imaging techniques (eg, pulse sequences, parallel imaging, and so forth). Image quality has been dramatically improved with the introduction of high-field-strength superconducting magnets, digital RF systems, and phased-array coils. Hybrid systems, such as MR/positron emission tomography (PET), combine the superb anatomic and functional imaging capabilities of MR imaging with the unsurpassed capability of PET to demonstrate tissue metabolism. Supported by the improvements in hardware, advances in pulse sequence design and image reconstruction techniques have spurred dramatic improvements in imaging speed and the capability for studying tissue function. In this historical review, the history of MR imaging technology and developing research and clinical applications, as seen through the pages of Radiology, will be considered.
Parallel-Processing Test Bed For Simulation Software
NASA Technical Reports Server (NTRS)
Blech, Richard; Cole, Gary; Townsend, Scott
1996-01-01
Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Time Series Discord Detection in Medical Data using a Parallel Relational Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodbridge, Diane; Rintoul, Mark Daniel; Wilson, Andrew T.
Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Because high-frequency medical sensors produce very large volumes of data, storing and processing continuous medical data is an emerging big data area. In particular, detecting anomalies in real time is important for detecting and preventing patient emergencies. A time series discord indicates a subsequence that has the maximum difference to the rest of the time series subsequences, meaning that it has abnormal or unusual data trends. In this study, we implemented two versions of time series discord detection algorithms on a high performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute force version of the discord detection algorithm takes each possible subsequence and calculates a distance to the nearest non-self match to find the biggest discords in time series. For the heuristic version of the algorithm, a combination of an array and a trie structure was applied to order time series data for enhancing time efficiency. The study results showed efficient data loading, decoding and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS including data compression, data pipe-lining, and task scheduling.
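The brute-force search described in this abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' DBMS implementation: the function name `brute_force_discord`, the use of Euclidean distance, and the rule that a "non-self match" is any window starting at least `m` samples away are all assumptions made here for the example.

```python
import math


def brute_force_discord(series, m):
    """Return (start_index, distance) of the top discord of length m.

    For every window, find the distance to its nearest non-self match
    (a window that does not overlap it); the discord is the window
    whose nearest non-self match is farthest away.
    """
    n = len(series) - m + 1  # number of candidate windows
    if m <= 0 or n < 2:
        raise ValueError("series is too short for the requested window length")

    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # overlapping windows count as self matches
                continue
            d = math.sqrt(sum((series[i + k] - series[j + k]) ** 2 for k in range(m)))
            nearest = min(nearest, d)
        # Windows with no non-overlapping partner are ignored.
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist


if __name__ == "__main__":
    signal = [0, 1] * 10
    signal[9] = 5  # inject one abnormal sample into a periodic signal
    print(brute_force_discord(signal, 4))
```

The double loop makes the cost quadratic in the number of windows, which is why a heuristic ordering of candidates, as the abstract mentions, matters at scale.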
Time Series Discord Detection in Medical Data using a Parallel Relational Database [PowerPoint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodbridge, Diane; Wilson, Andrew T.; Rintoul, Mark Daniel
Recent advances in sensor technology have made continuous real-time health monitoring available in both hospital and non-hospital settings. Because high-frequency medical sensors produce very large volumes of data, storing and processing continuous medical data is an emerging big data area. In particular, detecting anomalies in real time is important for detecting and preventing patient emergencies. A time series discord indicates a subsequence that has the maximum difference to the rest of the time series subsequences, meaning that it has abnormal or unusual data trends. In this study, we implemented two versions of time series discord detection algorithms on a high performance parallel database management system (DBMS) and applied them to 240 Hz waveform data collected from 9,723 patients. The initial brute force version of the discord detection algorithm takes each possible subsequence and calculates a distance to the nearest non-self match to find the biggest discords in time series. For the heuristic version of the algorithm, a combination of an array and a trie structure was applied to order time series data for enhancing time efficiency. The study results showed efficient data loading, decoding and discord searches in a large amount of data, benefiting from the time series discord detection algorithm and the architectural characteristics of the parallel DBMS including data compression, data pipe-lining, and task scheduling.
Xia, Yidong; Lou, Jialin; Luo, Hong; ...
2015-02-09
Here, an OpenACC directive-based graphics processing unit (GPU) parallel scheme is presented for solving the compressible Navier–Stokes equations on 3D hybrid unstructured grids with a third-order reconstructed discontinuous Galerkin method. The developed scheme requires the minimum code intrusion and algorithm alteration for upgrading a legacy solver with the GPU computing capability at very little extra effort in programming, which leads to a unified and portable code development strategy. A face coloring algorithm is adopted to eliminate the memory contention because of the threading of internal and boundary face integrals. A number of flow problems are presented to verify the implementation of the developed scheme. Timing measurements were obtained by running the resulting GPU code on one Nvidia Tesla K20c GPU card (Nvidia Corporation, Santa Clara, CA, USA) and compared with those obtained by running the equivalent Message Passing Interface (MPI) parallel CPU code on a compute node (consisting of two AMD Opteron 6128 eight-core CPUs (Advanced Micro Devices, Inc., Sunnyvale, CA, USA)). Speedup factors of up to 24× and 1.6× for the GPU code were achieved with respect to one and 16 CPU cores, respectively. The numerical results indicate that this OpenACC-based parallel scheme is an effective and extensible approach to port unstructured high-order CFD solvers to GPU computing.
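The idea behind face coloring can be illustrated with a small sketch. The abstract does not give the authors' algorithm, so what follows is a generic greedy coloring under stated assumptions: faces are given as hypothetical `(left_cell, right_cell)` pairs, boundary faces use `None` for the missing neighbor, and two faces conflict when they touch the same cell. Faces that share a color touch disjoint cells, so a threaded loop over one color can accumulate into cell residuals without write conflicts.

```python
def color_faces(faces):
    """Greedily assign a color to each face so that no two faces of the
    same color share a cell.

    faces: list of (left_cell, right_cell); right_cell is None for a
    boundary face. Returns a list of color ids, one per face.
    """
    colors_at_cell = {}  # cell id -> set of colors already used by its faces
    face_colors = []
    for left, right in faces:
        cells = [c for c in (left, right) if c is not None]
        used = set()
        for c in cells:
            used |= colors_at_cell.get(c, set())
        color = 0
        while color in used:  # smallest color not yet used at either cell
            color += 1
        for c in cells:
            colors_at_cell.setdefault(c, set()).add(color)
        face_colors.append(color)
    return face_colors


if __name__ == "__main__":
    # A 1D chain of four cells: three interior faces and one boundary face.
    mesh_faces = [(0, 1), (1, 2), (2, 3), (3, None)]
    print(color_faces(mesh_faces))
```

In a solver, the face loop would then run once per color, with all faces of that color processed in parallel.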
Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas
2017-10-09
Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
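For readers unfamiliar with the two operations named in this abstract, the following sequential sketch shows what counting and position indexing of k-mers mean. It is not Kmerind's API (Kmerind is a separate distributed-memory library); the function names `count_kmers` and `index_kmers` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring (k-mer) of a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def index_kmers(sequence, k):
    """Map each k-mer to the list of start positions where it occurs."""
    if k <= 0:
        raise ValueError("k must be positive")
    index = {}
    for i in range(len(sequence) - k + 1):
        index.setdefault(sequence[i:i + k], []).append(i)
    return index


if __name__ == "__main__":
    read = "ACGTACGT"
    print(count_kmers(read, 3))
    print(index_kmers(read, 3))
```

A distributed version would typically partition k-mers across processes, for example by hashing each k-mer to an owner, so that each process counts only its own share; that step is omitted here.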
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Parallelizing quantum circuit synthesis
NASA Astrophysics Data System (ADS)
Di Matteo, Olivia; Mosca, Michele
2016-03-01
Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also in what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which once a starting point is chosen, its path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.
ERIC Educational Resources Information Center
Miller, Jeff; Ulrich, Rolf; Rolke, Bettina
2009-01-01
Within the context of the psychological refractory period (PRP) paradigm, we developed a general theoretical framework for deciding when it is more efficient to process two tasks in serial and when it is more efficient to process them in parallel. This analysis suggests that a serial mode is more efficient than a parallel mode under a wide variety…
The role of parallelism in the real-time processing of anaphora.
Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P
2012-06-01
Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.
The role of parallelism in the real-time processing of anaphora
Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P.
2012-01-01
Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution. PMID:23741080
NASA Astrophysics Data System (ADS)
Bavdaz, Marcos; Wille, Eric; Shortt, Brian; Fransen, Sebastiaan; Collon, Maximilien; Vacanti, Giuseppe; Günther, Ramses; Yanson, Alexei; Vervest, Mark; Haneveld, Jeroen; van Baren, Coen; Zuknik, Karl-Heinz; Christensen, Finn; Krumrey, Michael; Burwitz, Vadim; Pareschi, Giovanni; Valsecchi, Giuseppe
2015-09-01
The Advanced Telescope for High ENergy Astrophysics (Athena) was selected in 2014 as the second large class mission (L2) of the ESA Cosmic Vision Science Programme within the Directorate of Science and Robotic Exploration. The mission development is proceeding via the implementation of the system studies and, in parallel, a comprehensive series of technology preparation activities [1-3]. The core enabling technology for the high performance mirror is the Silicon Pore Optics (SPO), a modular X-ray optics technology, which utilises processes and equipment developed for the semiconductor industry [4-31]. This paper provides an overview of the programmatic background and the status of SPO technology, and gives an outline of the development roadmap and activities undertaken and planned by ESA.
The Development of Lightweight Electronics Enclosures for Space Applications
NASA Technical Reports Server (NTRS)
Fenske, Matthew T.; Barth, Jane L.; Didion, Jeffrey R.; Mule, Peter
1999-01-01
This paper outlines the end to end effort to produce lightweight electronics enclosures for NASA GSFC electronics applications with the end goal of presenting an array of lightweight box options for a flight opportunity. Topics including the development of requirements, design of three different boxes, utilization of advanced materials and processes, and analysis and test will be discussed. Three different boxes were developed independently and in parallel. A lightweight machined Aluminum box, a cast Aluminum box and a composite box were designed, fabricated, and tested both mechanically and thermally. There were many challenges encountered in meeting the requirements with a non-metallic enclosure and the development of the composite box employed several innovative techniques.
Peak-picking fundamental period estimation for hearing prostheses.
Howard, D M
1989-09-01
A real-time peak-picking fundamental period estimation device is described which is used in advanced hearing prostheses for the totally and profoundly deafened. The operation of the peak picker is compared with three well-established fundamental frequency estimation techniques: the electrolaryngograph, which is used as a "standard"; hardware implementations of the cepstral technique; and the Gold/Rabiner parallel processing algorithm. These comparisons illustrate and highlight some of the important advantages and disadvantages that characterize the operation of these techniques. The special requirements of the hearing prostheses are discussed with respect to the operation of each device, and the choice of the peak picker is found to be felicitous in this application.
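As a rough illustration of the peak-picking idea (not the device described in the abstract above), the sketch below marks local maxima that exceed a threshold in a synthetic waveform and reports the mean spacing between them as the fundamental period. The function name `estimate_period`, the threshold rule, and the test signal are all assumptions made for illustration.

```python
import math

def estimate_period(samples, sample_rate, threshold=0.5):
    """Estimate the fundamental period (seconds) from the spacing of major peaks.

    Hypothetical helper: a sample counts as a peak if it exceeds `threshold`
    times the global maximum and is larger than its left neighbour and at
    least as large as its right neighbour.
    """
    limit = threshold * max(samples)
    peaks = [
        i for i in range(1, len(samples) - 1)
        if samples[i] > limit and samples[i - 1] < samples[i] >= samples[i + 1]
    ]
    if len(peaks) < 2:
        return None  # not enough peaks to measure a period
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps) / sample_rate

# Synthetic signal: 100 Hz fundamental plus a weaker second harmonic.
rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / rate)
          + 0.3 * math.sin(2 * math.pi * 200 * n / rate)
          for n in range(800)]
period = estimate_period(signal, rate)
```

A real speech signal would need smoothing and a more careful rule for rejecting secondary peaks; the fixed threshold here only works because the synthetic waveform has one dominant peak per cycle.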
Emerging patterns of somatic mutations in cancer
Watson, Ian R.; Takahashi, Koichi; Futreal, P. Andrew; Chin, Lynda
2014-01-01
The advance in technological tools for massively parallel, high-throughput sequencing of DNA has enabled the comprehensive characterization of somatic mutations in large numbers of tumor samples. Here, we review recent cancer genomic studies that have assembled emerging views of the landscapes of somatic mutations through deep sequencing analyses of the coding exomes and whole genomes in various cancer types. We discuss the comparative genomics of different cancers, including mutation rates, spectrums, and roles of environmental insults that influence these processes. We highlight the developing statistical approaches used to identify significantly mutated genes, and discuss the emerging biological and clinical insights from such analyses as well as the challenges ahead translating these genomic data into clinical impacts. PMID:24022702
Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I
2015-01-01
This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048² and 4096² at interactive and semi-interactive frame rates using an NVIDIA GeForce 970 GTX device.
How I manage patients with Fanconi anaemia.
Dufour, Carlo
2017-07-01
Fanconi Anaemia is a rare, genetic heterogeneous multisystem disease that is the most common congenital syndrome of marrow failure. Twenty genes have been reported to cause the disease. Remarkable progress has been made over the last 20 years in the understanding of the genetic and pathophysiological mechanisms. Unfortunately, these advances have not been completely paralleled by advances in medical treatment, where the most important component remains stem cell transplantation. This therapy, although contributing to long-term negative effects, such as increased occurrence of late malignancies, is the only current option capable of prolonging the survival of patients. In spite of relevant recent progress in matched unrelated donor transplants, the largest studies with longer follow-up still show a superiority of matched sibling donor transplants with a success rate, in selected cohorts, of over 90%. This article reviews different aspects of the disease, including genetics, diagnosis and treatment options, with special focus on stem cell transplantation, comprehensive post-diagnosis management, decision-making processes and long-term follow-up. © 2017 John Wiley & Sons Ltd.
Advanced physical fine coal cleaning: Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1987-12-01
The contract objective was to demonstrate Advanced Energy Dynamics, Inc., (AED) Ultrafine Coal (UFC) electrostatic physical fine coal cleaning process as capable of: producing clean coal products of no greater than 2% ash; significantly reducing the pyritic sulfur content below that achievable with state-of-the-art coal cleaning; recovering over 80% of the available energy content in the run-of-mine coal; producing product and refuse with surface moisture below 30%. Originally the demonstration was to be of a Charger/Disc System at the Electric Power Research Institute (EPRI) Coal Quality Development Center (CQDC) at Homer City, Pennsylvania. As a result of the combination of Charger/Disc System scale-up problems and parallel development of an improved Vertical-Belt Separator, DOE issued a contract modification to perform additional laboratory testing and optimization of the UFC Vertical-Belt Separator System at AED. These comparative test results, safety analyses and an economic analysis are discussed in this report. 29 refs., 25 figs., 41 tabs.
A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi
1997-01-01
A low-power high-speed smart sensor system based on a large format active pixel sensor (APS) integrated with a programmable neural processor for space exploration missions is presented. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design that is composed of an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing at all levels, such as image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation-per-second computing power, which is a two-orders-of-magnitude increase over that of state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
Recent Advances in Photonic Devices for Optical Computing and the Role of Nonlinear Optics-Part II
NASA Technical Reports Server (NTRS)
Abdeldayem, Hossin; Frazier, Donald O.; Witherow, William K.; Banks, Curtis E.; Paley, Mark S.
2007-01-01
The twentieth century has been the era of semiconductor materials and electronic technology while this millennium is expected to be the age of photonic materials and all-optical technology. Optical technology has led to countless optical devices that have become indispensable in our daily lives in storage area networks, parallel processing, optical switches, all-optical data networks, holographic storage devices, and biometric devices at airports. This chapter intends to bring some awareness to the state of the art of optical technologies that have potential for optical computing and to demonstrate the role of nonlinear optics in many of these components. Our intent, in this chapter, is to present an overview of the current status of optical computing, and a brief evaluation of the recent advances and performance of the following key components necessary to build an optical computing system: all-optical logic gates, adders, optical processors, optical storage, holographic storage, optical interconnects, spatial light modulators and optical materials.
Algorithms and programming tools for image processing on the MPP
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1985-01-01
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems
NASA Astrophysics Data System (ADS)
Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.
2017-07-01
Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Moreover, these calculations are usually processed sequentially, even though modern computers are supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. The parallel algorithm is then implemented with a standard shared-memory application programming interface, i.e. Open Multi-Processing (OpenMP). The experiment shows that the outer-loop parallelization is the best strategy for any number of observations.
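The outer-loop strategy can be pictured with a small sketch in which each candidate regularization parameter is evaluated as an independent task. This is only an assumed shape for the computation: the abstract reports an OpenMP implementation, whereas the sketch uses Python's `concurrent.futures` as a stand-in, and `objective` is a placeholder penalised loss rather than the C-GMM criterion.

```python
from concurrent.futures import ThreadPoolExecutor

def objective(alpha, data):
    """Placeholder criterion (NOT the C-GMM objective): a ridge-style penalised loss."""
    # Inner loop over observations; kept serial inside each task.
    mean = sum(data) / len(data)
    shrunk = mean / (1.0 + alpha)
    return sum((x - shrunk) ** 2 for x in data) + alpha * shrunk ** 2

def best_alpha(alphas, data, workers=4):
    """Evaluate every candidate alpha as a separate task (outer-loop parallelisation)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: objective(a, data), alphas))
    # Candidates are independent, so the only sequential step is the final argmin.
    return min(zip(scores, alphas))[1]

data = [1.0, 2.0, 3.0, 4.0]
alphas = [0.0, 0.1, 1.0, 10.0]
chosen = best_alpha(alphas, data)
```

In CPython, threads do not speed up CPU-bound pure-Python work, so a process pool or a compiled OpenMP loop would be needed for real gains; the sketch only shows where the parallel region sits.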
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2012-07-01
Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
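For readers unfamiliar with the Hartley transform that underlies this work, the following sketch evaluates the standard 1-D DHT definition directly and checks that applying it twice recovers the input up to a factor of N. It is a direct O(N²) evaluation for illustration only; it is neither the fast 2-D DHT nor the convolver-bank structure proposed in the paper.

```python
import math

def dht(x):
    """1-D discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k/N),
    where cas(t) = cos(t) + sin(t)."""
    n_pts = len(x)
    return [
        sum(x[n] * (math.cos(2 * math.pi * n * k / n_pts)
                    + math.sin(2 * math.pi * n * k / n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]

def idht(h):
    """Inverse DHT: the same transform scaled by 1/N."""
    n_pts = len(h)
    return [v / n_pts for v in dht(h)]

signal = [1.0, 2.0, 0.0, -1.0]
recovered = idht(dht(signal))
```

Because the transform is real-valued and self-inverse up to scaling, the same routine serves both analysis and synthesis, which is part of what makes DHT-based Gabor transforms attractive for real-valued images.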
Parallel adaptive wavelet collocation method for PDEs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048³ using as many as 2048 CPU cores.
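The load-balancing goal stated above, roughly equal grid-point counts per process with whole trees as the unit of migration, can be sketched with a generic greedy heuristic. The abstract does not describe the actual repartitioning algorithm, so the longest-first rule, the function name `assign_trees`, and the tree sizes below are assumptions.

```python
import heapq

def assign_trees(tree_sizes, n_procs):
    """Greedy longest-first assignment of whole trees to processes.

    tree_sizes maps a tree id to its number of grid points; trees are never split.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, process id)
    heapq.heapify(heap)
    owner = {}
    for tree, size in sorted(tree_sizes.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)  # least-loaded process so far
        owner[tree] = proc
        heapq.heappush(heap, (load + size, proc))
    loads = [0] * n_procs
    for tree, proc in owner.items():
        loads[proc] += tree_sizes[tree]
    return owner, loads

# Hypothetical grid-point counts after an adaptation step.
sizes = {"t0": 50, "t1": 30, "t2": 20, "t3": 20, "t4": 40, "t5": 40}
owner, loads = assign_trees(sizes, 2)
```

A production scheme would also weigh migration cost and spatial locality, since moving a tree between processes changes which neighbours must communicate.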
Adapting high-level language programs for parallel processing using data flow
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1988-01-01
EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Paramedir: A Tool for Programmable Performance Analysis
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
Performance analysis of parallel scientific applications is time consuming and requires great expertise in areas such as programming paradigms, system software, and computer hardware architectures. In this paper we describe a tool that facilitates the programmability of performance metric calculations thereby allowing the automation of the analysis and reducing the application development time. We demonstrate how the system can be used to capture knowledge and intuition acquired by advanced parallel programmers in order to be transferred to novice users.
Impact of new computing systems on computational mechanics and flight-vehicle structures technology
NASA Technical Reports Server (NTRS)
Noor, A. K.; Storaasli, O. O.; Fulton, R. E.
1984-01-01
Advances in computer technology which may have an impact on computational mechanics and flight vehicle structures technology were reviewed. The characteristics of supersystems, highly parallel systems, and small systems are summarized. The interrelations of numerical algorithms and software with parallel architectures are discussed. A scenario for future hardware/software environment and engineering analysis systems is presented. Research areas with potential for improving the effectiveness of analysis methods in the new environment are identified.
Lee, Kenneth K C; Mariampillai, Adrian; Yu, Joe X Z; Cadotte, David W; Wilson, Brian C; Standish, Beau A; Yang, Victor X D
2012-07-01
Advances in swept source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real-time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second.
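Speckle variance is commonly computed as the per-pixel intensity variance across a small stack of registered frames, which is why every pixel can be processed independently on a GPU. The sketch below shows that calculation on a toy stack of four 2 × 2 frames. It assumes that n = 4 in the abstract refers to the number of frames per SV image and that population variance is used; neither detail is confirmed by the abstract.

```python
def speckle_variance(frames):
    """Per-pixel population variance across a stack of equally sized frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    sv = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Each pixel depends only on its own history, so this loop
            # body is what a GPU would run once per thread.
            values = [f[r][c] for f in frames]
            mean = sum(values) / n
            sv[r][c] = sum((v - mean) ** 2 for v in values) / n
    return sv

# Toy data: static pixels in the left column, fluctuating pixels in the right.
frames = [
    [[10, 5], [7, 1]],
    [[10, 9], [7, 3]],
    [[10, 5], [7, 1]],
    [[10, 9], [7, 3]],
]
sv_image = speckle_variance(frames)
```

Static structure yields zero variance while fluctuating pixels stand out, which is also why bulk motion must be registered out first: unregistered motion makes stationary tissue look like flow.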
NASA Astrophysics Data System (ADS)
Murni; Bustamam, A.; Ernastuti; Handhika, T.; Kerami, D.
2017-07-01
Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
Applying Parallel Processing Techniques to Tether Dynamics Simulation
NASA Technical Reports Server (NTRS)
Wells, B. Earl
1996-01-01
The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.
EPE analysis of sub-N10 BEoL flow with and without fully self-aligned via using Coventor SEMulator3D
NASA Astrophysics Data System (ADS)
Franke, Joern-Holger; Gallagher, Matt; Murdoch, Gayle; Halder, Sandip; Juncker, Aurelie; Clark, William
2017-03-01
During the last few decades, the semiconductor industry has been able to scale device performance up while driving costs down. What started off as simple geometrical scaling, driven mostly by advances in lithography, has recently been accompanied by advances in processing techniques and in device architectures. The trend to combine efforts using process technology and lithography is expected to intensify, as further scaling becomes ever more difficult. One promising component of future nodes is "scaling boosters", i.e. processing techniques that enable further scaling. An indispensable component in developing these ever more complex processing techniques is semiconductor process modeling software. Visualization of complex 3D structures in SEMulator3D, along with budget analysis on film thicknesses, CD and etch budgets, allows process integrators to compare flows before any physical wafers are run. Hundreds of "virtual" wafers allow comparison of different processing approaches, along with EUV or DUV patterning options for defined layers and different overlay schemes. This "virtual fabrication" technology produces massively parallel process variation studies that would be highly time-consuming or expensive in experiment. Here, we focus on one particular scaling booster, the fully self-aligned via (FSAV). We compare metal-via-metal (me-via-me) chains with self-aligned and fully self-aligned vias using a calibrated model for imec's N7 BEoL flow. To model overall variability, 3D Monte Carlo modeling of as many variability sources as possible is critical. We use Coventor SEMulator3D to extract minimum me-me distances and contact areas and show how fully self-aligned vias allow better me-via distance control and tighter via-me contact area variability compared with the standard self-aligned via (SAV) approach.
Parallel and serial grouping of image elements in visual perception.
Houtkamp, Roos; Roelfsema, Pieter R
2010-12-01
The visual system groups image elements that belong to an object and segregates them from other objects and the background. Important cues for this grouping process are the Gestalt criteria, and most theories propose that these are applied in parallel across the visual scene. Here, we find that Gestalt grouping can indeed occur in parallel in some situations, but we demonstrate that there are also situations where Gestalt grouping becomes serial. We observe substantial time delays when image elements have to be grouped indirectly through a chain of local groupings. We call this chaining process incremental grouping and demonstrate that it can occur for only a single object at a time. We suggest that incremental grouping requires the gradual spread of object-based attention so that eventually all the object's parts become grouped explicitly by an attentional labeling process. Our findings inspire a new incremental grouping theory that relates the parallel, local grouping process to feedforward processing and the serial, incremental grouping process to recurrent processing in the visual cortex.
Aging and feature search: the effect of search area.
Burton-Danner, K; Owsley, C; Jackson, G R
2001-01-01
The preattentive system involves the rapid parallel processing of visual information in the visual scene so that attention can be directed to meaningful objects and locations in the environment. This study used the feature search methodology to examine whether there are aging-related deficits in parallel-processing capabilities when older adults are required to visually search a large area of the visual field. Like young subjects, older subjects displayed flat, near-zero slopes for the Reaction Time x Set Size function when searching over a broad area (30 degrees radius) of the visual field, implying parallel processing of the visual display. These same older subjects exhibited impairment in another task, also dependent on parallel processing, performed over the same broad field area; this task, called the useful field of view test, has more complex task demands. Results imply that aging-related breakdowns of parallel processing over a large visual field area are not likely to emerge when required responses are simple, there is only one task to perform, and there is no limitation on visual inspection time.
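The slope of the Reaction Time x Set Size function mentioned above is an ordinary least-squares slope, in milliseconds per additional display item. The sketch below computes it for two made-up data sets, one flat and one steep; the numbers are invented for illustration and are not taken from the study.

```python
def search_slope(set_sizes, reaction_times_ms):
    """Least-squares slope (ms per item) of reaction time against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx

# Made-up data: a near-zero slope is the signature of parallel search,
# while a steep slope suggests item-by-item (serial) search.
flat = search_slope([4, 8, 16, 32], [452, 450, 451, 453])
steep = search_slope([4, 8, 16, 32], [500, 620, 860, 1340])
```

The steep series was constructed as 380 + 30 × set size, so its slope is 30 ms per item, whereas the flat series stays within a few milliseconds regardless of display size.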
Interdisciplinary Research and Phenomenology as Parallel Processes of Consciousness
ERIC Educational Resources Information Center
Arvidson, P. Sven
2016-01-01
There are significant parallels between interdisciplinarity and phenomenology. Interdisciplinary conscious processes involve identifying relevant disciplines, evaluating each disciplinary insight, and creating common ground. In an analogous way, phenomenology involves conscious processes of epoché, reduction, and eidetic variation. Each stresses…
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
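The central idea, one program body executed by however many processes happen to be in the force, can be sketched with threads and a barrier. The abstract does not list the individual constructs, so the prescheduled loop and the barrier used here are assumed to be representative, and Python threads stand in for the macro-generated FORTRAN.

```python
import threading

def run_force(n_procs, data):
    """Run the same worker body on n_procs threads (an SPMD-style 'force')."""
    partial = [0] * n_procs
    result = []
    barrier = threading.Barrier(n_procs)

    def body(me):
        # Prescheduled loop: process `me` takes indices me, me + n_procs, ...
        partial[me] = sum(data[i] for i in range(me, len(data), n_procs))
        barrier.wait()  # every process finishes its share before anyone continues
        if me == 0:
            result.append(sum(partial))  # one process combines the partial sums

    threads = [threading.Thread(target=body, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

total = run_force(4, list(range(1, 101)))
```

The body never hard-codes the number of processes, so the same code runs unchanged with 1, 4, or many workers, which mirrors the statement that the size of the force is left unspecified.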
Advanced missions safety. Volume 3: Appendices. Part 1: Space shuttle rescue capability
NASA Technical Reports Server (NTRS)
1972-01-01
The space shuttle rescue capability is analyzed as a part of the advanced mission safety study. The subjects discussed are: (1) mission evaluation, (2) shuttle configurations and performance, (3) performance of shuttle-launched tug system, (4) multiple pass grazing reentry from lunar orbit, (5) ground launched ascent and rendezvous time, (6) cost estimates, and (7) parallel-burn space shuttle configuration.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Parallelized reliability estimation of reconfigurable computer networks
NASA Technical Reports Server (NTRS)
Nicol, David M.; Das, Subhendu; Palumbo, Dan
1990-01-01
A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.
NASA Astrophysics Data System (ADS)
Gutsch, Manuela; Choi, Kang-Hoon; Hanisch, Norbert; Hohle, Christoph; Seidel, Robert; Steidel, Katja; Thrun, Xaver; Werner, Thomas
2014-10-01
Much effort has been spent on the development of EUV technologies, but from a customer point of view EUV is still behind expectations. In parallel, maskless lithography has been included in the ITRS roadmap for years, with multi electron beam direct patterning considered an alternative or complementary approach for patterning advanced technology nodes. The process of multi beam exposures can be emulated by single beam technologies available in the field. While variable shaped-beam direct writers are already used for niche applications, the integration capability of e-beam direct write (EBDW) at advanced nodes has not yet been proven. In this study e-beam lithography was implemented in the back-end of line (BEoL) processes of the 28nm SRAM technology. Integrated 300mm wafers with a 28nm BEoL stack from GLOBALFOUNDRIES, Dresden, were used for the experiments. For the patterning of the Metal layer, a Mix and Match concept based on the sequence litho - etch - litho - etch (LELE) was developed and evaluated, in which several exposure fields were blanked out during the optical exposure. E-beam patterning results of BEoL Metal and Via layers are presented using a 50kV VISTEC SB3050DW variable shaped electron beam direct writer at Fraunhofer IPMS-CNT. Etch results are shown and compared to the POR. In summary, we demonstrate the integration capability of EBDW into a productive CMOS process flow using the example of the 28nm SRAM technology node.
Balabanič, Damjan; Hermosilla, Daphne; Merayo, Noemí; Klemenčič, Aleksandra Krivograd; Blanco, Angeles
2012-01-01
There is increasing concern about chemical pollutants that have the ability to mimic hormones, the so-called endocrine-disrupting compounds (EDCs). One of the main reasons for concern is the possible effect of EDCs on human health. EDCs may be released into the environment in different ways, and one of the most significant sources is industrial wastewater. The main objective of this research was to evaluate the treatment performance of different wastewater treatment procedures (biological treatment, filtration, advanced oxidation processes) for the reduction of chemical oxygen demand and seven selected EDCs (dimethyl phthalate, diethyl phthalate, dibutyl phthalate, benzyl butyl phthalate, bis(2-ethylhexyl) phthalate, bisphenol A and nonylphenol) from wastewaters from a mill producing 100 % recycled paper. Two pilot plants were running in parallel and the following treatments were compared: (i) anaerobic biological treatment followed by aerobic biological treatment, ultrafiltration and reverse osmosis (RO), and (ii) anaerobic biological treatment followed by membrane bioreactor and RO. Moreover, at lab-scale, four different advanced oxidation processes (Fenton reaction, photo-Fenton reaction, photocatalysis with TiO2, and ozonation) were applied. The results indicated that the concentrations of selected EDCs from paper mill wastewaters were effectively reduced (100 %) by both combinations of pilot plants and photo-Fenton oxidation (98 %), while Fenton process, photocatalysis with TiO2 and ozonation were less effective (70 % to 90 %, respectively).
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; ...
2015-07-15
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abraham, Mark James; Murtola, Teemu; Schulz, Roland
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
Parallel integrated frame synchronizer chip
NASA Technical Reports Server (NTRS)
Solomon, Jeffrey Michael (Inventor); Ghuman, Parminder Singh (Inventor); Bennett, Toby Dennis (Inventor)
2000-01-01
A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
Free-energy landscape of protein oligomerization from atomistic simulations
Barducci, Alessandro; Bonomi, Massimiliano; Prakash, Meher K.; Parrinello, Michele
2013-01-01
In the realm of protein–protein interactions, the assembly process of homooligomers plays a fundamental role because the majority of proteins fall into this category. A comprehensive understanding of this multistep process requires the characterization of the driving molecular interactions and the transient intermediate species. The latter are often short-lived and thus remain elusive to most experimental investigations. Molecular simulations provide a unique tool to shed light onto these complex processes complementing experimental data. Here we combine advanced sampling techniques, such as metadynamics and parallel tempering, to characterize the oligomerization landscape of fibritin foldon domain. This system is an evolutionarily optimized trimerization motif that represents an ideal model for experimental and computational mechanistic studies. Our results are fully consistent with previous experimental nuclear magnetic resonance and kinetic data, but they provide a unique insight into fibritin foldon assembly. In particular, our simulations unveil the role of nonspecific interactions and suggest that an interplay between thermodynamic bias toward native structure and residual conformational disorder may provide a kinetic advantage. PMID:24248370
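The parallel tempering step named in this abstract can be illustrated with the standard Metropolis exchange criterion between two replicas at different temperatures. This is a minimal sketch of plain parallel tempering only: it omits the metadynamics bias that the authors combine with it, and the function names, energies, and temperatures are illustrative assumptions, not taken from the paper.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    e_i, e_j: potential energies (kJ/mol) of the configurations currently
              held by replica i and replica j.
    t_i, t_j: temperatures (K) of the two replicas.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Accept unconditionally when the exchange is favorable.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the exchange between the two replicas is accepted."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)


# Hypothetical values: the colder replica holds the higher-energy
# configuration, so handing it to the hotter replica is always accepted.
p_favorable = swap_probability(-100.0, -120.0, 300.0, 320.0)
# The reverse situation is accepted only with some probability below 1.
p_unfavorable = swap_probability(-120.0, -100.0, 300.0, 320.0)
print(p_favorable, p_unfavorable)
```

Exchanges of this kind let configurations trapped in a free-energy minimum at low temperature escape via the higher-temperature replicas, which is why the technique suits short-lived intermediates.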
[QUIPS: quality improvement in postoperative pain management].
Meissner, Winfried
2011-01-01
Despite the availability of high-quality guidelines and advanced pain management techniques acute postoperative pain management is still far from being satisfactory. The QUIPS (Quality Improvement in Postoperative Pain Management) project aims to improve treatment quality by means of standardised data acquisition, analysis of quality and process indicators, and feedback and benchmarking. During a pilot phase funded by the German Ministry of Health (BMG), a total of 12,389 data sets were collected from six participating hospitals. Outcome improved in four of the six hospitals. Process indicators, such as routine pain documentation, were only poorly correlated with outcomes. To date, more than 130 German hospitals use QUIPS as a routine quality management tool. An EC-funded parallel project disseminates the concept internationally. QUIPS demonstrates that patient-reported outcomes in postoperative pain management can be benchmarked in routine clinical practice. Quality improvement initiatives should use outcome instead of structural and process parameters. The concept is transferable to other fields of medicine. Copyright © 2011. Published by Elsevier GmbH.
Optimization of the oxidant supply system for combined cycle MHD power plants
NASA Technical Reports Server (NTRS)
Juhasz, A. J.
1982-01-01
An in-depth study was conducted to determine what, if any, improvements could be made to the oxidant supply system for combined cycle MHD power plants that would be reflected in higher thermal efficiency and a reduction in the cost of electricity (COE). A systematic analysis of air separation process variations was conducted, which showed that the specific energy consumption could be minimized when the product stream oxygen concentration is about 70 mole percent. The use of advanced air compressors, having variable speed and guide vane position control, results in additional power savings. The study also led to the conceptual design of a new air separation process, sized for a 500 MWe MHD plant and referred to as internal compression, which is discussed. In addition to its lower overall energy consumption, potential capital cost savings were identified for air separation plants using this process when constructed in a single large air separation train rather than multiple parallel trains, typical of conventional practice.
Bridging the Gaps: the Promise of Omics Studies in Pediatric Exercise Research
Radom-Aizik, Shlomit; Cooper, Dan M.
2018-01-01
In this review, we highlight promising new discoveries that may generate useful and clinically relevant insights into the mechanisms that link exercise with growth during critical periods of development. Growth in childhood and adolescence is unique among mammals, and is a dynamic process regulated by an evolution of hormonal and inflammatory mediators, age-dependent progression of gene expression, and environmentally modulated epigenetic mechanisms. Many of these same processes likely affect molecular transducers of physical activity. How the molecular signaling associated with growth is synchronized with signaling associated with exercise is poorly understood. Recent advances in “omics,” namely, genomics and epigenetics, metabolomics, and proteomics, now provide exciting approaches and tools that can be used for the first time to address this gap. A biologic definition of “healthy” exercise that links the metabolic transducers of physical activity with parallel processes that regulate growth will transform health policy and guidelines that promote optimal use of physical activity. PMID:27137166
Free-energy landscape of protein oligomerization from atomistic simulations.
Barducci, Alessandro; Bonomi, Massimiliano; Prakash, Meher K; Parrinello, Michele
2013-12-03
In the realm of protein-protein interactions, the assembly process of homooligomers plays a fundamental role because the majority of proteins fall into this category. A comprehensive understanding of this multistep process requires the characterization of the driving molecular interactions and the transient intermediate species. The latter are often short-lived and thus remain elusive to most experimental investigations. Molecular simulations provide a unique tool to shed light onto these complex processes complementing experimental data. Here we combine advanced sampling techniques, such as metadynamics and parallel tempering, to characterize the oligomerization landscape of fibritin foldon domain. This system is an evolutionarily optimized trimerization motif that represents an ideal model for experimental and computational mechanistic studies. Our results are fully consistent with previous experimental nuclear magnetic resonance and kinetic data, but they provide a unique insight into fibritin foldon assembly. In particular, our simulations unveil the role of nonspecific interactions and suggest that an interplay between thermodynamic bias toward native structure and residual conformational disorder may provide a kinetic advantage.
Longitudinal train dynamics: an overview
NASA Astrophysics Data System (ADS)
Wu, Qing; Spiryagin, Maksym; Cole, Colin
2016-12-01
This paper discusses the evolution of longitudinal train dynamics (LTD) simulations, which covers numerical solvers, vehicle connection systems, air brake systems, wagon dumper systems and locomotives, resistance forces and gravitational components, vehicle in-train instabilities, and computing schemes. A number of potential research topics are suggested, such as modelling of friction, polymer, and transition characteristics for vehicle connection simulations, studies of wagon dumping operations, proper modelling of vehicle in-train instabilities, and computing schemes for LTD simulations. Evidence shows that LTD simulations have evolved with computing capabilities. Currently, advanced component models that directly describe the working principles of the operation of air brake systems, vehicle connection systems, and traction systems are available. Parallel computing is a good solution to combine and simulate all these advanced models. Parallel computing can also be used to conduct three-dimensional long train dynamics simulations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crozier, Paul; Howard, Micah; Rider, William J.
The SPARC (Sandia Parallel Aerodynamics and Reentry Code) will provide nuclear weapon qualification evidence for the random vibration and thermal environments created by re-entry of a warhead into the earth’s atmosphere. SPARC incorporates the innovative approaches of ATDM projects on several fronts including: effective harnessing of heterogeneous compute nodes using Kokkos, exascale-ready parallel scalability through asynchronous multi-tasking, uncertainty quantification through Sacado integration, implementation of state-of-the-art reentry physics and multiscale models, use of advanced verification and validation methods, and enabling of improved workflows for users. SPARC is being developed primarily for the Department of Energy nuclear weapon program, with additional development and use of the code being supported by the Department of Defense for conventional weapons programs.
Design, fabrication and control of origami robots
NASA Astrophysics Data System (ADS)
Rus, Daniela; Tolley, Michael T.
2018-06-01
Origami robots are created using folding processes, which provide a simple approach to fabricating a wide range of robot morphologies. Inspired by biological systems, engineers have started to explore origami folding in combination with smart material actuators to enable intrinsic actuation as a means to decouple design from fabrication complexity. The built-in crease structure of origami bodies has the potential to yield compliance and exhibit many soft body properties. Conventional fabrication of robots is generally a bottom-up assembly process with multiple low-level steps for creating subsystems that include manual operations and often multiple iterations. By contrast, natural systems achieve elegant designs and complex functionalities using top-down parallel transformation approaches such as folding. Folding in nature creates a wide spectrum of complex morpho-functional structures such as proteins and intestines and enables the development of structures such as flowers, leaves and insect wings. Inspired by nature, engineers have started to explore folding powered by embedded smart material actuators to create origami robots. The design and fabrication of origami robots exploits top-down, parallel transformation approaches to achieve elegant designs and complex functionalities. In this Review, we first introduce the concept of origami robotics and then highlight advances in design principles, fabrication methods, actuation, smart materials and control algorithms. Applications of origami robots for a variety of devices are investigated, and future directions of the field are discussed, examining both challenges and opportunities.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Automation of serviceability of radio-relay station equipment
NASA Astrophysics Data System (ADS)
Uryev, A. G.; Mishkin, Y. I.; Itkis, G. Y.
1985-03-01
Automation of the serviceability of radio relay station equipment must ensure central gathering and primary processing of reliable instrument readings with subsequent display on the control panel, timely detection and recording of failures, sufficient advance warning based on analysis of deterioration symptoms, and correct remote measurement of equipment performance parameters. Such an inspection will minimize transmission losses while reducing nonproductive time and labor spent on documentation and measurement. A multichannel automated inspection system for this purpose should operate by a parallel rather than sequential procedure. Digital data processing is more expedient in this case than analog methods and, therefore, analog-to-digital converters are required. Special normal, above-limit and below-limit test signals provide means of self-inspection, to which must be added adequate interference immunity, stabilization, and standby power supply. Use of a microcomputer permits overall refinement and expansion of the inspection system while it minimizes, though does not completely eliminate, dependence on subjective judgment.
Multigigabit optical transceivers for high-data rate military applications
NASA Astrophysics Data System (ADS)
Catanzaro, Brian E.; Kuznia, Charlie
2012-01-01
Avionics has experienced an ever increasing demand for processing power and communication bandwidth. Currently deployed avionics systems require gigabit communication using opto-electronic transceivers connected with parallel optical fiber. Ultra Communications has developed a series of transceiver solutions combining ASIC technology with flip-chip bonding and advanced opto-mechanical molded optics. Ultra Communications custom high speed ASIC chips are developed using an SoS (silicon on sapphire) process. These circuits are flip chip bonded with sources (VCSEL arrays) and detectors (PIN diodes) to create an Opto-Electronic Integrated Circuit (OEIC). These have been combined with micro-optics assemblies to create transceivers with interfaces to standard fiber array (MT) cabling technology. We present an overview of the demands for transceivers in military applications and how new generation transceivers leverage both previous generation military optical transceivers as well as commercial high performance computing optical transceivers.
Quasielastic neutron scattering in biology: Theory and applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vural, Derya; Univ. of Tennessee, Knoxville, TN; Hu, Xiaohu
Neutrons scatter quasielastically from stochastic, diffusive processes, such as overdamped vibrations, localized diffusion and transitions between energy minima. In biological systems, such as proteins and membranes, these relaxation processes are of considerable physical interest. We review here recent methodological advances and applications of quasielastic neutron scattering (QENS) in biology, concentrating on the role of molecular dynamics simulation in generating data with which neutron profiles can be unambiguously interpreted. We examine the use of massively-parallel computers in calculating scattering functions, and the application of Markov state modeling. The decomposition of MD-derived neutron dynamic susceptibilities is described, and the use of this in combination with NMR spectroscopy. We discuss dynamics at very long times, including approximations to the infinite time mean-square displacement and nonequilibrium aspects of single-protein dynamics. Lastly, we examine how neutron scattering and MD can be combined to provide information on lipid nanodomains.
Quasielastic neutron scattering in biology: Theory and applications
Vural, Derya; Univ. of Tennessee, Knoxville, TN; Hu, Xiaohu; ...
2016-06-15
Neutrons scatter quasielastically from stochastic, diffusive processes, such as overdamped vibrations, localized diffusion and transitions between energy minima. In biological systems, such as proteins and membranes, these relaxation processes are of considerable physical interest. We review here recent methodological advances and applications of quasielastic neutron scattering (QENS) in biology, concentrating on the role of molecular dynamics simulation in generating data with which neutron profiles can be unambiguously interpreted. We examine the use of massively-parallel computers in calculating scattering functions, and the application of Markov state modeling. The decomposition of MD-derived neutron dynamic susceptibilities is described, and the use of this in combination with NMR spectroscopy. We discuss dynamics at very long times, including approximations to the infinite time mean-square displacement and nonequilibrium aspects of single-protein dynamics. Lastly, we examine how neutron scattering and MD can be combined to provide information on lipid nanodomains.
Object-based media and stream-based computing
NASA Astrophysics Data System (ADS)
Bove, V. Michael, Jr.
1998-03-01
Object-based media refers to the representation of audiovisual information as a collection of objects - the result of scene-analysis algorithms - and a script describing how they are to be rendered for display. Such multimedia presentations can adapt to viewing circumstances as well as to viewer preferences and behavior, and can provide a richer link between content creator and consumer. With faster networks and processors, such ideas become applicable to live interpersonal communications as well, creating a more natural and productive alternative to traditional videoconferencing. This paper outlines an example of object-based media algorithms and applications developed by my group, and presents new hardware architectures and software methods that we have developed to meet the computational requirements of object-based and other advanced media representations. In particular we describe stream-based processing, which enables automatic run-time parallelization of multidimensional signal processing tasks even given heterogeneous computational resources.
Controlling the intermediate structure of an ionic liquid for f-block element separations
Abney, Carter W.; Do, Changwoo; Luo, Huimin; ...
2017-04-19
Recent research has revealed molecular structure beyond the inner coordination sphere is essential in defining the performance of separations processes, but nevertheless remains largely unexplored. Here we apply small angle neutron scattering (SANS) and x-ray absorption fine structure (XAFS) spectroscopy to investigate the structure of an ionic liquid system studied for f-block element separations. SANS data reveal dramatic changes in the ionic liquid microstructure (~150 Å) which we demonstrate can be controlled by judicious selection of counter ion. Mesoscale structural features (> 500 Å) are also observed as a function of metal concentration. XAFS analysis supports formation of extended aggregate structures, similar to those observed in traditional solvent extraction processes, and suggests additional parallels may be drawn from further study. As a result, achieving precise tunability over the intermediate features is an important development in controlling mesoscale structure and realizing advanced new forms of soft matter.
Parallelization of ARC3D with Computer-Aided Tools
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SGI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, CAPTools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably good performance, for example, a speedup of 30 on 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts of the automated parallelization process, although user intervention in many of these parts is still necessary. Nevertheless, development and improvement of useful software tools, such as CAPTools, can help trim down many tedious parallelization details and improve the processing efficiency.
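To put the reported figure in context, the short calculation below converts a speedup of 30 on 36 processors into a parallel efficiency and, as a rough sketch, into the serial fraction that Amdahl's law would imply. Applying Amdahl's law here is an assumption made for illustration; the report itself does not state that model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def amdahl_serial_fraction(speedup, processors):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / p).

    Rearranging gives f = (p / S - 1) / (p - 1).
    """
    return (processors / speedup - 1.0) / (processors - 1.0)


# Figures quoted in the abstract: speedup of 30 on 36 processors.
efficiency = parallel_efficiency(30.0, 36)
serial_fraction = amdahl_serial_fraction(30.0, 36)
print(f"efficiency = {efficiency:.3f}")                    # efficiency = 0.833
print(f"implied serial fraction = {serial_fraction:.4f}")  # implied serial fraction = 0.0057
```

Under that assumption, less than one percent of the run time would be serial, which is consistent with the abstract's point that the serial code had to be modified before this level of performance was reachable.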
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. 
It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Dean N.
2011-07-20
This report summarizes work carried out by the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Team for the period of January 1, 2011 through June 30, 2011. It discusses highlights, overall progress, period goals, and collaborations and lists papers and presentations. To learn more about our project, please visit our UV-CDAT website (URL: http://uv-cdat.org). This report will be forwarded to the program manager for the Department of Energy (DOE) Office of Biological and Environmental Research (BER), national and international collaborators and stakeholders, and to researchers working on a wide range of other climate model, reanalysis, and observation evaluation activities. The UV-CDAT executive committee consists of Dean N. Williams of Lawrence Livermore National Laboratory (LLNL); Dave Bader and Galen Shipman of Oak Ridge National Laboratory (ORNL); Phil Jones and James Ahrens of Los Alamos National Laboratory (LANL); Claudio Silva of Polytechnic Institute of New York University (NYU-Poly); and Berk Geveci of Kitware, Inc. The UV-CDAT team consists of researchers and scientists with diverse domain knowledge whose home institutions also include the National Aeronautics and Space Administration (NASA) and the University of Utah. All work is accomplished under DOE open-source guidelines and in close collaboration with the project's stakeholders, domain researchers, and scientists. Working directly with BER climate science analysis projects, this consortium will develop and deploy data and computational resources useful to a wide variety of stakeholders, including scientists, policymakers, and the general public. Members of this consortium already collaborate with other institutions and universities in researching data discovery, management, visualization, workflow analysis, and provenance.
The UV-CDAT team will address the following high-level visualization requirements: (1) Alternative parallel streaming statistics and analysis pipelines - Data parallelism, Task parallelism, Visualization parallelism; (2) Optimized parallel input/output (I/O); (3) Remote interactive execution; (4) Advanced intercomparison visualization; (5) Data provenance processing and capture; and (6) Interfaces for scientists - Workflow data analysis and visualization construction tools, and Visualization interfaces.
Parallelization of a Fully-Distributed Hydrologic Model using Sub-basin Partitioning
NASA Astrophysics Data System (ADS)
Vivoni, E. R.; Mniszewski, S.; Fasel, P.; Springer, E.; Ivanov, V. Y.; Bras, R. L.
2005-12-01
A primary obstacle towards advances in watershed simulations has been the limited computational capacity available to most models. The growing trend of model complexity, data availability and physical representation has not been matched by adequate developments in computational efficiency. This situation has created a serious bottleneck which limits existing distributed hydrologic models to small domains and short simulations. In this study, we present novel developments in the parallelization of a fully-distributed hydrologic model. Our work is based on the TIN-based Real-time Integrated Basin Simulator (tRIBS), which provides continuous hydrologic simulation using a multiple resolution representation of complex terrain based on a triangulated irregular network (TIN). While the use of TINs reduces computational demand, the sequential version of the model is currently limited over large basins (>10,000 km2) and long simulation periods (>1 year). To address this, a parallel MPI-based version of the tRIBS model has been implemented and tested using high performance computing resources at Los Alamos National Laboratory. Our approach utilizes domain decomposition based on sub-basin partitioning of the watershed. A stream reach graph based on the channel network structure is used to guide the sub-basin partitioning. Individual sub-basins or sub-graphs of sub-basins are assigned to separate processors to carry out internal hydrologic computations (e.g. rainfall-runoff transformation). Routed streamflow from each sub-basin forms the major hydrologic data exchange along the stream reach graph. Individual sub-basins also share subsurface hydrologic fluxes across adjacent boundaries. We demonstrate how the sub-basin partitioning provides computational feasibility and efficiency for a set of test watersheds in northeastern Oklahoma. We compare the performance of the sequential and parallelized versions to highlight the efficiency gained as the number of processors increases. 
We also discuss how the coupled use of TINs and parallel processing can lead to feasible long-term simulations in regional watersheds while preserving basin properties at high-resolution.
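The sub-basin decomposition described in this abstract can be illustrated with a minimal Python sketch. It is not the tRIBS implementation: the abstract says the stream reach graph guides the partitioning, whereas this sketch swaps in a plain greedy load balance and only uses the graph to list which routed streamflows would cross processor boundaries. The function names, sub-basin labels, and node counts are all hypothetical.

```python
def assign_subbasins(cell_counts, n_procs):
    """Greedy balance: place the largest sub-basin on the least-loaded processor."""
    loads = [0] * n_procs
    owner = {}
    for basin in sorted(cell_counts, key=lambda b: (-cell_counts[b], b)):
        proc = min(range(n_procs), key=lambda p: (loads[p], p))
        owner[basin] = proc
        loads[proc] += cell_counts[basin]
    return owner, loads


def cross_processor_flows(downstream, owner):
    """Reaches whose routed streamflow must be exchanged between processors."""
    return sorted(
        (up, down)
        for up, down in downstream.items()
        if down is not None and owner[up] != owner[down]
    )


# Hypothetical watershed: A and B drain into C, C drains into D (the outlet).
cell_counts = {"A": 500, "B": 300, "C": 200, "D": 400}  # TIN nodes per sub-basin
downstream = {"A": "C", "B": "C", "C": "D", "D": None}

owner, loads = assign_subbasins(cell_counts, n_procs=2)
flows = cross_processor_flows(downstream, owner)
```

In an MPI code, each pair in `flows` would correspond to a message carrying routed streamflow from the upstream owner to the downstream owner; a graph-guided partitioner would try to keep that list short.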
Atkinson, Quentin D; Gray, Russell D
2005-08-01
In The Descent of Man (1871), Darwin observed "curious parallels" between the processes of biological and linguistic evolution. These parallels mean that evolutionary biologists and historical linguists seek answers to similar questions and face similar problems. As a result, the theory and methodology of the two disciplines have evolved in remarkably similar ways. In addition to Darwin's curious parallels of process, there are a number of equally curious parallels and connections between the development of methods in biology and historical linguistics. Here we briefly review the parallels between biological and linguistic evolution and contrast the historical development of phylogenetic methods in the two disciplines. We then look at a number of recent studies that have applied phylogenetic methods to language data and outline some current problems shared by the two fields.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A; Faraj, Daniel A
2013-06-04
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A.; Faraj, Daniel A.
2012-12-11
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are most suitable. Architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
NASA Technical Reports Server (NTRS)
Wigton, Larry
1996-01-01
Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
2010-05-01
connections near the hub end, and containing up to 0.48 million degrees of freedom. The models are analyzed for scalability and timing for hover and... Parallel and Scalable Rotor Dynamic Analysis... will enable the modeling of critical couplings that occur in hingeless and bearingless hubs with advanced flex structures. Second, it will enable the
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP (shared-memory parallel) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Parallel pivoting combined with parallel reduction
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1987-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
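As a rough illustration of the pivot-selection idea in this abstract, the Python sketch below computes Markowitz counts on a sparsity pattern and greedily collects mutually compatible pivots. It assumes the commonly used definitions, which the abstract does not spell out: the Markowitz count of entry (i, j) is (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of row i and column j, and two pivots (i, j) and (k, l) are compatible when entries (i, l) and (k, j) are both zero. The function names are hypothetical, and the numerical stability check mentioned in the abstract is omitted.

```python
def markowitz_counts(pattern):
    """Map each nonzero position (i, j) to (row_nnz - 1) * (col_nnz - 1)."""
    row_nnz, col_nnz = {}, {}
    for i, j in pattern:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return {(i, j): (row_nnz[i] - 1) * (col_nnz[j] - 1) for i, j in pattern}


def compatible(p, q, pattern):
    """Pivots in distinct rows and columns whose cross entries are both zero."""
    (i, j), (k, l) = p, q
    return i != k and j != l and (i, l) not in pattern and (k, j) not in pattern


def pick_parallel_pivots(pattern):
    """Greedy choice: lowest Markowitz count first, keep only compatible pivots."""
    counts = markowitz_counts(pattern)
    chosen = []
    for cand in sorted(pattern, key=lambda p: (counts[p], p)):
        if all(compatible(cand, c, pattern) for c in chosen):
            chosen.append(cand)
    return chosen


# Nonzero positions of a small, hypothetical 4x4 unsymmetric matrix.
pattern = {(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 0), (3, 3)}
pivots = pick_parallel_pivots(pattern)
```

Because compatible pivots do not touch each other's rows or columns, their elimination steps could be carried out concurrently; after each batch the pattern changes, so the selection would be repeated on the reduced matrix.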
Improving operating room productivity via parallel anesthesia processing.
Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R
2014-01-01
Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime was. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture
NASA Technical Reports Server (NTRS)
Jones, W. H.
1983-01-01
The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.
Complex MSH2 and MSH6 mutations in hypermutated microsatellite unstable advanced prostate cancer.
Pritchard, Colin C; Morrissey, Colm; Kumar, Akash; Zhang, Xiaotun; Smith, Christina; Coleman, Ilsa; Salipante, Stephen J; Milbank, Jennifer; Yu, Ming; Grady, William M; Tait, Jonathan F; Corey, Eva; Vessella, Robert L; Walsh, Tom; Shendure, Jay; Nelson, Peter S
2014-09-25
A hypermutated subtype of advanced prostate cancer was recently described, but prevalence and mechanisms have not been well-characterized. Here we find that 12% (7 of 60) of advanced prostate cancers are hypermutated, and that all hypermutated cancers have mismatch repair gene mutations and microsatellite instability (MSI). Mutations are frequently complex MSH2 or MSH6 structural rearrangements rather than MLH1 epigenetic silencing. Our findings identify parallels and differences in the mechanisms of hypermutation in prostate cancer compared with other MSI-associated cancers.
The impact of therapeutic reference pricing on innovation in cardiovascular medicine.
Sheridan, Desmond; Attridge, Jim
2006-12-01
Therapeutic reference pricing (TRP) places medicines to treat the same medical condition into groups or 'clusters' with a single common reimbursed price. Underpinning this economic measure is an implicit assumption that the products included in the cluster have an equivalent effect on a typical patient with this disease. 'Truly innovative' products can be exempt from inclusion in the cluster. This increasingly common approach to cost containment allocates products into one of two categories - truly innovative or therapeutically equivalent. This study examines the implications of TRP against the step-wise evolution of drugs for cardiovascular conditions over the past 50 years. It illustrates the complex interactions between advances in understanding of cellular and molecular disease mechanisms, diagnostic techniques, treatment concepts, and the synthesis, testing and commercialisation of products. It confirms the highly unpredictable and incremental nature of the innovation process. Medical progress in terms of improvement in patient outcomes over the long-term depends on the cumulative effect of year after year of painstaking incremental advances. It shows that the parallel processes of advances in scientific knowledge and the industrial 'investment-innovative cycle' involve highly developed sets of complementary capabilities and resources. A framework is developed to assess the impact of TRP upon research and development investment decisions and the development of therapeutic classes. We conclude that a simple categorisation of products as either 'truly innovative' or 'therapeutically equivalent' is inconsistent with the incremental processes of innovation and the resulting differentiated product streams revealed by our analysis. 
Widespread introduction of TRP would probably have prematurely curtailed development of many incremental innovations that became the preferred 'product of choice' by physicians for some indications and patients in managing the incidence of cardiovascular disease.
Improvements to Nuclear Data and Its Uncertainties by Theoretical Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Danon, Yaron; Nazarewicz, Witold; Talou, Patrick
2013-02-18
This project addresses three important gaps in existing evaluated nuclear data libraries that represent a significant hindrance against highly advanced modeling and simulation capabilities for the Advanced Fuel Cycle Initiative (AFCI). This project will: Develop advanced theoretical tools to compute prompt fission neutrons and gamma-ray characteristics well beyond average spectra and multiplicity, and produce new evaluated files of U and Pu isotopes, along with some minor actinides; Perform state-of-the-art fission cross-section modeling and calculations using global and microscopic model input parameters, leading to truly predictive fission cross-sections capabilities. Consistent calculations for a suite of Pu isotopes will be performed; Implement innovative data assimilation tools, which will reflect the nuclear data evaluation process much more accurately, and lead to a new generation of uncertainty quantification files. New covariance matrices will be obtained for Pu isotopes and compared to existing ones. The deployment of a fleet of safe and efficient advanced reactors that minimize radiotoxic waste and are proliferation-resistant is a clear and ambitious goal of AFCI. While in the past the design, construction and operation of a reactor were supported through empirical trials, this new phase in nuclear energy production is expected to rely heavily on advanced modeling and simulation capabilities. To be truly successful, a program for advanced simulations of innovative reactors will have to develop advanced multi-physics capabilities, to be run on massively parallel supercomputers, and to incorporate adequate and precise underlying physics. And all these areas have to be developed simultaneously to achieve those ambitious goals. Of particular interest are reliable fission cross-section uncertainty estimates (including important correlations) and evaluations of prompt fission neutrons and gamma-ray spectra and uncertainties.
Parallel Algorithms for Image Analysis.
1982-06-01
Technical report TR-1180. Author: Azriel Rosenfeld. Contract or grant number: AFOSR-77-3271. Keywords: image processing; image analysis; parallel processing; cellular computers.
Development of tailorable advanced blanket insulation for advanced space transportation systems
NASA Technical Reports Server (NTRS)
Calamito, Dominic P.
1987-01-01
Two items of Tailorable Advanced Blanket Insulation (TABI) for Advanced Space Transportation Systems were produced. The first consisted of flat panels made from integrally woven, 3-D fluted core having parallel fabric faces and connecting ribs of Nicalon silicon carbide yarns. The triangular cross sections of the flutes were filled with mandrels of processed Q-Fiber Felt. Forty panels were prepared with only minimal problems, mostly resulting from the unavailability of insulation with the proper density. Rigidizing the fluted fabric prior to inserting the insulation reduced the production time. The procedures for producing the fabric, insulation mandrels, and TABI panels are described. The second item was an effort to determine the feasibility of producing contoured TABI shapes from gores cut from flat, insulated fluted core panels. Two gores of integrally woven fluted core and single ply fabric (ICAS) were insulated and joined into a large spherical shape employing a tadpole insulator at the mating edges. The fluted core segment of each ICAS consisted of an Astroquartz face fabric and Nicalon face and rib fabrics, while the single ply fabric segment was Nicalon. Further development will be required. The success of fabricating this assembly indicates that this concept may be feasible for certain types of space insulation requirements. The procedures developed for weaving the ICAS, joining the gores, and coating certain areas of the fabrics are presented.
Parallel computing method for simulating hydrological processes of large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the best-known global environmental problems. It has altered the temporal and spatial distribution of watershed hydrological processes, especially in the world's large rivers. Watershed hydrological process simulation based on a physically based distributed hydrological model can give better results than lumped models. However, such simulation involves a large amount of calculation, especially for large rivers, and therefore needs huge computing resources that may not be steadily available to researchers or are available only at high expense; this has seriously restricted research and application. To solve this problem, current parallel methods mostly parallelize the computation in the space and time dimensions. They calculate the natural features in order, grid by grid (unit or basin), from upstream to downstream, based on the distributed hydrological model. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method has strong adaptability and extensibility, which means it can make full use of computing and storage resources when computing resources are limited, and the computing efficiency improves linearly as computing resources increase. This method can satisfy the parallel computing requirements of hydrological process simulation in small, medium and large rivers.
Parallels between a Collaborative Research Process and the Middle Level Philosophy
ERIC Educational Resources Information Center
Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen
2014-01-01
The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…
Automated Vectorization of Decision-Based Algorithms
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
Virtually all existing vectorization algorithms are designed to only analyze the numeric properties of an algorithm and distribute those elements across multiple processors. This software advances the state of the practice because it is the only known system, at the time of this reporting, that takes high-level statements, analyzes them for their decision properties, and converts them to a form that allows them to be executed automatically in parallel. The software takes a high-level source program that describes a complex decision-based condition and rewrites it as a disjunctive set of component Boolean relations that can then be executed in parallel. This is important because parallel architectures are becoming more commonplace in conventional systems and they have always been present in NASA flight systems. This technology allows one to take existing condition-based code and automatically vectorize it so it naturally decomposes across parallel architectures.
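To make the "disjunctive set of component Boolean relations" concrete, here is a minimal Python sketch. It does not reproduce the reported software, whose rewriting algorithm is not described in the abstract: the condition is already written by hand in disjunctive form, and the sketch only shows that each component can then be evaluated independently. The function name, state fields, and thresholds are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_dnf(clauses, state):
    """True if every predicate of at least one clause holds for `state`.

    clauses: list of clauses, each a list of predicates taking `state`.
    Each clause is submitted to the pool as an independent task.
    """
    def clause_holds(clause):
        return all(pred(state) for pred in clause)

    with ThreadPoolExecutor() as pool:
        return any(pool.map(clause_holds, clauses))


# Hand-written disjunctive form of the hypothetical condition:
#   (temp > 100 and valve_open) or (pressure > 5)
clauses = [
    [lambda s: s["temp"] > 100, lambda s: s["valve_open"]],
    [lambda s: s["pressure"] > 5],
]

alarm = evaluate_dnf(clauses, {"temp": 120, "valve_open": False, "pressure": 7})
quiet = evaluate_dnf(clauses, {"temp": 120, "valve_open": False, "pressure": 3})
```

The thread pool here only illustrates the structure; on real parallel hardware each clause would map to its own processing element, which is the decomposition the abstract describes.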
Cusack, Rhodri; Vicente-Grabovetsky, Alejandro; Mitchell, Daniel J; Wild, Conor J; Auer, Tibor; Linke, Annika C; Peelle, Jonathan E
2014-01-01
Recent years have seen neuroimaging data sets becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complicated to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast, and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.
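The idea of an engine that tracks what has been completed and what needs to be (re)done can be sketched in a few lines of Python. This is not the aa API: the function, the module names, and the completion set are hypothetical, and the sketch assumes that a module must be rerun whenever any module upstream of it is rerun. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan(dependencies, done):
    """Return the modules that still need to run, in dependency order.

    dependencies: {module: set of upstream modules it reads from}
    done: set of modules whose outputs already exist
    """
    stale = set()  # modules that will be (re)run in this pass
    todo = []
    for module in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(module, set())
        # Run a module if it has never completed or if any input is stale.
        if module not in done or upstream & stale:
            stale.add(module)
            todo.append(module)
    return todo


# Hypothetical four-stage pipeline in which "normalise" has not been run yet.
dependencies = {
    "realign": set(),
    "normalise": {"realign"},
    "smooth": {"normalise"},
    "stats": {"smooth"},
}
remaining = plan(dependencies, done={"realign", "smooth", "stats"})
```

Modules in the returned list that do not depend on one another could be dispatched to separate workers, which is the kind of independent-task parallelism the abstract mentions.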