NASA Astrophysics Data System (ADS)
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; Kalinkin, Alexander A.
2017-02-01
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, which is an open-source computational fluid dynamics code that has evolved over several decades, is widely used in multiphase flows, and is still being developed by the National Energy Technology Laboratory. Two different modernization approaches, 'bottom-up' and 'top-down', are illustrated. Preliminary results show that up to 8.5x improvement at the selected kernel level was achieved with the first approach, and up to 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.
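As a loose illustration of the 'bottom-up' route, consider a legacy-style kernel annotated so the compiler can auto-vectorize it while the algorithm, and hence its validation pedigree, stays untouched. This is a hedged sketch in C++ (MFIX itself is Fortran), and every name in it (drag_kernel, beta, vg, vp, force) is invented rather than taken from MFIX.

```cpp
// Hedged sketch of a "bottom-up" kernel modernization in C++ (MFIX itself is
// Fortran). All names here are invented for illustration.
#include <cstddef>
#include <vector>

// The legacy algorithm is unchanged; the OpenMP SIMD annotation merely tells
// the compiler the loop is safe to auto-vectorize (compile with -fopenmp-simd
// or equivalent; without it the pragma is ignored and the code still runs).
void drag_kernel(const std::vector<double>& beta,  // interphase drag coefficient
                 const std::vector<double>& vg,    // gas velocity
                 const std::vector<double>& vp,    // particle velocity
                 std::vector<double>& force)       // resulting momentum exchange
{
    const std::size_t n = force.size();  // all vectors assumed the same length
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        force[i] = beta[i] * (vg[i] - vp[i]);
}
```

Each kernel can then be modernized and re-verified in isolation, which is the incremental credibility argument the study makes.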
NASA Astrophysics Data System (ADS)
Burnett, W.
2016-12-01
The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support, and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and from passing data between ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the scalability of the Navy's Hybrid Coordinate Ocean Model (HYCOM), by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology Transfer and Training (PETTT); the HPCMP Applications Software Initiative (HASI); and Frontier Projects. PETTT supports code conversion by providing assistance, expertise, and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6-Petabyte system, of which Naval Oceanography receives 15% of operational computational system use, or approximately 1 Petabyte of the processing capability. The DSRC will provide the DoD with future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how DoD's HPCMP will ensure N-ESPC becomes operational, efficiently and effectively, using next-generation high performance computing.
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; ...
2017-03-20
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, which is an open-source computational fluid dynamics code that has evolved over several decades, is widely used in multiphase flows, and is still being developed by the National Energy Technology Laboratory. Two different modernization approaches, ‘bottom-up’ and ‘top-down’, are illustrated. Here, preliminary results show that up to 8.5x improvement at the selected kernel level was achieved with the first approach, and up to 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, which is an open-source computational fluid dynamics code that has evolved over several decades, is widely used in multiphase flows, and is still being developed by the National Energy Technology Laboratory. Two different modernization approaches, ‘bottom-up’ and ‘top-down’, are illustrated. Here, preliminary results show that up to 8.5x improvement at the selected kernel level was achieved with the first approach, and up to 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.
ESIF 2016: Modernizing Our Grid and Energy System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Becelaere, Kimberly
This 2016 annual report highlights work conducted at the Energy Systems Integration Facility (ESIF) in FY 2016, including grid modernization, high-performance computing and visualization, and INTEGRATE projects.
Web Based Parallel Programming Workshop for Undergraduate Education.
ERIC Educational Resources Information Center
Marcus, Robert L.; Robertson, Douglass
Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide Web based workshop on high performance computing entitled "IBM SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…
Performance of High-Reliability Space-Qualified Processors Implementing Software Defined Radios
2014-03-01
Naval Postgraduate School, Department of Electrical and Computer Engineering, 833 Dyer Road, Monterey, CA 93943-5121. Radiation in space poses a considerable threat to modern microelectronic devices, in particular to the high-performance, low-cost computing…
NASA Astrophysics Data System (ADS)
Bird, Robert; Nystrom, David; Albright, Brian
2017-10-01
The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite breakthroughs in the areas of mini-app development, performance portability, and cache-oblivious algorithms, the problem still remains largely unsolved. In this work we demonstrate how a focus on platform-agnostic modern code development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsics-based vectorization with compiler-generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU-capable OpenMP variant of VPIC. Finally, we include lessons learned. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
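A hedged sketch of the intrinsics-to-auto-vectorization shift described above: a structure-of-arrays particle push whose SIMD code is generated by the compiler from a portable annotation. The names (ParticlesSoA, push) are invented, and this is not VPIC source.

```cpp
// Hedged sketch: a structure-of-arrays (SoA) particle push written so the
// compiler's auto-vectorizer, not hand-coded intrinsics, generates the SIMD
// instructions. Illustrative only; not VPIC code.
#include <cstddef>
#include <vector>

struct ParticlesSoA {          // SoA keeps each component contiguous,
    std::vector<float> x, vx;  // which is what vector units want
};

void push(ParticlesSoA& p, const std::vector<float>& ex,
          float qdt_over_m, float dt) {
    const std::size_t n = p.x.size();
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        p.vx[i] += qdt_over_m * ex[i];  // accelerate (field gather omitted)
        p.x[i]  += p.vx[i] * dt;        // drift
    }
}
```

Because no intrinsics are baked in, the same source can be retargeted to new vector widths, or to a GPU-capable OpenMP backend, by recompiling.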
Numerical Prediction of Pitch Damping Stability Derivatives for Finned Projectiles
2013-11-01
Supported in part by a grant of high-performance computing time from the U.S. DOD High Performance Computing Modernization Program (HPCMP) at the Army…
2011-06-01
The Web-based AGeS system described in this paper is a computationally efficient and scalable system for high-throughput genome… A method for protecting web services involves making them more resilient to attack using autonomic computing techniques; this paper presents our initial… From the proceedings of the 2011 DoD High Performance Computing Modernization Program Users Group Conference (HPCMP UGC 2011), June 20-23, 2011.
Role of HPC in Advancing Computational Aeroelasticity
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2004-01-01
On behalf of the High Performance Computing Modernization Program (HPCMP) and the NASA Advanced Supercomputing Division (NAS), a study is conducted to assess the role of supercomputers in the computational aeroelasticity of aerospace vehicles. The study is mostly based on responses to a web-based questionnaire that was designed to capture the nuances of high performance computational aeroelasticity, particularly on parallel computers. A procedure is presented to assign a fidelity-complexity index to each application. Case studies based on major applications using HPCMP resources are presented.
2017-03-23
…high performance computing resources made available by the US Department of Defense High Performance Computing Modernization Program at the Air Force… Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Materiel Command, Fort Detrick, Maryland, USA.
High performance flight computer developed for deep space applications
NASA Technical Reports Server (NTRS)
Bunker, Robert L.
1993-01-01
The development of an advanced space flight computer for real-time embedded deep space applications, which embodies the lessons learned on Galileo and modern computer technology, is described. The requirements are listed and the design implementation that meets those requirements is described. The development of SPACE-16 (Spaceborne Advanced Computing Engine, where 16 designates the databus width) was initiated to support the Mariner Mark II (MM2) project. The computer is based on a radiation-hardened emulation of a modern 32-bit microprocessor and its family of support devices, including a high-performance floating-point accelerator. Additional custom devices are described, including a coprocessor that improves input/output capability, a memory interface chip, and a support chip that manages all fault-tolerance features. Detailed supporting analyses and the rationale justifying specific design and architectural decisions are provided. The six chip types were designed and fabricated. Testing and evaluation of a brassboard was initiated.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.
1990-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
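The diagonal (Jacobi) preconditioner described first is simple enough to sketch. Below is a hedged C++ illustration of a Jacobi-preconditioned conjugate gradient loop; dense row-major storage is used for brevity, so the paper's variable-band and sparse formats are not reproduced, and the function names are invented.

```cpp
// Hedged sketch: Jacobi (diagonal) preconditioned conjugate gradient in C++.
// Dense row-major storage for brevity; A is assumed symmetric positive
// definite, so its diagonal entries are nonzero. Names are invented.
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

static void matvec(const Vec& A, const Vec& x, Vec& y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j) s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b with M = diag(A) as the preconditioner.
void pcg(const Vec& A, const Vec& b, Vec& x, std::size_t n,
         double tol = 1e-10, int max_iter = 1000) {
    Vec r(b), z(n), p(n), Ap(n);
    matvec(A, x, Ap, n);                       // r = b - A*x0
    for (std::size_t i = 0; i < n; ++i) r[i] -= Ap[i];
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];
    p = z;
    double rz = dot(r, z);
    for (int k = 0; k < max_iter && std::sqrt(dot(r, r)) > tol; ++k) {
        matvec(A, p, Ap, n);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];  // apply M^-1
        const double rz_new = dot(r, z);
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
        rz = rz_new;
    }
}
```

The diagonal preconditioner costs one divide per unknown per iteration and vectorizes trivially, which is consistent with the abstract's point that it trades iteration count for raw computation rate.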
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.
1992-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Computer simulation of a single pilot flying a modern high-performance helicopter
NASA Technical Reports Server (NTRS)
Zipf, Mark E.; Vogt, William G.; Mickle, Marlin H.; Hoelzeman, Ronald G.; Kai, Fei; Mihaloew, James R.
1988-01-01
Presented is a computer simulation of a human response pilot model able to execute operational flight maneuvers and vehicle stabilization of a modern high-performance helicopter. Low-order, single-variable, human response mechanisms, integrated to form a multivariable pilot structure, provide a comprehensive operational control over the vehicle. Evaluations of the integrated pilot were performed by direct insertion into a nonlinear, total-force simulation environment provided by NASA Lewis. Comparisons between the integrated pilot structure and single-variable pilot mechanisms are presented. Static and dynamically alterable configurations of the pilot structure are introduced to simulate pilot activities during vehicle maneuvers. These configurations, in conjunction with higher level, decision-making processes, are considered for use where guidance and navigational procedures, operational mode transfers, and resource sharing are required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arkin, Adam; Bader, David C.; Coffey, Richard
Understanding the fundamentals of genomic systems or the processes governing impactful weather patterns are examples of the types of simulation and modeling performed on the most advanced computing resources in America. High-performance computing and computational science together provide a necessary platform for the mission science conducted by the Biological and Environmental Research (BER) office at the U.S. Department of Energy (DOE). This report reviews BER’s computing needs and their importance for solving some of the toughest problems in BER’s portfolio. BER’s impact on science has been transformative. Mapping the human genome, including the U.S.-supported international Human Genome Project that DOE began in 1987, initiated the era of modern biotechnology and genomics-based systems biology. And since the 1950s, BER has been a core contributor to atmospheric, environmental, and climate science research, beginning with atmospheric circulation studies that were the forerunners of modern Earth system models (ESMs) and by pioneering the implementation of climate codes onto high-performance computers. See http://exascaleage.org/ber/ for more information.
Computer simulation of multiple pilots flying a modern high performance helicopter
NASA Technical Reports Server (NTRS)
Zipf, Mark E.; Vogt, William G.; Mickle, Marlin H.; Hoelzeman, Ronald G.; Kai, Fei; Mihaloew, James R.
1988-01-01
A computer simulation of a human response pilot mechanism within the flight control loop of a high-performance modern helicopter is presented. A human response mechanism, implemented by a low order, linear transfer function, is used in a decoupled single variable configuration that exploits the dominant vehicle characteristics by associating cockpit controls and instrumentation with specific vehicle dynamics. Low order helicopter models obtained from evaluations of the time and frequency domain responses of a nonlinear simulation model, provided by NASA Lewis Research Center, are presented and considered in the discussion of the pilot development. Pilot responses and reactions to test maneuvers are presented and discussed. Higher level implementation, using the pilot mechanisms, are discussed and considered for their use in a comprehensive control structure.
Unified, Cross-Platform, Open-Source Library Package for High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozacik, Stephen
Compute power is continually increasing, but this increased performance is largely found in sophisticated computing devices and supercomputer resources that are difficult to use, resulting in under-utilization. We developed a unified set of programming tools that will allow users to take full advantage of the new technology by allowing them to work at a level abstracted away from the platform specifics, encouraging the use of modern computing systems, including government-funded supercomputer facilities.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dennig, Yasmin
Sandia National Laboratories has a long history of significant contributions to the high performance computing community and industry. Our innovative computer architectures allowed the United States to become the first to break the teraFLOP barrier, propelling us to the international spotlight. Our advanced simulation and modeling capabilities have been integral in high-consequence US operations such as Operation Burnt Frost. Strong partnerships with industry leaders, such as Cray, Inc. and Goodyear, have enabled them to leverage our high performance computing (HPC) capabilities to gain a tremendous competitive edge in the marketplace. As part of our continuing commitment to providing modern computing infrastructure and systems in support of Sandia missions, we made a major investment in expanding Building 725 to serve as the new home of HPC systems at Sandia. Work is expected to be completed in 2018 and will result in a modern facility of approximately 15,000 square feet of computer center space. The facility will be ready to house the newest National Nuclear Security Administration/Advanced Simulation and Computing (NNSA/ASC) prototype platform being acquired by Sandia, with delivery in late 2019 or early 2020. This new system will enable continuing advances by Sandia science and engineering staff in the areas of operating system R&D, operational cost effectiveness (power and innovative cooling technologies), user environment, and application code performance.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos
2011-01-01
General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs), have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
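Iris matching commonly reduces to a (masked) Hamming distance between binary iris codes, and each probe-gallery comparison is independent. The hedged C++/OpenMP sketch below shows that core-for-core parallelism; the 2048-bit code length is an assumption, and the occlusion masks used by real systems are omitted.

```cpp
// Hedged sketch: iris matching as a Hamming distance over binary iris codes.
// The 2048-bit length is assumed, not taken from the paper; masks omitted.
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kCodeBits = 2048;
using IrisCode = std::bitset<kCodeBits>;

// Fraction of disagreeing bits between a probe and each gallery entry;
// every comparison is independent, so the loop parallelizes cleanly.
std::vector<double> match_all(const IrisCode& probe,
                              const std::vector<IrisCode>& gallery) {
    std::vector<double> scores(gallery.size());
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < (std::ptrdiff_t)gallery.size(); ++i)
        scores[i] = double((probe ^ gallery[i]).count()) / kCodeBits;
    return scores;
}
```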
Petascale supercomputing to accelerate the design of high-temperature alloys
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; ...
2017-10-25
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ′-Al2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviour of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.
Petascale supercomputing to accelerate the design of high-temperature alloys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ′-Al2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviour of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. As a result, the approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.
Petascale supercomputing to accelerate the design of high-temperature alloys
NASA Astrophysics Data System (ADS)
Shin, Dongwon; Lee, Sangkeun; Shyam, Amit; Haynes, J. Allen
2017-12-01
Recent progress in high-performance computing and data informatics has opened up numerous opportunities to aid the design of advanced materials. Herein, we demonstrate a computational workflow that includes rapid population of high-fidelity materials datasets via petascale computing and subsequent analyses with modern data science techniques. We use a first-principles approach based on density functional theory to derive the segregation energies of 34 microalloying elements at the coherent and semi-coherent interfaces between the aluminium matrix and the θ′-Al2Cu precipitate, which requires several hundred supercell calculations. We also perform extensive correlation analyses to identify materials descriptors that affect the segregation behaviour of solutes at the interfaces. Finally, we show an example of leveraging machine learning techniques to predict segregation energies without performing computationally expensive physics-based simulations. The approach demonstrated in the present work can be applied to any high-temperature alloy system for which key materials data can be obtained using high-performance computing.
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
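As a hedged illustration of the dispatch pattern such a workflow needs, the C++/MPI sketch below statically distributes sequences across ranks; classify_sequence is a hypothetical stand-in for the PSI-BLAST-based classifier, not part of the published workflow's interface.

```cpp
// Hedged sketch: static MPI block distribution of an annotation workload.
// classify_sequence() is a hypothetical stand-in, not the workflow's API.
#include <mpi.h>
#include <cstdio>

static int classify_sequence(long seq_id) {
    // Stand-in: pretend to assign the protein to an orthologous group.
    return (int)(seq_id % 4631);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_sequences = 1200000;        // workload size quoted above
    const long chunk = (n_sequences + size - 1) / size;
    const long begin = (long)rank * chunk;
    const long end = begin + chunk < n_sequences ? begin + chunk : n_sequences;

    long done = 0;                           // each rank annotates its block
    for (long s = begin; s < end; ++s)
        if (classify_sequence(s) >= 0) ++done;

    long total = 0;
    MPI_Reduce(&done, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("classified %ld proteins\n", total);
    MPI_Finalize();
    return 0;
}
```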
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
Department of Defense High Performance Computing Modernization Program. 2008 Annual Report
2009-04-01
…from one place to another on the network. Without it, a computer could only talk to itself: no email, no web browsing, and no iTunes. Contents include "Your SecurID Card" (Ken Renard), "Secure Wireless" (Rob Scott and Stephen Bowman), and "Securing Today's Networks" (Rich Whittney, Juniper Networks, Federal).
Running R Statistical Computing Environment Software on the Peregrine
R is a collaborative project for the development of new statistical methodologies and enjoys a large user base (natural language support, but running in an English locale; consult the distribution for details). The CRAN task view for High Performance Computing describes programming paradigms to better leverage modern HPC systems.
Turbulence Model Effects on Cold-Gas Lateral Jet Interaction in a Supersonic Crossflow
2014-06-01
Supported in part by a grant of high-performance computing time from the U.S. Department of Defense (DOD) High Performance Computing Modernization Program at the U.S. Army Research Laboratory. The author thanks Dr. Ross Chaplin, Defence Science and Technology Laboratory, United Kingdom (UK), and Dr. David MacManus and Robert Christie, Cranfield University, UK.
Join the Center for Applied Scientific Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamblin, Todd; Bremer, Timo; Van Essen, Brian
The Center for Applied Scientific Computing serves as Livermore Lab’s window to the broader computer science, computational physics, applied mathematics, and data science research communities. In collaboration with academic, industrial, and other government laboratory partners, we conduct world-class scientific research and development on problems critical to national security. CASC applies the power of high-performance computing and the efficiency of modern computational methods to the realms of stockpile stewardship, cyber and energy security, and knowledge discovery for intelligence applications.
2015-07-01
Supported by a grant of high-performance computing time from the US Department of Defense (DOD) High Performance Computing Modernization Program at the US Army Research Laboratory. The three-dimensional, compressible, Reynolds-averaged Navier-Stokes (RANS) equations are solved using a finite volume method with point-implicit time integration.
Parallel-vector unsymmetric Eigen-Solver on high performance computers
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Jiangning, Qin
1993-01-01
The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components of the QR algorithm, this study concluded that the reduction of an unsymmetric matrix to Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases indicate that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods increases as the problem size increases.
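The Hessenberg reduction singled out above is a sequence of Householder similarity transforms whose row and column updates are mutually independent, which is exactly the structure a parallel-vector implementation exploits. A plain, hedged C++ sketch of the serial algorithm (not the paper's parallel-vector variant) follows.

```cpp
// Hedged sketch: serial Householder reduction of a dense matrix to upper
// Hessenberg form. The independent column/row updates in the two inner
// loop nests are where vectorization and multiple processors pay off.
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // row-major n x n

void hessenberg(Mat& A, std::size_t n) {
    std::vector<double> v(n, 0.0);
    for (std::size_t k = 0; k + 2 < n; ++k) {
        // Householder vector annihilating A(k+2..n-1, k).
        double norm = 0.0;
        for (std::size_t i = k + 1; i < n; ++i) norm += A[i * n + k] * A[i * n + k];
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;                  // column already reduced
        const double alpha = (A[(k + 1) * n + k] > 0.0) ? -norm : norm;
        for (std::size_t i = k + 1; i < n; ++i) v[i] = A[i * n + k];
        v[k + 1] -= alpha;
        double vv = 0.0;
        for (std::size_t i = k + 1; i < n; ++i) vv += v[i] * v[i];
        if (vv == 0.0) continue;
        // A <- H A with H = I - 2 v v^T / (v^T v): rows k+1..n-1.
        for (std::size_t j = k; j < n; ++j) {       // columns independent
            double s = 0.0;
            for (std::size_t i = k + 1; i < n; ++i) s += v[i] * A[i * n + j];
            s = 2.0 * s / vv;
            for (std::size_t i = k + 1; i < n; ++i) A[i * n + j] -= s * v[i];
        }
        // A <- A H: columns k+1..n-1 of every row; rows independent.
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t j = k + 1; j < n; ++j) s += A[i * n + j] * v[j];
            s = 2.0 * s / vv;
            for (std::size_t j = k + 1; j < n; ++j) A[i * n + j] -= s * v[j];
        }
    }
}
```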
Computer Simulation Performed for Columbia Project Cooling System
NASA Technical Reports Server (NTRS)
Ahmad, Jasim
2005-01-01
This demo shows a high-fidelity simulation of the air flow in the main computer room housing the Columbia (10,240 Intel Itanium processors) system. The simulation assesses the performance of the cooling system, identifies deficiencies, and recommends modifications to eliminate them. It used two in-house software packages on NAS supercomputers: Chimera Grid Tools to generate a geometric model of the computer room, and the OVERFLOW-2 code for fluid and thermal simulation. This state-of-the-art technology can be easily extended to provide a general capability for air flow analyses in any modern computer room.
SISYPHUS: A high performance seismic inversion factory
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas
2016-04-01
In recent years, massively parallel high-performance computers have become the standard instruments for solving forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling, specially designed for such computers (SPECFEM3D, SES3D), have become mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for modern massively parallel high-performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database is a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station, and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios in a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; work is in progress on interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and the optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit, and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present provides a command-line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss National Supercomputing Centre (CSCS).
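One of the supporting activities named above, computation of misfits and adjoint sources, is compact enough to sketch. For the classical L2 waveform misfit, the adjoint source is the data residual (injected time-reversed into the adjoint run); the C++ below is a generic illustration, not SISYPHUS code.

```cpp
// Hedged sketch of one automated "supporting activity": the classical L2
// waveform misfit chi = 1/2 * sum_t (syn - obs)^2 * dt and its adjoint
// source. Windowing and signal pre-processing are omitted.
#include <cstddef>
#include <vector>

double misfit_and_adjoint(const std::vector<double>& obs,
                          const std::vector<double>& syn,
                          double dt,
                          std::vector<double>& adj) {
    adj.assign(syn.size(), 0.0);
    double chi = 0.0;
    for (std::size_t t = 0; t < syn.size(); ++t) {
        const double r = syn[t] - obs[t];
        chi += 0.5 * r * r * dt;
        adj[t] = r;  // time reversal is applied when the solver injects it
    }
    return chi;
}
```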
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging, and liquid-cooling technologies, we will see the design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
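The relaxed-consistency trade-off described here is easiest to see in code. As a loose modern analogy (C++11 atomics rather than the paper's formalism, with all names invented), release/acquire pairs let a programmer working in a "constrained style" publish data safely even though the hardware reorders memory operations:

```cpp
// Hedged sketch: publishing data under relaxed hardware ordering using
// C++11 acquire/release atomics. An analogy only, not the paper's condition.
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                       // ordinary shared data
std::atomic<bool> ready{false};        // synchronization flag

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publish: orders the write above
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire pairs with release
    assert(payload == 42);  // guaranteed visible despite relaxed ordering
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```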
Information Systems at Enterprise. Design of Secure Network of Enterprise
NASA Astrophysics Data System (ADS)
Saigushev, N. Y.; Mikhailova, U. V.; Vedeneeva, O. A.; Tsaran, A. A.
2018-05-01
No enterprise can do without designing its own corporate network in today's information society. Such a network accelerates and facilitates the work of employees at every level, but it also exposes the company's confidential information to serious threats. Beyond data theft by attackers, modern malware poses numerous additional information threats. The computational security of corporate networks is therefore an important component of modern computer security for any enterprise. This article describes the design of a protected corporate network that provides the computers on the network with Internet access as well as interoperability with a branch office. High-speed Internet access is provided through high-speed access channels and load balancing between devices. The security of the designed network is achieved through VLAN technology, access lists, and an AAA server.
FY16 NRL DoD High Performance Computing Modernization Program
2017-09-15
…explored both wind and wave forcing in the numerical wave tank. The model uses high spatial and temporal resolution and a multi-phase formulation to… The ADVED_NS code was used to predict the effect of the standoff distance between micron-diameter wires and flow frequency on the total… contours for a flow over a 3D wire mesh. Figure 2 shows verifications comparing computed and theoretical drag forces for the flow over two cylinders in an…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wong, Michael K.; Davidson, Megan
As part of Sandia’s nuclear deterrence mission, the B61-12 Life Extension Program (LEP) aims to modernize the aging weapon system. Modernization requires requalification, and Sandia is using high performance computing to perform advanced computational simulations to better understand, evaluate, and verify weapon system performance in conjunction with limited physical testing. The Nose Bomb Subassembly (NBSA) of the B61-12 is responsible for producing a fuzing signal upon ground impact. The fuzing signal is dependent upon electromechanical impact sensors producing valid electrical fuzing signals at impact. Computer-generated models were used to assess the timing between the impact sensor’s response to the deceleration of impact and damage to major components and system subassemblies. The modeling and simulation team worked alongside the physical test team to design a large-scale reverse ballistic test to not only assess system performance, but also validate their computational models. The reverse ballistic test conducted at Sandia’s sled test facility sent a rocket sled with a representative target into a stationary B61-12 (NBSA) to characterize the nose crush and functional response of NBSA components. Data obtained from data recorders and high-speed photometrics were integrated with previously generated computer models in order to refine and validate the models’ ability to reliably simulate real-world effects. Large-scale tests are impractical to conduct for every single impact scenario. By creating reliable computer models, we can perform simulations that identify trends and produce estimates of outcomes over the entire range of required impact conditions. Sandia’s HPCs enable geometric resolution that was unachievable before, allowing for more fidelity and detail, and creating simulations that can provide insight to support evaluation of requirements and performance margins. As computing resources continue to improve, researchers at Sandia hope to improve these simulations so they provide increasingly credible analysis of the system response and performance over the full range of conditions.
Irregular Applications: Architectures & Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, John T.; Villa, Oreste; Tumeo, Antonino
Irregular applications are characterized by irregular data structures and irregular control and communication patterns. Novel irregular high-performance applications that deal with large data sets and require substantial computation have recently appeared. Unfortunately, current high-performance systems and software infrastructures execute irregular algorithms poorly. Only coordinated efforts by end users, area specialists, and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers
NASA Astrophysics Data System (ADS)
Ogino, T.
High Performance Fortran (HPF) is one of the modern and common techniques for achieving high-performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code were shown to be almost comparable to those of the VPP Fortran version. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops, with an efficiency of 76.5% on the VPP5000/56 in vector and parallel computation, as compared with catalog values. We conclude that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
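HPF expressed this kind of data parallelism with compiler directives over array operations. Since HPF compilers are no longer in common use, a rough present-day analogue is a directive-annotated stencil loop; the generic C++/OpenMP update below is only an illustration of the pattern, not the MHD code above.

```cpp
// Hedged sketch: a neighbor-only stencil update parallelized with a
// directive, loosely analogous to HPF's distributed array operations.
// Generic 1D upwind advection; unrelated to the MHD code discussed.
#include <cstddef>
#include <vector>

void advect_step(const std::vector<double>& u, std::vector<double>& u_new,
                 double c /* CFL number, 0 < c < 1 */) {
    const std::size_t n = u.size();   // u_new assumed the same length
#pragma omp parallel for
    for (std::ptrdiff_t i = 1; i < (std::ptrdiff_t)n; ++i)
        u_new[i] = u[i] - c * (u[i] - u[i - 1]);  // touches only neighbors
}
```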
NASA Astrophysics Data System (ADS)
Alameda, J. C.
2011-12-01
Development and optimization of computational science models on high-performance computers, and with the advent of ubiquitous multicore processors, practically on every system, have long been accomplished with basic software tools: typically command-line compilers, debuggers, and performance tools that have not changed substantially since the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as OpenMP and MPI) to take full advantage of high-performance computers with an increasing core count per shared-memory node, has made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC), seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high-performance computing systems. Our WHPC project takes an application-centric view: we use a set of scientific applications, each with a variety of challenges, both to improve the applications themselves and to understand shortcomings in Eclipse PTP from an application-developer perspective, which drives the list of improvements we seek to make. We are also partnering with performance tool providers to drive higher-quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, and to develop educational materials to incorporate into computational science and engineering codes. Finally, we are partnering with the lead PTP developers at IBM to ensure we are as effective as possible within the Eclipse community development. We are also conducting training and outreach to our user community, including conference BOF sessions, monthly user calls, and an annual user meeting, so that we can best inform the improvements we make to Eclipse PTP. With these activities we endeavor to encourage the use of modern software engineering practices, as enabled through the Eclipse IDE, with computational science and engineering applications. These practices include proper use of source code repositories; tracking and rectifying issues; measuring and monitoring code performance against both optimizations and ever-changing software stacks and configurations on HPC systems; and ultimately developing and maintaining testing suites: things that have become commonplace in many software endeavors but have lagged in the development of science applications. The increased complexity of both HPC systems and science applications demands better software engineering methods, preferably enabled by modern tools such as Eclipse PTP, to help the computational science community thrive as we evolve the HPC landscape.
NASA Astrophysics Data System (ADS)
Schulthess, Thomas C.
2013-03-01
The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
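The inter-node domain decomposition described above hinges on shipping particles that leave a rank's subdomain to the neighboring rank each step. A hedged C++/MPI sketch of that step, reduced from 2D to 1D with invented names (Particle, exchange), is shown below; the field solve, deposition, and push are omitted.

```cpp
// Hedged sketch: per-step particle exchange in a 1D domain decomposition.
// Pass MPI_PROC_NULL for a missing neighbor at the domain boundary.
#include <mpi.h>
#include <vector>

struct Particle { double x, v; };

void exchange(std::vector<Particle>& local, double x_lo, double x_hi,
              int left, int right, MPI_Comm comm) {
    std::vector<Particle> to_l, to_r, keep;
    for (const Particle& p : local) {           // sort leavers by direction
        if (p.x < x_lo) to_l.push_back(p);
        else if (p.x >= x_hi) to_r.push_back(p);
        else keep.push_back(p);
    }
    int nl = (int)to_l.size(), nr = (int)to_r.size(), in_l = 0, in_r = 0;
    // First trade counts with each neighbor, then the particle payloads.
    MPI_Sendrecv(&nr, 1, MPI_INT, right, 0, &in_l, 1, MPI_INT, left, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&nl, 1, MPI_INT, left, 1, &in_r, 1, MPI_INT, right, 1,
                 comm, MPI_STATUS_IGNORE);
    std::vector<Particle> from_l(in_l), from_r(in_r);
    MPI_Sendrecv(to_r.data(), nr * (int)sizeof(Particle), MPI_BYTE, right, 2,
                 from_l.data(), in_l * (int)sizeof(Particle), MPI_BYTE, left, 2,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(to_l.data(), nl * (int)sizeof(Particle), MPI_BYTE, left, 3,
                 from_r.data(), in_r * (int)sizeof(Particle), MPI_BYTE, right, 3,
                 comm, MPI_STATUS_IGNORE);
    keep.insert(keep.end(), from_l.begin(), from_l.end());
    keep.insert(keep.end(), from_r.begin(), from_r.end());
    local.swap(keep);
}
```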
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Bei; Ethier, Stephane; Tang, William
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
High-performance computing with quantum processing units
Britt, Keith A.; Humble, Travis S.; ...
2017-03-01
The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.
High-performance computing with quantum processing units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Britt, Keith A.; Humble, Travis S.
The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.
OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing
NASA Astrophysics Data System (ADS)
Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping
2017-02-01
The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.
FY16 NRL DoD High Performance Computing Modernization Program Annual Reports
2017-09-15
explored both wind and wave forcing in the numerical wave tank. The model uses high spatial and temporal resolution and a multi-phase formulation to...Results: The ADVED_NS code was used to predict the effect of the standoff distance between micron-diameter wires and flow frequency on the total...contours for a flow over 3D wire mesh. Figure 2 shows verifications comparing computed and theoretical drag forces for the flow over two cylinders in an
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data mining and Big Data applications as modern commercial databases start to attach increasing importance to analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi coprocessor (MIC).
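As a rough illustration of where the parallel speedup comes from, the dominant cost in SVM training is the kernel matrix; the numpy sketch below (sizes and gamma arbitrary, not MICSVM code) shows the kind of dense, vectorizable computation that maps well onto multi-core and MIC hardware.

```python
# Vectorized RBF kernel matrix, the dominant cost in SVM training and
# the kind of computation a parallel SVM spreads across cores and
# vector lanes. Pure numpy sketch; sizes and gamma are arbitrary.
import numpy as np

def rbf_kernel_matrix(X, gamma=0.1):
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j, computed
    # with matrix operations so the work maps onto SIMD units/threads.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

X = np.random.rand(2000, 64)
K = rbf_kernel_matrix(X)
print(K.shape, K[0, 0])  # (2000, 2000); diagonal entries are 1.0
```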
High Performance Object-Oriented Scientific Programming in Fortran 90
NASA Technical Reports Server (NTRS)
Norton, Charles D.; Decyk, Viktor K.; Szymanski, Boleslaw K.
1997-01-01
We illustrate how Fortran 90 supports object-oriented concepts by example of plasma particle computations on the IBM SP. Our experience shows that Fortran 90 and object-oriented methodology give high performance while providing a bridge from Fortran 77 legacy codes to modern programming principles. All of our object-oriented Fortran 90 codes execute more quickly than the equivalent C++ versions, yet the abstraction modelling capabilities used for scientific programming are comparably powerful.
Computational Science and Innovation
NASA Astrophysics Data System (ADS)
Dean, D. J.
2011-09-01
Simulations - utilizing computers to solve complicated science and engineering problems - are a key ingredient of modern science. The U.S. Department of Energy (DOE) is a world leader in the development of high-performance computing (HPC), the development of applied math and algorithms that utilize the full potential of HPC platforms, and the application of computing to science and engineering problems. An interesting general question is whether the DOE can strategically utilize its capability in simulations to advance innovation more broadly. In this article, I will argue that this is certainly possible.
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work-stealing runtime, and report performance results for long-range HF energy calculations of large molecules with high-quality basis sets running on up to 1024 cores of a high-performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
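X10's work-stealing runtime balances uneven integral batches automatically; a minimal Python analogue (not X10, with simulated task costs) conveys the idea of dynamic load balancing over a shared pool of unevenly sized tasks.

```python
# Python analogue of dynamic load balancing of integral batches: a
# thread pool pulls unevenly sized tasks from a shared queue, much as
# X10's work-stealing runtime does natively. Task cost is simulated.
from concurrent.futures import ThreadPoolExecutor
import random, time

def integral_batch(n_shells):
    # Stand-in for a batch of two-electron integrals; cost varies
    # strongly with the angular momenta involved, hence the imbalance.
    time.sleep(0.001 * n_shells)
    return n_shells

batches = [random.randint(1, 50) for _ in range(200)]
with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(integral_batch, batches))
print("work units completed:", total)
```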
Department of Defense High Performance Computing Modernization Program. 2007 Annual Report
2008-03-01
Directorate, Kirtland AFB, NM Applications of Time-Accurate CFD in Order to Account for Blade-Row Interactions and Distortion Transfer in the Design of...Patterson AFB, OH Direct Numerical Simulations of Active Control for Low-Pressure Turbine Blades Herman Fasel, University of Arizona, Tucson, AZ (Air Force...interactions with the rotor wake. These HI-ARMS computations compare favorably with available wind tunnel test measurements of surface and flowfield
DNS of Flow in a Low-Pressure Turbine Cascade Using a Discontinuous-Galerkin Spectral-Element Method
NASA Technical Reports Server (NTRS)
Garai, Anirban; Diosady, Laslo Tibor; Murman, Scott; Madavan, Nateri
2015-01-01
A new computational capability under development for accurate and efficient high-fidelity direct numerical simulation (DNS) and large eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy and is implemented in a computationally efficient manner on a modern high performance computer architecture. A validation study using this method to perform DNS of flow in a low-pressure turbine airfoil cascade is presented. Preliminary results indicate that the method captures the main features of the flow. Discrepancies between the predicted results and the experiments are likely due to the effects of freestream turbulence not being included in the simulation and will be addressed in the final paper.
The path toward HEP High Performance Computing
NASA Astrophysics Data System (ADS)
Apostolakis, John; Brun, René; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro
2014-06-01
High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts to optimise HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves between 10% and 50% of peak performance. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a "High Performance" implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the ROOT and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart from the shareable data structures, this typically implies a multiplication factor in memory consumption compared to the single-threaded version, together with sub-optimal handling of event processing tails. Besides this, the low-level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine-grain parallel approach. The talk will review the current optimisation activities within the SFT group, with a particular emphasis on the development perspectives towards a simulation framework able to profit best from the recent technology evolution in computing.
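A minimal sketch of the basket-scheduling idea described above, under hypothetical task shapes: particles are grouped into baskets, and idle workers pull whole baskets from a shared queue rather than processing one event per thread.

```python
# Sketch of the "vectors of particles" scheduling idea: particles are
# grouped into baskets, and baskets are handed to whichever worker is
# free, instead of one event per thread. All names are hypothetical.
import queue, threading
import numpy as np

basket_queue = queue.Queue()

def transport_worker():
    while True:
        basket = basket_queue.get()
        if basket is None:            # sentinel: shut this worker down
            break
        np.exp(-basket).sum()         # vectorized stand-in for one transport step

workers = [threading.Thread(target=transport_worker) for _ in range(4)]
for w in workers:
    w.start()
for _ in range(100):                  # 100 baskets of 256 particle energies
    basket_queue.put(np.random.rand(256))
for _ in workers:
    basket_queue.put(None)
for w in workers:
    w.join()
print("all baskets transported")
```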
FY07 NRL DoD High Performance Computing Modernization Program Annual Reports
2008-09-05
performed. Implicit and explicit solution methods are used as appropriate. The primary finite element codes used are ABAQUS and ANSYS. User subroutines ...geometric complexities, loading path dependence, rate dependence, and interaction between loading types (electrical, thermal and mechanical). Work is not...are used for specialized material constitutive response. Coupled material responses, such as electrical-thermal for capacitor materials or electrical
Visualizing staggered fields and analyzing electromagnetic data with PerceptEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shasharina, Svetlana
This project resulted in VSimSP: software for simulating large photonic devices on high-performance computers. It includes: a GUI for photonics simulations; a high-performance meshing algorithm; a 2nd-order multi-materials algorithm; a mode solver for waveguides; a 2nd-order material dispersion algorithm; S-parameter calculation; a high-performance workflow at NERSC; and simulation setups for large photonic devices. We believe we became the only company in the world that can simulate large photonic devices in 3D on modern supercomputers without the need to split them into subparts or resort to low-fidelity modeling. We started commercial engagement with a manufacturing company.
Optimizing Engineering Tools Using Modern Ground Architectures
2017-12-01
Considerations," International Journal of Computer Science & Engineering Survey, vol. 5, no. 4, 2014. [10] R. Bell. (n.d.). A beginner's guide to big O notation...scientific community. Traditional computing architectures were not capable of processing the data efficiently, or in some cases, could not process the...thesis investigates how these modern computing architectures could be leveraged by industry and academia to improve the performance and capabilities of
The computational challenges of Earth-system science.
O'Neill, Alan; Steenman-Clark, Lois
2002-06-15
The Earth system--comprising atmosphere, ocean, land, cryosphere and biosphere--is an immensely complex system, involving processes and interactions on a wide range of space- and time-scales. To understand and predict the evolution of the Earth system is one of the greatest challenges of modern science, with success likely to bring enormous societal benefits. High-performance computing, along with the wealth of new observational data, is revolutionizing our ability to simulate the Earth system with computer models that link the different components of the system together. There are, however, considerable scientific and technical challenges to be overcome. This paper will consider four of them: complexity, spatial resolution, inherent uncertainty and time-scales. Meeting these challenges requires a significant increase in the power of high-performance computers. The benefits of being able to make reliable predictions about the evolution of the Earth system should, on their own, amply repay this investment.
A survey of GPU-based acceleration techniques in MRI reconstructions
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou
2018-01-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly complicated. However, diagnosis and treatment require very fast computational procedures. Modern graphics processing unit (GPU) platforms have made high-performance parallel computation available, and attractive to common consumers, for computing massively parallel reconstruction problems at commodity prices. GPUs have also become more and more important for reconstruction computations, especially as deep learning begins to be applied to MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in the MRI community. PMID:29675361
A survey of GPU-based acceleration techniques in MRI reconstructions.
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong
2018-03-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly complicated. However, diagnosis and treatment require very fast computational procedures. Modern graphics processing unit (GPU) platforms have made high-performance parallel computation available, and attractive to common consumers, for computing massively parallel reconstruction problems at commodity prices. GPUs have also become more and more important for reconstruction computations, especially as deep learning begins to be applied to MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in the MRI community.
NREL Evaluates Aquarius Liquid-Cooled High-Performance Computing Technology
HPC and influence the modern data center designer towards adoption of liquid cooling. Aquila and Sandia chose NREL's HPC Data Center for the initial installation and evaluation because the data center is configured for liquid cooling, along with the required instrumentation to
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at Ben-Gurion University. This elective course for 3rd-year undergraduate and MSc students is taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach to teaching Computational Physics emphasizes "Correctness" and then "Accuracy," and we add "Performance." Along with topics in mathematical methods and case studies in physics, the course devotes a significant amount of time to "mini-courses" on topics such as: High-Throughput Computing with Condor, parallel programming with MPI and OpenMP, how to build a Beowulf cluster, visualization, and Grid and Cloud Computing. The course intends to teach neither new physics nor new mathematics; it is focused on an integrated approach to solving problems, starting from the physics problem, through the corresponding mathematical solution and the numerical scheme, to writing an efficient computer code and finally analysis and visualization.
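A taste of what such an MPI mini-course might open with: a midpoint-rule estimate of pi distributed over ranks with mpi4py. The exercise itself is our illustration, not taken from the course materials.

```python
# Classic first MPI exercise: each rank integrates part of [0, 1] for
# pi and the pieces are summed on rank 0. Requires mpi4py; run with
# `mpiexec -n 4 python pi.py` (the filename is arbitrary).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 1_000_000                       # total midpoint-rule intervals
h = 1.0 / n
local = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
            for i in range(rank, n, size)) * h
pi = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~=", pi)
```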
Remote control system for high-performance computer simulation of crystal growth by the PFC method
NASA Astrophysics Data System (ADS)
Pavlyuk, Evgeny; Starodumov, Ilya; Osipov, Sergei
2017-04-01
Modeling of the crystallization process by the phase-field crystal (PFC) method is one of the important directions of modern computational materials science. In this paper, the practical side of computer simulation of the crystallization process by the PFC method is investigated. To solve problems using this method, it is necessary to use high-performance computing clusters, data storage systems and other, often expensive, complex computer systems. Access to such resources is often limited, unstable and accompanied by various administrative problems. In addition, the variety of software and settings of different computing clusters sometimes does not allow researchers to use unified program code: the code has to be adapted for each configuration of the computing complex. The practical experience of the authors has shown that the creation of a special control system for computing, with the possibility of remote use, can greatly simplify the implementation of simulations and increase the performance of scientific research. In the current paper we show the principal idea of such a system and justify its efficiency.
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization,3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts 3. Development of a code generator for performance prediction 4. Automated partitioning 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.
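Step 5, automated insertion of directives, can be caricatured in a few lines. The sketch below is not the IPE's implementation: it assumes every matched loop is safely parallel, which a real parallelization environment must establish through dependence analysis.

```python
# Toy version of "automated insertion of directives": scan Fortran
# source and put an OpenMP directive above simple DO loops. A real
# tool must first prove each loop has no cross-iteration dependences;
# this regex sketch simply assumes it.
import re

SRC = """      do i = 1, n
        a(i) = b(i) + c(i)
      end do
"""

def insert_directives(source):
    out = []
    for line in source.splitlines():
        # Match lines that open a DO loop, e.g. "do i = 1, n".
        if re.match(r"\s*do\s+\w+\s*=", line, re.IGNORECASE):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + "!$omp parallel do")
        out.append(line)
    return "\n".join(out)

print(insert_directives(SRC))
```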
NASA Astrophysics Data System (ADS)
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; Ng, Esmond G.; Maris, Pieter; Vary, James P.
2018-01-01
We describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. We also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
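SciPy's LOBPCG solver is a convenient stand-in for the preconditioned block iterative approach described above (the paper's actual solver and preconditioner are more elaborate); the sketch below uses a toy sparse matrix, a block of random starting vectors, and a simple Jacobi preconditioner.

```python
# The preconditioned block iterative idea in miniature: a block of
# starting vectors plus a (here trivial diagonal) preconditioner
# replaces a Lanczos run. The matrix is a synthetic stand-in, not a
# nuclear configuration interaction Hamiltonian.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg

n = 2000
A = sp.diags([np.arange(1, n + 1, dtype=float)], [0]).tocsr()
A = A + sp.random(n, n, density=1e-3, random_state=0)
A = (A + A.T) * 0.5                    # symmetrize the test matrix

X = np.random.rand(n, 8)               # block of 8 starting guesses
M = sp.diags(1.0 / A.diagonal())       # diagonal (Jacobi) preconditioner
vals, vecs = lobpcg(A, X, M=M, tol=1e-6, maxiter=200, largest=False)
print("lowest eigenvalues:", np.sort(vals))
```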
High performance in silico virtual drug screening on many-core processors.
McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A
2015-05-01
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
High performance in silico virtual drug screening on many-core processors
Price, James; Sessions, Richard B; Ibarra, Amaurys A
2015-01-01
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727
Geospace simulations using modern accelerator processor technology
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.
2009-12-01
OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing-intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational power of modern accelerator-based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFlop/s on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: on the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory-limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
Some Observations on the Current Status of Performing Finite Element Analyses
NASA Technical Reports Server (NTRS)
Raju, Ivatury S.; Knight, Norman F., Jr; Shivakumar, Kunigal N.
2015-01-01
Aerospace structures are complex high-performance structures. Advances in reliable and efficient computing and modeling tools are enabling analysts to consider complex configurations, build complex finite element models, and perform analysis rapidly. Many of the early-career engineers of today are very proficient in the usage of modern computers, computing engines, complex software systems, and visualization tools. These young engineers are becoming increasingly efficient in building complex 3D models of complicated aerospace components. However, current trends demonstrate blind acceptance of finite element analysis results. This paper is aimed at raising awareness of this situation. Examples of the common encounters are presented. To overcome the current trends, some guidelines and suggestions for analysts, senior engineers, and educators are offered.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Integrating Computer Architectures into the Design of High-Performance Controllers
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.; Leyland, Jane A.; Warmbrodt, William
1986-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, on-line graphics, and file management. This paper discusses five global design considerations that are useful to integrate array processor, multimicroprocessor, and host computer system architecture into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the non-real-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration will be briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind-tunnel environment, the control architecture can generally be applied to a wide range of automatic control applications.
Software Tools for Development on the Peregrine System | High-Performance Computing | NREL
Tools to develop and manage software at the source code level. Cross-Platform Make and SCons: The "Cross-Platform Make" (CMake) package is from Kitware, and SCons is a modern software build tool based on Python.
Impedance computations and beam-based measurements: A problem of discrepancy
NASA Astrophysics Data System (ADS)
Smaluk, Victor
2018-04-01
High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. Three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.
Modern Computational Techniques for the HMMER Sequence Analysis
2013-01-01
This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944
NASA Astrophysics Data System (ADS)
Bessonov, O.; Silvestrov, P.
2017-02-01
This paper describes the general idea and the first implementation of the Interactive information and simulation system - an integrated environment that combines computational modules for modeling the aerodynamics and aerothermodynamics of re-entry space vehicles with a large collection of different information materials on this topic. The internal organization and the composition of the system are described and illustrated. Examples of the computational and information output are presented. The system has a unified implementation for Windows and Linux operating systems and can be deployed on any modern high-performance personal computer.
Real-time optical flow estimation on a GPU for a skid-steered mobile robot
NASA Astrophysics Data System (ADS)
Kniaz, V. V.
2016-04-01
Accurate egomotion estimation is required for mobile robot navigation. Often the egomotion is estimated using optical flow algorithms. For an accurate estimation of optical flow, most modern algorithms require high memory resources and processor speed. However, the simple single-board computers that control the motion of the robot usually do not provide such resources. On the other hand, most modern single-board computers are equipped with an embedded GPU that could be used in parallel with the CPU to improve the performance of the optical flow estimation algorithm. This paper presents a new Z-flow algorithm for efficient computation of optical flow using an embedded GPU. The algorithm is based on phase correlation optical flow estimation and provides real-time performance on a low-cost embedded GPU. A layered optical flow model is used. Layer segmentation is performed using a graph-cut algorithm with a time-derivative-based energy function. Such an approach makes the algorithm both fast and robust in low-light and low-texture conditions. The algorithm implementation for a Raspberry Pi Model B computer is discussed. For evaluation, the computer was mounted on a Hercules skid-steered mobile robot equipped with a monocular camera. The evaluation was performed using hardware-in-the-loop simulation and experiments with the Hercules robot. The algorithm was also evaluated using the KITTI Optical Flow 2015 dataset. The resulting endpoint error of the optical flow calculated with the developed algorithm was low enough for navigation of the robot along the desired trajectory.
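The phase-correlation core of such an estimator is compact; here is a numpy-only sketch (no GPU, no layer segmentation) that recovers a known integer shift between two frames.

```python
# Core of phase-correlation shift estimation, the building block the
# Z-flow algorithm applies per layer; numpy-only CPU sketch.
import numpy as np

def phase_correlation_shift(a, b):
    # Cross-power spectrum, normalized to keep phase only; its inverse
    # FFT peaks at the translation between the two frames.
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    F /= np.abs(F) + 1e-12
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks in the upper half of the spectrum to negative shifts.
    if dy > a.shape[0] // 2: dy -= a.shape[0]
    if dx > a.shape[1] // 2: dx -= a.shape[1]
    return dy, dx

img = np.random.rand(128, 128)
shifted = np.roll(np.roll(img, 3, axis=0), -5, axis=1)
print(phase_correlation_shift(shifted, img))   # ~(3, -5)
```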
GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing
Fang, Ye; Ding, Yun; Feinstein, Wei P.; Koppelman, David M.; Moreno, Juana; Jarrell, Mark; Ramanujam, J.; Brylinski, Michal
2016-01-01
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249. PMID:27420300
GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing.
Fang, Ye; Ding, Yun; Feinstein, Wei P; Koppelman, David M; Moreno, Juana; Jarrell, Mark; Ramanujam, J; Brylinski, Michal
2016-01-01
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249.
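The Monte Carlo core that GeauxDock builds on can be sketched briefly; the energy function below is a toy stand-in, not GeauxDock's physics- and knowledge-based scoring.

```python
# Metropolis Monte Carlo over a pose, the algorithmic skeleton the
# GeauxDock abstract describes. The "energy" is a hypothetical smooth
# landscape over (x, y, z, 3 angles), not a docking score.
import numpy as np

rng = np.random.default_rng(0)

def energy(pose):
    return np.sum((pose - 1.0) ** 2) + np.sin(5 * pose).sum()

pose = rng.uniform(-2, 2, size=6)
e, kT = energy(pose), 0.5
for step in range(5000):
    trial = pose + rng.normal(scale=0.05, size=6)   # random pose move
    et = energy(trial)
    # Metropolis criterion: always accept downhill, sometimes uphill.
    if et < e or rng.random() < np.exp(-(et - e) / kT):
        pose, e = trial, et
print("final energy:", e)
```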
Dense and Sparse Matrix Operations on the Cell Processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel W.; Shalf, John; Oliker, Leonid
2005-05-01
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
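An analytic performance framework of this kind reduces, at its simplest, to a roofline bound: predicted time is the larger of compute time and memory-traffic time. The numbers in this sketch are illustrative, not Cell specifications.

```python
# Back-of-envelope roofline estimate of the sort an analytic
# performance model builds on. peak_flops and peak_bw are placeholder
# values, not measured hardware parameters.
def roofline_time(flops, bytes_moved, peak_flops=200e9, peak_bw=25e9):
    # Execution is bounded by whichever resource saturates first.
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Dense matrix multiply, n = 1024, double precision operands:
n = 1024
t = roofline_time(2 * n**3, 3 * 8 * n**2)
print(f"predicted time: {t*1e3:.2f} ms,"
      f" {2*n**3/t/1e9:.1f} GFLOP/s sustained")
```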
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
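The wide vectorizable inner loop the abstract refers to is essentially an exponential-attenuation update applied to many track segments at once; a numpy sketch with synthetic values follows (the real proxy applications are C/C++ codes).

```python
# Vectorized MOC segment sweep: attenuate the angular flux of a batch
# of track segments through one flat-source region at once, the kind
# of wide inner loop that maps onto SIMD hardware. Values synthetic.
import numpy as np

def moc_segment_sweep(psi, sigma_t, q, ds):
    # psi: incoming angular flux per segment; sigma_t, q: region total
    # cross section and flat source; ds: segment lengths.
    att = np.exp(-sigma_t * ds)
    return psi * att + (q / sigma_t) * (1.0 - att)

psi = np.random.rand(4096)          # 4096 track segments in flight
ds = np.random.rand(4096) * 0.5
print(moc_segment_sweep(psi, sigma_t=0.8, q=1.0, ds=ds)[:4])
```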
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than a teraflop per second. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and higher memory bandwidth than central processing units (CPUs). Data-parallel computations can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capability has attracted the attention of researchers dealing with complex problems that need high-volume calculations. This interest gave rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet cheap and affordable hardware, and so have become an alternative to conventional processors. Graphics chips, once fixed-function application hardware, have been transformed into modern, powerful and programmable processors. The biggest problem is that graphics processing units use programming models unlike current programming methods; efficient GPU programming therefore requires re-coding the current program algorithm with the limitations and structure of the graphics hardware in mind, since multi-core graphics processors cannot be programmed with traditional event-driven methods. GPUs are especially effective at repeating computing steps over many data elements when high accuracy is needed, making the computation both faster and more accurate; CPUs, which perform one computation at a time according to the flow control, are slower in comparison. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language, and its results were compared with a traditional CPU implementation. In a second application, projective rectification was coded using the GPGPU method and CUDA; sample images of various sizes were processed and the results evaluated. The GPGPU method is especially useful for repeating the same computations on highly dense data, finding the solution quickly.
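The per-pixel structure that makes projective rectification GPU-friendly is easy to see in a CPU sketch: each output pixel is independently mapped through a homography and sampled from the source image, so on a GPU each pixel becomes one thread. The homography below is arbitrary, and the sketch is numpy, not CUDA.

```python
# Projective rectification in its per-pixel form: back-project every
# output pixel through a homography H and sample the source image.
# Independent per-pixel work is what a CUDA kernel would parallelize.
import numpy as np

def rectify(img, H, out_shape):
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts           # back-project output pixels
    sx = (src[0] / src[2]).round().astype(int)
    sy = (src[1] / src[2]).round().astype(int)
    ok = (0 <= sx) & (sx < img.shape[1]) & (0 <= sy) & (sy < img.shape[0])
    out = np.zeros(h * w, dtype=img.dtype)
    out[ok] = img[sy[ok], sx[ok]]          # nearest-neighbour sampling
    return out.reshape(h, w)

img = np.random.rand(256, 256)
H = np.array([[1.0, 0.1, 5.0], [0.0, 1.0, 3.0], [1e-4, 0.0, 1.0]])
print(rectify(img, H, (256, 256)).shape)
```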
Improvements in the efficiency of turboexpanders in cryogenic applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agahi, R.R.; Lin, M.C.; Ershaghi, B.
1996-12-31
Process designers have utilized turboexpanders in cryogenic processes because of their higher thermal efficiencies when compared with conventional refrigeration cycles. Process design and equipment performance have improved substantially through the utilization of modern technologies. Turboexpander manufacturers have also adopted computational fluid dynamics software, computer numerical control technology and holography techniques to further improve an already impressive turboexpander efficiency performance. In this paper, the authors explain the design process of the turboexpander utilizing modern technology. Two cases of turboexpanders processing helium (4.35 K) and hydrogen (56 K) will be presented.
Aerodynamics of High-Lift Configuration Civil Aircraft Model in JAXA
NASA Astrophysics Data System (ADS)
Yokokawa, Yuzuru; Murayama, Mitsuhiro; Ito, Takeshi; Yamamoto, Kazuomi
This paper presents basic aerodynamics and stall characteristics of the high-lift configuration aircraft model JSM (JAXA Standard Model). During the research process of developing a high-lift system design method, wind tunnel testing at the JAXA 6.5 m by 5.5 m low-speed wind tunnel and Navier-Stokes computation on unstructured hybrid mesh were performed for a realistic configuration aircraft model equipped with high-lift devices, fuselage, nacelle-pylon, slat tracks and Flap Track Fairings (FTF), assumed to be a 100-passenger-class modern commercial transport aircraft. The testing and the computation aimed to understand the flow physics and thereby obtain guidelines for designing a high-performance high-lift system. The testing revealed Reynolds number effects within both the linear region and the stall region. Analysis of static pressure distribution and flow visualization gave the knowledge to understand the aerodynamic performance. CFD could capture the whole character of the basic aerodynamics and clarify the flow mechanism that governs stall characteristics, even for complicated geometry and its flow field. This collaborative work between wind tunnel testing and CFD has proven advantageous for improving the aerodynamic performance.
Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan
2007-01-01
Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
High-performance noncontact thermal diode via asymmetric nanostructures
NASA Astrophysics Data System (ADS)
Shen, Jiadong; Liu, Xianglei; He, Huan; Wu, Weitao; Liu, Baoan
2018-05-01
Electric diodes, though laying the foundation of modern electronics and information processing industries, suffer from ineffectiveness and even failure at high temperatures. Thermal diodes are promising alternatives to relieve above limitations, but usually possess low rectification ratios, and how to obtain a high-performance thermal rectification effect is still an open question. This paper proposes an efficient contactless thermal diode based on the near-field thermal radiation of asymmetric doped silicon nanostructures. The rectification ratio computed via exact scattering theories is demonstrated to be as high as 10 at a nanoscale gap distance and period, outperforming the counterpart flat-plate diode by more than one order of magnitude. This extraordinary performance mainly lies in the higher forward and lower reverse radiative heat flux within the low frequency band compared with the counterpart flat-plate diode, which is caused by a lower loss and smaller cut-off wavevector of nanostructures for the forward and reversed scheme, respectively. This work opens new routes to realize high performance thermal diodes, and may have wide applications in efficient thermal computing, thermal information processing, and thermal management.
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM model is to be applied using fine grids, its discretization leads to a huge computational problem, which implies that a model such as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, comparison results are presented from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, are discussed. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.
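The domain-partitioning idea reduces to subdomains plus ghost-cell (halo) exchange; here is a one-dimensional mpi4py sketch, far simpler than DEM's actual decomposition.

```python
# One-dimensional domain partitioning with halo exchange, the pattern
# MPI transport codes are built on (simplified to a 1D strip
# decomposition with one ghost cell per side). Requires mpi4py.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.full(10 + 2, float(rank))   # 10 interior cells + 2 ghosts
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange boundary cells with both neighbours in two paired steps;
# PROC_NULL neighbours turn the calls into no-ops at the edges.
comm.Sendrecv(sendbuf=local[1:2], dest=left, recvbuf=local[-1:], source=right)
comm.Sendrecv(sendbuf=local[-2:-1], dest=right, recvbuf=local[0:1], source=left)
print(rank, local[0], local[-1])
```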
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures
Manolakos, Elias S.
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.
Sharma, Anuj; Manolakos, Elias S
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
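The shape of an MCPSC run is a parallel map over domain pairs followed by score combination; in this sketch the three scorers are trivial stand-ins for TMalign, CE and USM, and the consensus is a plain average.

```python
# Parallel multi-criteria protein structure comparison in outline:
# score every domain pair under several criteria and combine. The
# scorers are random-number placeholders, not real structure metrics.
from multiprocessing import Pool
from itertools import combinations
import random

def tm_like(a, b):  return random.random()      # placeholder scorers
def ce_like(a, b):  return random.random()
def usm_like(a, b): return random.random()

def mcpsc(pair):
    a, b = pair
    scores = (tm_like(a, b), ce_like(a, b), usm_like(a, b))
    return a, b, sum(scores) / len(scores)       # simple consensus

if __name__ == "__main__":
    domains = [f"d{i}" for i in range(50)]
    with Pool() as pool:
        results = pool.map(mcpsc, combinations(domains, 2))
    print(len(results), "pairwise consensus scores")
```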
Shao, Meiyue; Aktulga, H. Metin; Yang, Chao; ...
2017-09-14
In this paper, we describe a number of recently developed techniques for improving the performance of large-scale nuclear configuration interaction calculations on high performance parallel computers. We show the benefit of using a preconditioned block iterative method to replace the Lanczos algorithm that has traditionally been used to perform this type of computation. The rapid convergence of the block iterative method is achieved by a proper choice of starting guesses of the eigenvectors and the construction of an effective preconditioner. These acceleration techniques take advantage of special structure of the nuclear configuration interaction problem which we discuss in detail. The use of a block method also allows us to improve the concurrency of the computation, and take advantage of the memory hierarchy of modern microprocessors to increase the arithmetic intensity of the computation relative to data movement. Finally, we also discuss the implementation details that are critical to achieving high performance on massively parallel multi-core supercomputers, and demonstrate that the new block iterative solver is two to three times faster than the Lanczos based algorithm for problems of moderate sizes on a Cray XC30 system.
Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis
NASA Technical Reports Server (NTRS)
Duffy, Daniel Q.; Schnase, John L.; Thompson, John H.; Freeman, Shawn M.; Clune, Thomas L.
2012-01-01
MapReduce is an approach to high-performance analytics that may be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective-Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data intensive analytic workflows.
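The canonical averaging operation has a natural map/reduce form: map emits (grid-cell, (sum, count)) pairs from each data block, and reduce merges them into means. A pure-Python sketch of the pattern (not MERRA-specific code) follows.

```python
# Map/reduce averaging in miniature: map emits partial (sum, count)
# pairs per grid cell from each data block; reduce merges them into
# means. Cell IDs and values below are made up for illustration.
from collections import defaultdict

def map_block(block):
    # block: list of (cell_id, value) records from one file chunk
    for cell, value in block:
        yield cell, (value, 1)

def reduce_cells(mapped):
    acc = defaultdict(lambda: [0.0, 0])
    for cell, (s, n) in mapped:
        acc[cell][0] += s
        acc[cell][1] += n
    return {cell: s / n for cell, (s, n) in acc.items()}

blocks = [[("A", 280.0), ("B", 290.0)], [("A", 282.0)]]
mapped = (kv for block in blocks for kv in map_block(block))
print(reduce_cells(mapped))   # {'A': 281.0, 'B': 290.0}
```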
NASA Astrophysics Data System (ADS)
Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan
2017-08-01
The paper deals with the insufficient productivity of existing computing means for large-image processing, which do not meet the modern requirements posed by resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. The development of a theory of parallel-hierarchical transformation allowed us to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on a GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images shows that they allow real-time processing of dynamic images of various sizes.
Merging the Machines of Modern Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Laura; Collins, Jim
Two recent projects have harnessed supercomputing resources at the US Department of Energy’s Argonne National Laboratory in a novel way to support major fusion science and particle collider experiments. Using leadership computing resources, one team ran fine-grid analysis of real-time data to make near-real-time adjustments to an ongoing experiment, while a second team is working to integrate Argonne’s supercomputers into the Large Hadron Collider/ATLAS workflow. Together these efforts represent a new paradigm of the high-performance computing center as a partner in experimental science.
ERIC Educational Resources Information Center
de Mestre, Neville
2004-01-01
Computers were invented to help mathematicians perform long and complicated calculations more efficiently. By the time that a computing area became a familiar space in primary and secondary schools, the initial motivation for computer use had been submerged in the many other functions that modern computers now accomplish. Not only the mathematics…
NASA Astrophysics Data System (ADS)
O'Malley, D.; Vesselinov, V. V.
2017-12-01
Classical microprocessors have had a dramatic impact on hydrology for decades, due largely to the exponential growth in computing power predicted by Moore's law. However, this growth is not expected to continue indefinitely and has already begun to slow. Quantum computing is an emerging alternative to classical microprocessors. Here, we demonstrate cutting-edge inverse model analyses utilizing some of the best available resources in both worlds: high-performance classical computing and a D-Wave quantum annealer. The classical high-performance computing resources are utilized to build an advanced numerical model that assimilates data from O(10^5) observations, including water levels, drawdowns, and contaminant concentrations. The developed model accurately reproduces the hydrologic conditions at a Los Alamos National Laboratory contamination site and can be leveraged to inform decision-making about site remediation. We demonstrate the use of a D-Wave 2X quantum annealer to solve hydrologic inverse problems. This work can be seen as an early step in quantum-computational hydrology. We compare and contrast our results with an early inverse approach in classical-computational hydrology that is comparable to the approach we use with quantum annealing. Our results show that quantum annealing can be useful for identifying regions of high and low permeability within an aquifer. While the problems we consider are small compared to those that can be solved with modern classical computers, they are large compared to those that could be solved with early classical CPUs. Further, the binary nature of the high/low permeability problem makes it well suited to quantum annealing, but challenging for classical computers.
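The binary high/low permeability structure is what makes the problem annealer-friendly: it can be written as a QUBO, the native input form of D-Wave machines. A tiny brute-force illustration with an arbitrary coupling matrix (not the paper's model) follows.

```python
# The binary permeability problem cast as a QUBO, the form a quantum
# annealer accepts, solved here by brute force for a 3-zone toy case.
# Q is illustrative: diagonal terms are single-zone costs, off-
# diagonal terms couple neighbouring zones.
import itertools
import numpy as np

Q = np.array([[-1.0, 0.5, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 0.5, -1.0]])

best = min(itertools.product([0, 1], repeat=3),
           key=lambda x: np.array(x) @ Q @ np.array(x))
print("lowest-energy permeability pattern:", best)
```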
Analysis of lead twist in modern high-performance grinding methods
NASA Astrophysics Data System (ADS)
Kundrák, J.; Gyáni, K.; Felhő, C.; Markopoulos, AP; Deszpoth, I.
2016-11-01
According to the quality requirements for road-vehicle shafts that bear dynamic seals, a twisted-pattern micro-geometrical topography is not allowed. It is an open question whether newer grinding methods, such as quick-point grinding and peel grinding, can provide twist-free topography. According to industrial experience, twist-free surfaces can be made; however, with certain settings, some twist occurs. In this paper it is proved by detailed chip-geometrical analysis that the topography generated by the new procedures is theoretically twist-patterned because of the feeding motion of the CBN tool. The presented investigation was carried out with a single-grain wheel model and computer simulation.
Design Trade-off Between Performance and Fault-Tolerance of Space Onboard Computers
NASA Astrophysics Data System (ADS)
Gorbunov, M. S.; Antonov, A. A.
2017-01-01
It is well known that there is a trade-off between performance and power consumption in onboard computers. The fault-tolerance is another important factor affecting performance, chip area and power consumption. Involving special SRAM cells and error-correcting codes is often too expensive with relation to the performance needed. We discuss the possibility of finding the optimal solutions for modern onboard computer for scientific apparatus focusing on multi-level cache memory design.
Modern Design of Resonant Edge-Slot Array Antennas
NASA Technical Reports Server (NTRS)
Gosselin, R. B.
2006-01-01
Resonant edge-slot (slotted-waveguide) array antennas can now be designed very accurately following a modern computational approach like that followed for some other microwave components. This modern approach makes it possible to design superior antennas at lower cost than was previously possible. Heretofore, the physical and engineering knowledge of resonant edge-slot array antennas had remained immature since they were introduced during World War II. This is because despite their mechanical simplicity, high reliability, and potential for operation with high efficiency, the electromagnetic behavior of resonant edge-slot antennas is very complex. Because engineering design formulas and curves for such antennas are not available in the open literature, designers have been forced to implement iterative processes of fabricating and testing multiple prototypes to derive design databases, each unique for a specific combination of operating frequency and set of waveguide tube dimensions. The expensive, time-consuming nature of these processes has inhibited the use of resonant edge-slot antennas. The present modern approach reduces costs by making it unnecessary to build and test multiple prototypes. As an additional benefit, this approach affords a capability to design an array of slots having different dimensions to taper the antenna illumination to reduce the amplitudes of unwanted side lobes. The heart of the modern approach is the use of the latest commercially available microwave-design software, which implements finite-element models of electromagnetic fields in and around waveguides, antenna elements, and similar components. Instead of building and testing prototypes, one builds a database and constructs design curves from the results of computational simulations for sets of design parameters. The figure shows a resonant edge-slot antenna designed following this approach. Intended for use as part of a radiometer operating at a frequency of 10.7 GHz, this antenna was fabricated from dimensions defined exclusively by results of computational simulations. The final design was found to be well optimized and to yield performance exceeding that initially required.
Department of Defense High Performance Computing Modernization Program. 2006 Annual Report
2007-03-01
Department. We successfully completed several software development projects that introduced parallel, scalable production software now in use across the ... imagined. They are developing and deploying weather and ocean models that allow our soldiers, sailors, marines and airmen to plan missions more effectively ... and to navigate adverse environments safely. They are modeling molecular interactions leading to the development of higher energy fuels, munitions
Thin client performance for remote 3-D image display.
Lai, Albert; Nieh, Jason; Laine, Andrew; Starren, Justin
2003-01-01
Several trends in biomedical computing are converging in a way that will require new approaches to telehealth image display. Image viewing is becoming an "anytime, anywhere" activity. In addition, organizations are beginning to recognize that healthcare providers are highly mobile and that optimal care requires providing information wherever the provider and patient are. Thin-client computing is one way to support image viewing in this complex environment. However, little is known about the behavior of thin-client systems supporting image transfer in modern heterogeneous networks. Our results show that thin clients can deliver acceptable performance under conditions commonly seen in wireless networks if newer protocols optimized for these conditions are used.
Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors
NASA Astrophysics Data System (ADS)
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high-performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potential for covalently bonded solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multiple levels of parallelism, combining tightly coupled thread-level and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementation on the Xeon Phi is clearly superior to that on the x86 CPU architecture.
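As a rough illustration of the data-level parallelism the abstract describes, the sketch below contrasts an element-at-a-time loop with a whole-array (SIMD-like) formulation. NumPy stands in here for the 512-bit vector registers of the Xeon Phi, and the kernel itself is a toy distance computation, not the Tersoff force evaluation.

```python
# Hedged illustration of data-level parallelism: the same pairwise-distance
# kernel written as a scalar loop and as a vectorized operation. On a
# Xeon Phi the vector form would map onto 512-bit registers; NumPy is used
# only to make the contrast runnable.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10000)   # toy 1-D particle coordinates
x0 = 0.5                # reference particle

def scalar_distances(x, x0):
    out = np.empty_like(x)
    for i in range(len(x)):          # one element per iteration
        out[i] = abs(x[i] - x0)
    return out

def vector_distances(x, x0):
    return np.abs(x - x0)            # whole array per "instruction"

assert np.allclose(scalar_distances(x, x0), vector_distances(x, x0))
```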
Hofmann, Hannes G; Keck, Benjamin; Rohkohl, Christopher; Hornegger, Joachim
2011-01-01
Interventional reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. Hardware optimization is not an option but mandatory for interventional image processing and, in particular, for image reconstruction due to the high demands on performance. Several groups have published fast analytical 3-D reconstruction on highly parallel hardware such as GPUs to mitigate this issue. The authors show that the performance of modern CPU-based systems is in the same order as current GPUs for static 3-D reconstruction and outperforms them for a recent motion compensated (3-D+time) image reconstruction algorithm. This work investigates two algorithms: Static 3-D reconstruction as well as a recent motion compensated algorithm. The evaluation was performed using a standardized reconstruction benchmark, RABBITCT, to get comparable results and two additional clinical data sets. The authors demonstrate for a parametric B-spline motion estimation scheme that the derivative computation, which requires many write operations to memory, performs poorly on the GPU and can highly benefit from modern CPU architectures with large caches. Moreover, on a 32-core Intel Xeon server system, the authors achieve linear scaling with the number of cores used and reconstruction times almost in the same range as current GPUs. Algorithmic innovations in the field of motion compensated image reconstruction may lead to a shift back to CPUs in the future. For analytical 3-D reconstruction, the authors show that the gap between GPUs and CPUs became smaller. It can be performed in less than 20 s (on-the-fly) using a 32-core server.
Borresen, Jon; Lynch, Stephen
2012-01-01
In the 1940s, the first generation of modern computers used vacuum tube oscillators as their principal components; however, with the development of the transistor, such oscillator-based computers quickly became obsolete. As the demand for faster and lower-power computers continues, transistors are themselves approaching their theoretical limit, and emerging technologies must eventually supersede them. With the development of optical oscillators and Josephson junction technology, we are again presented with the possibility of using oscillators as the basic components of computers, and it is possible that the next generation of computers will be composed almost entirely of oscillatory devices. Here, we demonstrate how coupled threshold oscillators may be used to perform binary logic in a manner entirely consistent with modern computer architectures. We describe a variety of computational circuitry and demonstrate working oscillator models of both computation and memory.
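A minimal sketch of the threshold-logic idea, assuming nothing about the authors' specific oscillator circuits: a threshold element fires when the summed amplitude of its input oscillators exceeds a set level, so a single element can realize AND or OR depending on where the threshold is placed.

```python
# Toy sketch (not the authors' circuitry): a threshold element driven by
# two input oscillators. The element "fires" when the summed input
# amplitude crosses its threshold, so threshold = 1.5 realizes AND and
# threshold = 0.5 realizes OR on binary inputs.
def threshold_gate(a, b, theta):
    return int(a + b > theta)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", threshold_gate(a, b, 1.5),
                    "OR:",  threshold_gate(a, b, 0.5))
```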
Impedance computations and beam-based measurements: A problem of discrepancy
Smaluk, Victor
2018-04-21
High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. Three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, the effect of computation mesh size, and the effect of insufficient bandwidth of the computed impedance.
Load management strategy for Particle-In-Cell simulations in high energy particle acceleration
NASA Astrophysics Data System (ADS)
Beck, A.; Frederiksen, J. T.; Dérouillat, J.
2016-09-01
In the wake of the intense effort made for the experimental CILEX project, numerical simulation campaigns have been carried out in order to finalize the design of the facility and to identify optimal laser and plasma parameters. These simulations bring, of course, important insight into the fundamental physics at play. As a by-product, they also characterize the quality of our theoretical and numerical models. In this paper, we compare the results given by different codes and point out algorithmic limitations both in terms of physical accuracy and computational performance. These limitations are illustrated in the context of electron laser wakefield acceleration (LWFA). The main limitation we identify in state-of-the-art Particle-In-Cell (PIC) codes is computational load imbalance. We propose an innovative algorithm to deal with this specific issue as well as milestones towards a modern, accurate high-performance PIC code for high energy particle acceleration.
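To illustrate the load-imbalance problem, here is a hedged sketch, not the paper's algorithm: in a wakefield simulation particles bunch up in a few cells, so splitting the domain into equal numbers of cells gives some ranks far more work than others. The toy below instead cuts a 1-D row of cells into contiguous chunks with roughly equal particle counts; the `balance` helper and the particle distribution are invented for illustration.

```python
# Hedged sketch of the load-balancing idea (not the paper's algorithm):
# split a 1-D row of cells into contiguous chunks so that every MPI rank
# receives roughly the same number of particles, rather than the same
# number of cells.
import numpy as np

def balance(particles_per_cell, n_ranks):
    target = particles_per_cell.sum() / n_ranks
    cuts, acc = [], 0
    for i, n in enumerate(particles_per_cell):
        acc += n
        if acc >= target and len(cuts) < n_ranks - 1:
            cuts.append(i + 1)
            acc = 0
    return np.split(np.arange(len(particles_per_cell)), cuts)

# A wakefield-like distribution: particles bunch up behind the driver.
cells = np.concatenate([np.full(40, 10), np.full(10, 400), np.full(50, 5)])
for rank, chunk in enumerate(balance(cells, 4)):
    print(f"rank {rank}: cells {chunk[0]}-{chunk[-1]}, "
          f"particles {cells[chunk].sum()}")
```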
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Watson, Willie R. (Technical Monitor)
2005-01-01
The overall objectives of this research are to formulate and validate efficient parallel algorithms, and to efficiently design and implement computer software, for solving large-scale acoustic problems arising from unified finite element frameworks. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high-performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper preconditioning strategies, unrolling strategies, and effective interprocessor communication schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving a series of structural and acoustic (symmetric and unsymmetric) problems on different computing platforms. Comparisons with existing commercial and/or public-domain software are also included, whenever possible.
The Helicopter Antenna Radiation Prediction Code (HARP)
NASA Technical Reports Server (NTRS)
Klevenow, F. T.; Lynch, B. G.; Newman, E. H.; Rojas, R. G.; Scheick, J. T.; Shamansky, H. T.; Sze, K. Y.
1990-01-01
The first nine months effort in the development of a user oriented computer code, referred to as the HARP code, for analyzing the radiation from helicopter antennas is described. The HARP code uses modern computer graphics to aid in the description and display of the helicopter geometry. At low frequencies the helicopter is modeled by polygonal plates, and the method of moments is used to compute the desired patterns. At high frequencies the helicopter is modeled by a composite ellipsoid and flat plates, and computations are made using the geometrical theory of diffraction. The HARP code will provide a user friendly interface, employing modern computer graphics, to aid the user to describe the helicopter geometry, select the method of computation, construct the desired high or low frequency model, and display the results.
Van den Bempt, Maxim; Van Genechten, Wouter; Claes, Toon; Claes, Steven
2016-12-01
The aim of this study was to give an overview of the accuracy of coronal limb alignment correction after high tibial osteotomy (HTO) for the arthritic varus knee by performing a systematic review of the literature. The databases PubMed, MEDLINE and Cochrane Library were screened for relevant articles. Only prospective clinical studies with the accuracy of alignment correction by HTO as a primary or secondary objective were included. Fifteen studies were included in this systematic review and were subdivided into 23 cohorts. A total of 966 procedures were considered. Nine cohorts used computer navigation during HTO and the other 14 cohorts used a conventional method. In seven computer navigation cohorts, at least 75% of the study population fell into the accepted "range of accuracy" (AR) as proposed by the different studies, but only six out of 14 conventional cohorts reached this percentage. Four out of eight conventional cohorts that provided data on under- and overcorrection had a tendency to undercorrection. The accuracy of coronal alignment corrections using conventional HTO falls short. The number of procedures outside the proposed AR is surprising and exposes a critical concern for modern HTO. Computer navigation might improve the accuracy of correction, but its use is not widespread among orthopedic surgeons. Although HTO procedures have been shown to be successful in the treatment of unicompartmental knee arthritis when performed accurately, the results of this review stress the importance of ongoing efforts to improve correction accuracy in modern HTO. Copyright © 2016 Elsevier B.V. All rights reserved.
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.
Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng
2014-10-01
Push-based database management systems (DBMS) are a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
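A minimal sketch of what an occupancy-based performance model for compute-bound kernels can look like; the functional form and the numbers below are illustrative assumptions, not the authors' fitted model. Runtime is estimated as the number of thread "waves" the device must run, which grows when resource occupancy caps the number of concurrently resident threads.

```python
# Toy occupancy-style model (not the authors' fitted model): for a
# compute-bound kernel, predicted runtime grows with total thread work and
# shrinks with the concurrency the device can sustain, which is capped by
# resource occupancy (registers/shared memory per block).
def predict_time(n_threads, cycles_per_thread, sm_count,
                 max_threads_per_sm, occupancy, clock_hz):
    resident = sm_count * max_threads_per_sm * occupancy  # concurrent threads
    waves = -(-n_threads // int(resident))                # ceiling division
    return waves * cycles_per_thread / clock_hz

# Example: halving occupancy roughly doubles the number of waves.
t_full = predict_time(2**20, 4000, 15, 2048, 1.0, 1.0e9)
t_half = predict_time(2**20, 4000, 15, 2048, 0.5, 1.0e9)
print(f"full occupancy: {t_full*1e3:.2f} ms, half: {t_half*1e3:.2f} ms")
```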
Shaikhouni, Ammar; Elder, J Bradley
2012-11-01
At the turn of the twentieth century, the only computational device used in neurosurgical procedures was the brain of the surgeon. Today, most neurosurgical procedures rely at least in part on the use of a computer to help perform surgeries accurately and safely. The techniques that revolutionized neurosurgery were mostly developed after the 1950s. Just before that era, the transistor was invented in the late 1940s, and the integrated circuit was invented in the late 1950s. During this time, the first automated, programmable computational machines were introduced. The rapid progress in the field of neurosurgery not only occurred hand in hand with the development of modern computers, but one also can state that modern neurosurgery would not exist without computers. The focus of this article is the impact modern computers have had on the practice of neurosurgery. Neuroimaging, neuronavigation, and neuromodulation are examples of tools in the armamentarium of the modern neurosurgeon that owe each step in their evolution to progress made in computer technology. Advances in computer technology central to innovations in these fields are highlighted, with particular attention to neuroimaging. Developments over the last 10 years in areas of sensors and robotics that promise to transform the practice of neurosurgery further are discussed. Potential impacts of advances in computers related to neurosurgery in developing countries and underserved regions are also discussed. As this article illustrates, the computer, with its underlying and related technologies, is central to advances in neurosurgery over the last half century. Copyright © 2012 Elsevier Inc. All rights reserved.
2016-09-01
HPCMP will continue to be a key resource in solving challenging problems for the Department of Defense. High-Fidelity Simulations of ... laser interactions. The group had studied plasma expansion experimentally, but this wasn't sufficient to understand the problem. Feister adapted and ... focused on increasing the efficiency of jet turbine engines and extending aircraft flight ranges by changing the shape (articulation) of the turbine
Some observations on computer lip-reading: moving from the dream to the reality
NASA Astrophysics Data System (ADS)
Bear, Helen L.; Owen, Gari; Harvey, Richard; Theobald, Barry-John
2014-10-01
In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called "visemes" for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution, pose, lighting and other practical factors. However, the working assumption that visemes, which are the visual equivalent of phonemes, are the best unit for recognition does need further examination. We conclude that visemes, which were defined over a century ago, are unlikely to be optimal for a modern computer lip-reading system.
A computer analysis of the RF performance of a ground-mounted, air-supported radome
NASA Astrophysics Data System (ADS)
Punnett, M. B.; Joy, E. B.
Several reports and actual operating experience have highlighted the degradation of RF performance that can occur when SSR or IFF antennas are mounted above a primary search antenna within metal space-frame or dielectric space-frame radomes. These effects are usually attributed to both the high incidence angles and the sensitivity of the low-gain antennas to sidelobe changes caused by scattered energy. Although it has been widely accepted that thin-membrane radomes would provide superior performance for this application, there has been little supporting documentation. A plane-wave-spectrum (PWS) computer-based radome analysis was conducted to assess the performance of a specific air-supported radome for the SSR application. In conducting the analysis, a mathematical model of a modern SSR antenna was combined with a model of an existing Birdair radome design.
A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture
NASA Astrophysics Data System (ADS)
Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.
2017-12-01
A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.
Effects of Inlet Distortion on Aeromechanical Stability of a Forward-Swept High-Speed Fan
NASA Technical Reports Server (NTRS)
Herrick, Gregory P.
2011-01-01
Concerns regarding noise, propulsive efficiency, and fuel burn are inspiring aircraft designs wherein the propulsive turbomachines are partially (or fully) embedded within the airframe; such designs present serious concerns with regard to aerodynamic and aeromechanic performance of the compression system in response to inlet distortion. Separately, a forward-swept high-speed fan was developed to address noise concerns of modern podded turbofans; however this fan encounters aeroelastic instability (flutter) as it approaches stall. A three-dimensional, unsteady, Navier-Stokes computational fluid dynamics code is applied to analyze and corroborate fan performance with clean inlet flow. This code, already validated in its application to assess aerodynamic damping of vibrating blades at various flow conditions, is modified and then applied in a computational study to preliminarily assess the effects of inlet distortion on aeroelastic stability of the fan. Computational engineering application and implementation issues are discussed, followed by an investigation into the aeroelastic behavior of the fan with clean and distorted inlets.
FY06 NRL DoD High Performance Computing Modernization Program Annual Reports
2007-10-31
our simulations yield important new information on the amount and form of the energy that is released by these explosive events. These results ... coupled with the ideal-gas equation of state and a one-step Arrhenius kinetics of energy release. The equations are solved using the explicit ... practical applications, including hydrogen safety and pulse-detonation engines (PDE). For example, the results summarizing the effect of obstacle
Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application.
Hanwell, Marcus D; de Jong, Wibe A; Harris, Christopher J
2017-10-30
An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and offer command-line tools to automate interaction - connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code hosted on the GitHub platform with automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefits of the connected web - going beyond what conventional search and analytics platforms offer in this area. It also has the goal of offering federated instances that can be customized to the sites/research performed. Data gets stored using JSON, extending upon previous approaches using XML, building structures that support computational chemistry calculations. These structures were developed to make it easy to process data across different languages, and send data to a JavaScript-based web client.
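A hedged sketch of the JSON-over-REST pattern described above; the endpoint URL, field names, and document schema here are hypothetical stand-ins, not the platform's documented API.

```python
# Hedged sketch of the JSON-over-REST pattern the platform describes.
# The endpoint URL and field names below are hypothetical illustrations,
# not the platform's documented API.
import json
import urllib.request

calculation = {
    "molecule": {"atoms": ["O", "H", "H"]},   # hypothetical schema
    "code": "NWChem",
    "task": "energy",
}

req = urllib.request.Request(
    "https://example.org/api/calculations",   # placeholder endpoint
    data=json.dumps(calculation).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the document; the same JSON is
# what a JavaScript-based web client would consume for visualization.
print(json.dumps(calculation, indent=2))
```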
Peng, Yun; Miller, Brandi D; Boone, Timothy B; Zhang, Yingchun
2018-02-12
Weakened pelvic floor support is believed to be the main cause of various pelvic floor disorders. Modern theories of pelvic floor support stress the structural and functional integrity of multiple structures and their interplay to maintain normal pelvic floor functions. Connective tissues provide passive pelvic floor support while pelvic floor muscles provide active support through voluntary contraction. Advanced modern medical technologies allow us to comprehensively and thoroughly evaluate the interaction of supporting structures and assess both active and passive support functions. The pathophysiology of various pelvic floor disorders associated with pelvic floor weakness is now under scrutiny from the combination of (1) morphological, (2) dynamic (through computational modeling), and (3) neurophysiological perspectives. This topical review aims to summarize newly emerged studies assessing pelvic floor support function across these three categories. A literature search was performed with emphasis on (1) medical imaging studies that assess pelvic floor muscle architecture, (2) subject-specific computational modeling studies that address new topics such as modeling muscle contractions, and (3) pelvic floor neurophysiology studies that report novel devices or findings such as high-density surface electromyography techniques. We found that recent computational modeling studies feature more realistic soft-tissue constitutive models (e.g., active muscle contraction) as well as an increasing interest in simulating surgical interventions (e.g., artificial sphincter). Diffusion tensor imaging provides a useful non-invasive tool to characterize pelvic floor muscles at the microstructural level, which can potentially be used to improve the accuracy of the simulation of muscle contraction. Studies using high-density surface electromyography anal and vaginal probes on large patient cohorts have recently been reported. Influences of vaginal delivery on the distribution of innervation zones of pelvic floor muscles are clarified, providing useful guidance for better protection of women during delivery. We are now in a period of transition to advanced diagnostic and predictive pelvic floor medicine. Our findings highlight the application of diffusion tensor imaging, computational models with consideration of active pelvic floor muscle contraction, high-density surface electromyography, and their potential integration, as tools to push the boundary of our knowledge in pelvic floor support and better shape current clinical practice.
Extreme Scale Computing to Secure the Nation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, D L; McGraw, J R; Johnson, J R
2009-11-10
Since the dawn of modern electronic computing in the mid 1940's, U.S. national security programs have been dominant users of every new generation of high-performance computer. Indeed, the first general-purpose electronic computer, ENIAC (the Electronic Numerical Integrator and Computer), was used to calculate the expected explosive yield of early thermonuclear weapons designs. Even the U.S. numerical weather prediction program, another early application for high-performance computing, was initially funded jointly by sponsors that included the U.S. Air Force and Navy, agencies interested in accurate weather predictions to support U.S. military operations. For the decades of the cold war, national security requirements continued to drive the development of high performance computing (HPC), including advancement of the computing hardware and development of sophisticated simulation codes to support weapons and military aircraft design, numerical weather prediction, as well as data-intensive applications such as cryptography and cybersecurity. U.S. national security concerns continue to drive the development of high-performance computers and software in the U.S., and in fact, events following the end of the cold war have driven an increase in the growth rate of computer performance at the high end of the market. This mainly derives from our nation's observance of a moratorium on underground nuclear testing beginning in 1992, followed by our voluntary adherence to the Comprehensive Test Ban Treaty (CTBT) beginning in 1995. The CTBT prohibits further underground nuclear tests, which in the past had been a key component of the nation's science-based program for assuring the reliability, performance and safety of U.S. nuclear weapons. In response to this change, the U.S. Department of Energy (DOE) initiated the Science-Based Stockpile Stewardship (SBSS) program pursuant to the Fiscal Year 1994 National Defense Authorization Act, which requires, 'in the absence of nuclear testing, a program to: (1) Support a focused, multifaceted program to increase the understanding of the enduring stockpile; (2) Predict, detect, and evaluate potential problems of the aging of the stockpile; (3) Refurbish and re-manufacture weapons and components, as required; and (4) Maintain the science and engineering institutions needed to support the nation's nuclear deterrent, now and in the future'. This program continues to fulfill its national security mission by adding significant new capabilities for producing scientific results through large-scale computational simulation coupled with careful experimentation, including sub-critical nuclear experiments permitted under the CTBT. To develop the computational science and the computational horsepower needed to support its mission, SBSS initiated the Accelerated Strategic Computing Initiative, later renamed the Advanced Simulation & Computing (ASC) program (sidebar: 'History of ASC Computing Program Computing Capability'). The modern 3D computational simulation capability of the ASC program supports the assessment and certification of the current nuclear stockpile through calibration with past underground test (UGT) data. While an impressive accomplishment, continued evolution of national security mission requirements will demand computing resources at a significantly greater scale than we have today.
In particular, continued observance and potential Senate confirmation of the Comprehensive Test Ban Treaty (CTBT), together with the U.S. administration's promise for a significant reduction in the size of the stockpile and the inexorable aging and consequent refurbishment of the stockpile, all demand increasing refinement of our computational simulation capabilities. Assessment of the present and future stockpile with increased confidence in safety and reliability, without reliance upon calibration with past or future test data, is a long-term goal of the ASC program. This will be accomplished through significant increases in the scientific bases that underlie the computational tools. Computer codes must be developed that replace phenomenology with increased levels of scientific understanding together with an accompanying quantification of uncertainty. These advanced codes will place significantly higher demands on the computing infrastructure than do the current 3D ASC codes. This article discusses not only the need for a future computing capability at the exascale for the SBSS program, but also considers high performance computing requirements for broader national security questions. For example, the increasing concern over potential nuclear terrorist threats demands a capability to assess threats and potential disablement technologies as well as a rapid forensic capability for determining a nuclear weapons design from post-detonation evidence (nuclear counterterrorism).
A High Performance VLSI Computer Architecture For Computer Graphics
NASA Astrophysics Data System (ADS)
Chin, Chi-Yuan; Lin, Wen-Tai
1988-10-01
A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the demands of modern computer graphics, e.g., high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e., object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With current high-density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline
Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.
2016-09-28
A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting follow-up observations of young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using high-performance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer for dealing with data from large-scale time-domain facilities in the near future.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
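For reference, here is a minimal serial version of the Smith-Waterman recurrence that the paper parallelizes at the cluster, thread, and vector levels; the match/mismatch/gap scores are illustrative choices, not the paper's parameters.

```python
# Minimal serial Smith-Waterman local alignment score -- the dynamic-
# programming recurrence that database-scanning implementations vectorize.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # DP matrix, zero boundary
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # small protein-like toy
```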
Highly parallel implementation of non-adiabatic Ehrenfest molecular dynamics
NASA Astrophysics Data System (ADS)
Kanai, Yosuke; Schleife, Andre; Draeger, Erik; Anisimov, Victor; Correa, Alfredo
2014-03-01
While the adiabatic Born-Oppenheimer approximation tremendously lowers computational effort, many questions in modern physics, chemistry, and materials science require an explicit description of coupled non-adiabatic electron-ion dynamics. Electronic stopping, i.e. the energy transfer of a fast projectile atom to the electronic system of the target material, is a notorious example. We recently implemented real-time time-dependent density functional theory based on the plane-wave pseudopotential formalism in the Qbox/qb@ll codes. We demonstrate that explicit integration using a fourth-order Runge-Kutta scheme is very suitable for modern highly parallelized supercomputers. Applying the new implementation to systems with hundreds of atoms and thousands of electrons, we achieved excellent performance and scalability on a large number of nodes both on the BlueGene-based "Sequoia" system at LLNL as well as the Cray architecture of "Blue Waters" at NCSA. As an example, we discuss our work on computing the electronic stopping power of aluminum and gold for hydrogen projectiles, showing excellent agreement with experiment. These first-principles calculations allow us to gain important insight into the fundamental physics of electronic stopping.
CSP: A Multifaceted Hybrid Architecture for Space Computing
NASA Technical Reports Server (NTRS)
Rudolph, Dylan; Wilson, Christopher; Stewart, Jacob; Gauvin, Patrick; George, Alan; Lam, Herman; Crum, Gary Alex; Wirthlin, Mike; Wilson, Alex; Stoddard, Aaron
2014-01-01
Research on the CHREC Space Processor (CSP) takes a multifaceted hybrid approach to embedded space computing. Working closely with the NASA Goddard SpaceCube team, researchers at the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and Brigham Young University are developing hybrid space computers that feature an innovative combination of three technologies: commercial-off-the-shelf (COTS) devices, radiation-hardened (RadHard) devices, and fault-tolerant computing. Modern COTS processors provide the utmost in performance and energy-efficiency but are susceptible to ionizing radiation in space, whereas RadHard processors are virtually immune to this radiation but are more expensive, larger, less energy-efficient, and generations behind in speed and functionality. By featuring COTS devices to perform the critical data processing, supported by simpler RadHard devices that monitor and manage the COTS devices, and augmented with novel uses of fault-tolerant hardware, software, information, and networking within and between COTS devices, the resulting system can maximize performance and reliability while minimizing energy consumption and cost. NASA Goddard has adopted the CSP concept and technology with plans underway to feature flight-ready CSP boards on two upcoming space missions.
QMachine: commodity supercomputing in web browsers.
Wilkinson, Sean R; Almeida, Jonas S
2014-06-09
Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling
NASA Astrophysics Data System (ADS)
Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.
2017-06-01
Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high-resolution numerical schemes. CUDA technology is used for the programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. The speedup of the solution on GPUs with respect to the solution on a central processor unit (CPU) is compared. Performance measurements show that the numerical schemes developed achieve a 20-50x speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
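A hedged, minimal sketch of the finite-volume update the abstract refers to, reduced to one dimension: a Rusanov (local Lax-Friedrichs) flux for the Euler equations on the Sod shock tube. In the paper's setting this per-cell update runs in 3-D on unstructured meshes with one GPU thread per cell; none of the numerical choices below are claimed to be theirs.

```python
# Minimal 1-D finite-volume Euler step with a Rusanov flux -- a sketch of
# the cell-update structure that GPU CFD codes parallelize per cell.
import numpy as np

gamma = 1.4

def primitive(U):
    rho, mom, E = U
    u = mom / rho
    p = (gamma - 1.0) * (E - 0.5 * rho * u**2)
    return rho, u, p

def flux(U):
    rho, u, p = primitive(U)
    return np.array([rho * u, rho * u**2 + p, u * (U[2] + p)])

def rusanov(UL, UR):
    aL = np.sqrt(gamma * primitive(UL)[2] / UL[0])   # sound speeds
    aR = np.sqrt(gamma * primitive(UR)[2] / UR[0])
    smax = max(abs(primitive(UL)[1]) + aL, abs(primitive(UR)[1]) + aR)
    return 0.5 * (flux(UL) + flux(UR)) - 0.5 * smax * (UR - UL)

# Sod shock tube initial condition on 200 cells.
n, dx, dt = 200, 1.0 / 200, 0.0005
U = np.zeros((3, n))
U[0], U[2] = 1.0, 1.0 / (gamma - 1.0)                  # left state
U[0, n//2:], U[2, n//2:] = 0.125, 0.1 / (gamma - 1.0)  # right state

for _ in range(200):                                   # advance to t = 0.1
    F = np.array([rusanov(U[:, i], U[:, i+1]) for i in range(n - 1)]).T
    U[:, 1:-1] -= dt / dx * (F[:, 1:] - F[:, :-1])

print("density range after 0.1 s:", U[0].min(), U[0].max())
```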
Supercomputing resources empowering superstack with interactive and integrated systems
NASA Astrophysics Data System (ADS)
Rückemann, Claus-Peter
2012-09-01
This paper presents the results from the development and implementation of Superstack algorithms to be used dynamically with integrated systems and supercomputing resources. Processing of geophysical data, known as geoprocessing, is an essential part of the analysis of geoscientific data. The theory of Superstack algorithms and their practical application on modern computing architectures was inspired by developments introduced with the processing of seismic data on mainframes, leading within the last years to high-end scientific computing applications. Several stacking algorithms are known, but given the low signal-to-noise ratio of seismic data, the use of iterative algorithms like the Superstack can support analysis and interpretation. The new Superstack algorithms are in use with wave theory and optical phenomena on highly performant computing resources for huge data sets as well as for sophisticated application scenarios in geosciences and archaeology.
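To motivate why stacking is worth the computational effort, here is a toy demonstration, a sketch rather than the paper's Superstack algorithm: averaging N noisy copies of the same trace suppresses incoherent noise by roughly sqrt(N), which iterative weighted stacks then build on.

```python
# Toy stacking demo (not the paper's Superstack algorithm): averaging 64
# noisy copies of one trace improves SNR by about 10*log10(64) = 18 dB.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 12 * t) * np.exp(-((t - 0.5) ** 2) / 0.005)

traces = signal + rng.normal(0.0, 1.0, size=(64, t.size))  # 64 noisy copies
stack = traces.mean(axis=0)

def snr_db(x):
    return 10 * np.log10((signal**2).mean() / ((x - signal) ** 2).mean())

print(f"single trace SNR: {snr_db(traces[0]):5.1f} dB")
print(f"64-fold stack SNR: {snr_db(stack):5.1f} dB")
```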
Modern Electronic Devices: An Increasingly Common Cause of Skin Disorders in Consumers.
Corazza, Monica; Minghetti, Sara; Bertoldi, Alberto Maria; Martina, Emanuela; Virgili, Annarosa; Borghi, Alessandro
2016-01-01
The modern conveniences and enjoyment brought about by electronic devices bring with them some health concerns. In particular, personal electronic devices are responsible for rising cases of several skin disorders, including pressure, friction, contact dermatitis, and other physical dermatitis. The universal use of such devices, either for work or recreational purposes, will probably increase the occurrence of polymorphous skin manifestations over time. It is important for clinicians to consider electronics as potential sources of dermatological ailments for proper patient management. We performed a literature review on skin disorders associated with the personal use of modern technology, including personal computers and laptops, personal computer accessories, mobile phones, tablets, video games, and consoles.
Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan
2015-01-01
Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.
Development of seismic tomography software for hybrid supercomputers
NASA Astrophysics Data System (ADS)
Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton
2015-04-01
Seismic tomography is a technique for computing a velocity model of a geologic structure from the first-arrival travel times of seismic waves. The technique is used in the processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of the development of seismic monitoring systems and the increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high-performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and a software package for such systems, to be used in the processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and the software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using an eikonal equation solver, the arrival times of seismic waves are computed based on the assumed velocity model of the geologic structure being analyzed. In order to solve the linearized inverse problem, a tomographic matrix is computed that connects model adjustments with travel-time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on the target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
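A hedged sketch of the linearized update step the abstract describes: a tomographic matrix G maps slowness adjustments dm to travel-time residuals r, and a Tikhonov-regularized least-squares solve gives dm. The matrix, anomaly, and regularization weight below are invented toy values, not the authors' implementation.

```python
# Hedged sketch of a linearized tomography update: solve the regularized
# normal equations (G^T G + lam*I) dm = G^T r for the model adjustment dm.
import numpy as np

rng = np.random.default_rng(2)
n_rays, n_cells = 120, 40
G = rng.random((n_rays, n_cells))        # toy ray lengths through cells
dm_true = np.zeros(n_cells)
dm_true[15:20] = 0.05                    # a synthetic slow anomaly
r = G @ dm_true + rng.normal(0, 1e-3, n_rays)   # residuals with noise

lam = 0.1                                # Tikhonov regularization weight
A = G.T @ G + lam * np.eye(n_cells)
dm = np.linalg.solve(A, G.T @ r)         # regularized model update

print("recovered anomaly (cells 15-19):", np.round(dm[15:20], 3))
```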
NASA Technical Reports Server (NTRS)
Chen, R. T. N.; Hindson, W. S.
1985-01-01
The increasing use of highly augmented digital flight-control systems in modern military helicopters prompted an examination of the influence of rotor dynamics and other high-order dynamics on control-system performance. A study was conducted at NASA Ames Research Center to correlate theoretical predictions of feedback gain limits in the roll axis with experimental test data obtained from a variable-stability research helicopter. Feedback gains, the break frequency of the presampling sensor filter, and the computational frame time of the flight computer were systematically varied. The results, which showed excellent theoretical and experimental correlation, indicate that the rotor-dynamics, sensor-filter, and digital-data processing delays can severely limit the usable values of the roll-rate and roll-attitude feedback gains.
Aerodynamic optimization of supersonic compressor cascade using differential evolution on GPU
NASA Astrophysics Data System (ADS)
Aissa, Mohamed Hasanine; Verstraete, Tom; Vuik, Cornelis
2016-06-01
Differential Evolution (DE) is a powerful stochastic optimization method. Compared to gradient-based algorithms, DE is able to avoid local minima, but at the same time requires more function evaluations. In turbomachinery applications, function evaluations are performed with time-consuming CFD simulations, which results in a long, unaffordable design cycle. Modern high performance computing systems, especially Graphics Processing Units (GPUs), are able to alleviate this inconvenience by accelerating the design evaluation itself. In this work we present a validated CFD solver running on GPUs, able to accelerate the design evaluation and thus the entire design process. The achieved speedup of 20x to 30x enabled the DE algorithm to run on a high-end computer instead of a costly large cluster. The GPU-enhanced DE was used to optimize the aerodynamics of a supersonic compressor cascade, achieving an aerodynamic loss reduction of 20%.
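For readers unfamiliar with DE, here is a minimal sketch of the classic DE/rand/1/bin scheme. In the paper each fitness evaluation would be a GPU-accelerated CFD run; it is replaced here by a cheap analytic test function:

```python
# Minimal DE/rand/1/bin sketch; the fitness function is a cheap stand-in
# for the expensive CFD loss evaluation described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):                      # stand-in for the CFD loss evaluation
    return np.sum(x**2)

def differential_evolution(dim=8, pop_size=32, F=0.8, CR=0.9, gens=200):
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    cost = np.array([fitness(p) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            # pick three distinct donors (may coincide with i in this sketch)
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            mutant = a + F * (b - c)                 # differential mutation
            cross = rng.random(dim) < CR             # binomial crossover
            cross[rng.integers(dim)] = True          # ensure one gene changes
            trial = np.where(cross, mutant, pop[i])
            if (tc := fitness(trial)) <= cost[i]:    # greedy selection
                pop[i], cost[i] = trial, tc
    return pop[np.argmin(cost)], cost.min()

best, best_cost = differential_evolution()
print(best_cost)
```

Because every trial vector can be evaluated independently, the inner evaluation loop maps naturally onto a GPU, which is the lever the paper exploits.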
DoD High Performance Computing Modernization Program FY16 Annual Report
2018-05-02
vortex shedding from rotor blade tips using adaptive mesh refinement gives Helios the unique capability to assess the interaction of these vortices ... with the fuselage and nearby rotor blades. Helios provides all the benefits for rotary-winged aircraft that Kestrel does for fixed-wing aircraft ... rotor blade upgrade of the CH-47F Chinook helicopter to achieve up to an estimated 2,000 pounds increase in hover thrust (~10%) with limited
2009 High Performance Computing Modernization Program Users Group Conference
2009-06-17
Asymmetric Threats Future Peer GWoT / ungoverned areas Irregular Warfare Low-end Asymmetric 1-4-2-1 (State-to-State War) Disruptive technologies Superiority ... 2008 “As changes in this century’s threat environment create strategic challenges – irregular warfare, weapons of mass destruction, disruptive technologies – this request places greater emphasis on basic research, which in recent years has not kept pace with other parts of the budget.” • Personnel
Goal Orientation Framing and Its Influence on Performance
2012-12-01
first-person shooter computer games Call of Duty: Modern Warfare 2 and Call of Duty: Modern Warfare 3. During the simulation, participants were ... working understanding of social expectations and norms (Duda & Nicholls, 1992). It is true that obese people, drug addicts and abusive parents exist in ... Monterey Student Activity Center. B. MEASURES Performance was assessed in two tests, a math test and a first-person shooter game. It was the intent of
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.
2011-01-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphics processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, register pressure is reduced through variable reuse via shared memory, and data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
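As an illustration of one of the compute-bound strategies named above, heavier per-thread workloads, here is a hedged sketch using Numba's CUDA bindings rather than the authors' CUDA C code; it requires a CUDA-capable GPU and the Numba package:

```python
# Hedged illustration (not the authors' code) of giving each thread a
# heavier workload via a grid-stride loop, so far fewer threads than
# elements are launched. Requires a CUDA-capable GPU.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy_strided(a, x, y, out):
    start = cuda.grid(1)                      # global thread index
    stride = cuda.gridsize(1)                 # total number of launched threads
    for i in range(start, x.size, stride):    # several elements per thread
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = np.arange(n, dtype=np.float32)
out = np.empty_like(x)
saxpy_strided[128, 256](np.float32(2.0), x, y, out)  # 32K threads for 1M items
print(out[:4])
```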
Student Perceived Importance and Correlations of Selected Computer Literacy Course Topics
ERIC Educational Resources Information Center
Ciampa, Mark
2013-01-01
Traditional college-level courses designed to teach computer literacy are in a state of flux. Today's students have high rates of access to computing technology and computer ownership, leading many policy decision makers to conclude that students already are computer literate and thus computer literacy courses are dinosaurs in a modern digital…
Electron tubes for industrial applications
NASA Astrophysics Data System (ADS)
Gellert, Bernd
1994-05-01
This report reviews research and development efforts in recent years on vacuum electron tubes, in particular power grid tubes for industrial applications. Physical and chemical effects are discussed that determine the performance of today's devices. Due to progress in the fundamental understanding of materials and newly developed processes, the reliability and reproducibility of power grid tubes could be improved considerably. Modern computer-controlled manufacturing methods ensure a high reproducibility of production, and continuous quality certification according to ISO 9001 guarantees future high quality standards. Some typical applications of these tubes are given as examples.
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing
Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng
2015-01-01
A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a prerequisite in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels. PMID:26566545
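The CUDA stream mechanism at the heart of this model can be demonstrated with a hedged Numba sketch (not G-SDMS code): two independent kernels and their transfers are issued on separate streams, leaving the device free to overlap them. A CUDA-capable GPU is required:

```python
# Sketch of concurrent work under CUDA streams with Numba; the trivial
# kernel stands in for the paper's query-processing kernels.
import numpy as np
from numba import cuda

@cuda.jit
def inc(a):
    i = cuda.grid(1)
    if i < a.size:
        a[i] += 1.0

s1, s2 = cuda.stream(), cuda.stream()
h1 = cuda.pinned_array(1 << 20, dtype=np.float32); h1[:] = 0
h2 = cuda.pinned_array(1 << 20, dtype=np.float32); h2[:] = 0

d1 = cuda.to_device(h1, stream=s1)       # async copy on stream 1
d2 = cuda.to_device(h2, stream=s2)       # async copy on stream 2
inc[4096, 256, s1](d1)                   # kernels from different streams
inc[4096, 256, s2](d2)                   #   may overlap on the device
d1.copy_to_host(h1, stream=s1)
d2.copy_to_host(h2, stream=s2)
cuda.synchronize()
print(h1[0], h2[0])
```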
Extreme-Scale De Novo Genome Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Georganas, Evangelos; Hofmeyr, Steven; Egan, Rob
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme-scale analysis via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different parts of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and the communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.
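The first stage of such a pipeline, k-mer counting, is easy to sketch on a single node; in HipMer the analogous hash table is distributed across thousands of cores. The reads below are toy stand-ins:

```python
# Toy single-node sketch of k-mer counting, the first assembly stage;
# the real pipeline distributes this table across many nodes.
from collections import Counter

def count_kmers(reads, k=5):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1   # slide a k-wide window over the read
    return counts

reads = ["ACGTACGTGACG", "CGTACGTGACGT"]   # stand-in short reads
kmers = count_kmers(reads)
print(kmers.most_common(3))
```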
Teodoro, George; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel
2014-01-01
We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and the characterization of objects based on these features (object classification). In this work, we identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable to, or sometimes better than, that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly, a result of the low performance of MICs for random data access. We also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware task scheduling strategy for application operations improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems - the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs). PMID:25419088
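A performance-aware assignment policy of the kind referenced above can be sketched as follows; the per-device speedup numbers are illustrative assumptions, not measurements from the paper:

```python
# Greedy earliest-finish assignment of operations to devices, assuming
# known per-device speedups relative to a CPU baseline (toy values).
def schedule(tasks, speedup):
    free_at = {"cpu": 0.0, "mic": 0.0}          # time each device becomes idle
    plan = []
    for name, cpu_time in sorted(tasks, key=lambda t: -t[1]):   # biggest first
        # choose the device with the earliest completion time for this task
        dev = min(free_at,
                  key=lambda d: free_at[d] + cpu_time / speedup[d][name])
        free_at[dev] += cpu_time / speedup[dev][name]
        plan.append((name, dev, round(free_at[dev], 2)))
    return plan

speedup = {"cpu": {"regular": 1.0, "irregular": 1.0},
           "mic": {"regular": 2.0, "irregular": 0.5}}  # MICs dislike random access
tasks = [("regular", 8.0), ("irregular", 6.0), ("regular", 4.0)]
print(schedule(tasks, speedup))
```

The key design choice mirrors the paper's finding: irregular-access operations are steered away from the MIC because their effective speedup there is below 1.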
Fast Photon Monte Carlo for Water Cherenkov Detectors
NASA Astrophysics Data System (ADS)
Latorre, Anthony; Seibert, Stanley
2012-03-01
We present Chroma, a high performance optical photon simulation for large particle physics detectors, such as the water Cherenkov far detector option for LBNE. This software takes advantage of the CUDA parallel computing platform to propagate photons using modern graphics processing units. In a computer model of a 200 kiloton water Cherenkov detector with 29,000 photomultiplier tubes, Chroma can propagate 2.5 million photons per second, around 200 times faster than the same simulation with Geant4. Chroma uses a surface-based approach to modeling geometry, which offers many benefits over the solid-based modeling approach used in other simulations like Geant4.
SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes
NASA Astrophysics Data System (ADS)
Homann, Holger; Laenen, Francois
2018-03-01
The numerical study of physical problems often requires integrating the dynamics of a large number of particles evolving according to a given set of equations. Particles are characterized by the information they carry, such as an identity, a position, and other attributes. There are, generally speaking, two different possibilities for handling particles in high performance computing (HPC) codes. The concept of an Array of Structures (AoS) is in the spirit of the object-oriented programming (OOP) paradigm in that the particle information is implemented as a structure. Here, an object (realization of the structure) represents one particle, and a set of many particles is stored in an array. In contrast, using the concept of a Structure of Arrays (SoA), a single structure holds several arrays, each representing one property (such as the identity) of the whole set of particles. The AoS approach is often implemented in HPC codes due to its handiness and flexibility. For a class of problems, however, it is known that the performance of SoA is much better than that of AoS. We confirm this observation for our particle problem. Using a benchmark we show that on modern Intel Xeon processors the SoA implementation is typically several times faster than the AoS one. On Intel's MIC co-processors the performance gap even attains a factor of ten. The same is true for GPU computing, using both computational and multi-purpose GPUs. Combining performance and handiness, we present the library SoAx, which has optimal performance (on CPUs, MICs, and GPUs) while providing the same handiness as AoS. For this, SoAx uses modern C++ design techniques such as template metaprogramming, which allows code to be generated automatically for user-defined heterogeneous data structures.
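The AoS/SoA contrast is easy to demonstrate in any array language; the following hedged NumPy sketch (SoAx itself is a C++ library) shows why the SoA update is the SIMD-friendly one:

```python
# Contrast of AoS and SoA layouts in NumPy: the SoA update touches one
# contiguous array per property, which is what vectorizes well.
import numpy as np

n = 1_000_000
# AoS: one structured record per particle (id, position, velocity)
aos = np.zeros(n, dtype=[("id", np.int64), ("x", np.float64), ("v", np.float64)])
# SoA: one contiguous array per particle property
soa = {"id": np.zeros(n, np.int64),
       "x": np.zeros(n), "v": np.zeros(n)}

dt = 1e-3
aos["x"] += dt * aos["v"]   # strided access: fields are interleaved in memory
soa["x"] += dt * soa["v"]   # unit-stride access: contiguous, SIMD-friendly
```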
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing
NASA Astrophysics Data System (ADS)
Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen
2018-04-01
The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy-efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very difficult to achieve the required processing power by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Array, and graphics processor cores constrained by power and performance. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). These DNN models are typically trained using GPUs with gigabytes of external memory and rely heavily on 32-bit floating-point operations. As a result, DNNs do not run efficiently on hardware appropriate for low-power or mobile applications. To address this limitation, we propose a framework for compressing DNN models for ATR suited to deployment on resource-constrained hardware. The proposed compression framework utilizes promising DNN compression techniques, including pruning and weight quantization, while also targeting processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
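The two compression techniques named above, pruning and weight quantization, can be sketched on a single weight matrix; this is an illustrative NumPy example, not the paper's framework:

```python
# Magnitude pruning followed by linear 8-bit quantization of one weight
# matrix; thresholds and sparsity level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)

# 1) Pruning: zero out the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(w), 0.9)
w_pruned = np.where(np.abs(w) >= threshold, w, 0.0)

# 2) Quantization: map the surviving weights to int8 with a single scale.
scale = np.abs(w_pruned).max() / 127.0
w_q = np.round(w_pruned / scale).astype(np.int8)      # stored model
w_deq = w_q.astype(np.float32) * scale                # used at inference

print(f"nonzeros: {np.count_nonzero(w_q)}, "
      f"max abs error: {np.abs(w_deq - w_pruned).max():.5f}")
```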
High Performance Computing Modernization Program Kerberos Throughput Test Report
2017-10-26
functionality as Kerberos plugins. The pre-release production kit was used in these tests to compare against the current release kit. YubiKey support ... HPCMP Kerberos Throughput Test Report ... 2. THROUGHPUT TESTING 2.1 Testing Components: Throughput testing was done to determine the benefits of the pre- ... both the current release kit and the pre-release production kit for a total of 378 individual tests in order to note any improvements. Based on work
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present a novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in a multi-language paradigm to make it as efficient, readable and maintainable as possible. The separation-of-concerns and single-responsibility principles run through the implementation of the solver. As a forward modelling engine, the modern scalable solver extrEMe, based on a contracting integral equation approach, is used. An iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for the (regularized) inverse problem solution, and an adjoint source approach is used to efficiently calculate the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of the available computational resources for a given problem statement. To parameterize an inverse domain, the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments, carried out on platforms ranging from modern laptops to the HPC system Piz Daint (the 6th-ranked supercomputer in the world), demonstrate practically linear scalability of the code up to thousands of nodes.
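The skeleton of such an inversion loop, a quasi-Newton search driven by a misfit value and its gradient, can be sketched with SciPy's L-BFGS implementation; the toy quadratic misfit below stands in for the MT forward model and its adjoint-based gradient:

```python
# Quasi-Newton inversion skeleton: the optimizer only needs the misfit and
# its gradient, which the real solver obtains via adjoint sources.
import numpy as np
from scipy.optimize import minimize

d_obs = np.array([1.0, -2.0, 0.5])          # "observed" responses (toy)

def misfit_and_grad(m):
    resid = m - d_obs                        # stand-in forward model F(m) = m
    phi = 0.5 * resid @ resid + 0.005 * m @ m    # data term + regularization
    grad = resid + 0.01 * m                  # adjoint-style gradient
    return phi, grad

result = minimize(misfit_and_grad, x0=np.zeros(3), jac=True, method="L-BFGS-B")
print(result.x)
```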
Evaluation of a Multicore-Optimized Implementation for Tomographic Reconstruction
Agulleiro, Jose-Ignacio; Fernández, José Jesús
2012-01-01
Tomography allows elucidation of the three-dimensional structure of an object from a set of projection images. In life sciences, electron microscope tomography is providing invaluable information about the cell structure at a resolution of a few nanometres. Here, large images are required to combine wide fields of view with high resolution requirements. The computational complexity of the algorithms along with the large image size then turns tomographic reconstruction into a computationally demanding problem. Traditionally, high-performance computing techniques have been applied to cope with such demands on supercomputers, distributed systems and computer clusters. In the last few years, the trend has turned towards graphics processing units (GPUs). Here we present a detailed description and a thorough evaluation of an alternative approach that relies on exploitation of the power available in modern multicore computers. The combination of single-core code optimization, vector processing, multithreading and efficient disk I/O operations succeeds in providing fast tomographic reconstructions on standard computers. The approach turns out to be competitive with the fastest GPU-based solutions thus far. PMID:23139768
Real-time implementation of an interactive jazz accompaniment system
NASA Astrophysics Data System (ADS)
Deshpande, Nikhil
Modern computational algorithms and digital signal processing (DSP) are able to combine with human performers without forced or predetermined structure in order to create dynamic and real-time accompaniment systems. With modern computing power and intelligent algorithm layout and design, it is possible to achieve more detailed auditory analysis of live music. Using this information, computer code can follow and predict how a human's musical performance evolves, and use this to react in a musical manner. This project builds a real-time accompaniment system to perform together with live musicians, with a focus on live jazz performance and improvisation. The system utilizes a new polyphonic pitch detector and embeds it in an Ableton Live system - combined with Max for Live - to perform elements of audio analysis, generation, and triggering. The system also relies on tension curves and information rate calculations from the Creative Artificially Intuitive and Reasoning Agent (CAIRA) system to help understand and predict human improvisation. These metrics are vital to the core system and allow for extrapolated audio analysis. The system is able to react dynamically to a human performer, and can successfully accompany the human as an entire rhythm section.
QMachine: commodity supercomputing in web browsers
2014-01-01
Background: Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics’ “Big Data” from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. Results: QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running “download and install” software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. Conclusions: QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments. PMID:24913605
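The MapReduce distribution template mentioned above can be sketched in serial form (QM's actual client library is JavaScript and farms the map tasks out to volunteer browsers):

```python
# Serial sketch of a MapReduce template; in QM, each mapper call would run
# in a volunteer browser and results would be collected over HTTP.
from collections import defaultdict

def map_reduce(items, mapper, reducer):
    groups = defaultdict(list)
    for item in items:
        for key, value in mapper(item):      # each map task emits (key, value)
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Example: count shared 3-character suffixes across toy sequences.
seqs = ["ACGTTT", "GGGTTT", "ACGAAA"]
result = map_reduce(seqs,
                    mapper=lambda s: [(s[-3:], 1)],
                    reducer=lambda k, vs: sum(vs))
print(result)   # {'TTT': 2, 'AAA': 1}
```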
NASA Technical Reports Server (NTRS)
Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.
1985-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
A view of Kanerva's sparse distributed memory
NASA Technical Reports Server (NTRS)
Denning, P. J.
1986-01-01
Pentti Kanerva is working on a new class of computers, which are called pattern computers. Pattern computers may close the gap between capabilities of biological organisms to recognize and act on patterns (visual, auditory, tactile, or olfactory) and capabilities of modern computers. Combinations of numeric, symbolic, and pattern computers may one day be capable of sustaining robots. The overview of the requirements for a pattern computer, a summary of Kanerva's Sparse Distributed Memory (SDM), and examples of tasks this computer can be expected to perform well are given.
New developments in supra-threshold perimetry.
Henson, David B; Artes, Paul H
2002-09-01
To describe a series of recent enhancements to supra-threshold perimetry. Computer simulations were used to develop an improved algorithm (HEART) for setting the supra-threshold test intensity at the beginning of a field test, and to evaluate the relationship between various pass/fail criteria and the test's performance (sensitivity and specificity), as well as how these compare with modern threshold perimetry. Data were collected in optometric practices to evaluate HEART and to assess how a patient's response times can be analysed to detect false positive response errors in visual field test results. The HEART algorithm shows improved performance (reduced between-eye differences) over current algorithms. A pass/fail criterion of '3 stimuli seen of 3-5 presentations' at each test location reduces test/retest variability and combines high sensitivity and specificity. A large percentage of false positive responses can be detected by comparing their latencies to the average response time of a patient. Optimised supra-threshold visual field tests can perform as well as modern threshold techniques. Such tests may be easier for novice patients to perform, compared with the more demanding threshold tests.
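The pass/fail rule can be made concrete with a small simulation, assuming each presentation is seen independently with a fixed probability; a location passes once 3 stimuli are seen and fails once 3 are missed, so at most 5 presentations are needed:

```python
# Simulation sketch of the '3 seen of 3-5 presentations' criterion under
# an assumed independent per-presentation probability of seeing a stimulus.
import random

def location_passes(p_seen, rng):
    seen = missed = 0
    while seen < 3 and missed < 3:           # terminates within 5 presentations
        if rng.random() < p_seen:
            seen += 1
        else:
            missed += 1
    return seen == 3

rng = random.Random(0)
for p in (0.95, 0.5, 0.05):                  # healthy, borderline, defective
    passes = sum(location_passes(p, rng) for _ in range(10_000))
    print(f"p_seen={p}: pass rate {passes / 10_000:.3f}")
```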
Enabling a high throughput real time data pipeline for a large radio telescope array with GPUs
NASA Astrophysics Data System (ADS)
Edgar, R. G.; Clark, M. A.; Dale, K.; Mitchell, D. A.; Ord, S. M.; Wayth, R. B.; Pfister, H.; Greenhill, L. J.
2010-10-01
The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. Raw data will be generated continuously at 5 GiB/s, grouped into 8 s cadences. This high throughput motivates the development of on-site, real-time processing and reduction in preference to archiving, transport and off-line processing. Each batch of 8 s data must be completely reduced before the next batch arrives. Maintaining real-time operation will require a sustained performance of around 2.5 TFLOP/s (including convolutions, FFTs, interpolations and matrix multiplications). We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-Watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized between the nodes hosting the GPUs. The MWA will be a frontier scientific instrument and a pathfinder for planned peta- and exa-scale facilities.
ERIC Educational Resources Information Center
Esselman, Brian J.; Hill, Nicholas J.
2016-01-01
Advances in software and hardware have promoted the use of computational chemistry in all branches of chemical research to probe important chemical concepts and to support experimentation. Consequently, it has become imperative that students in the modern undergraduate curriculum become adept at performing simple calculations using computational…
A performance model for GPUs with caches
Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...
2014-06-24
To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies, we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates with an error rate of 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
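The sampling-based idea can be sketched in a few lines: time a handful of small launches, fit a linear model, and extrapolate. The timings below are synthetic stand-ins for measured kernel runtimes:

```python
# Sampling-based linear runtime model: fit runtime ~ a*size + b from a few
# small launches, then extrapolate to the full problem size (toy data).
import numpy as np

sizes = np.array([1e5, 2e5, 4e5, 8e5])               # sampled work sizes
times = 0.8 + 3.2e-6 * sizes + np.random.default_rng(0).normal(0, 0.01, 4)

slope, intercept = np.polyfit(sizes, times, 1)       # linear fit
full_size = 1e7
print(f"predicted runtime: {slope * full_size + intercept:.2f} time units")
```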
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly-scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without requiring the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
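The linear-algebraic view of graph traversal is easy to illustrate: a level-synchronous BFS is just a repeated sparse matrix-vector product. The following hedged sketch emulates the boolean semiring with SciPy sparse matrices:

```python
# BFS expressed as repeated SpMV over an adjacency matrix, in the spirit of
# GraphBLAS; SciPy stands in for a GraphBLAS implementation here.
import numpy as np
import scipy.sparse as sp

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
rows = [j for i, j in edges]       # A[j, i] = 1 for edge i -> j
cols = [i for i, j in edges]
A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(5, 5))

frontier = np.zeros(5); frontier[0] = 1.0
visited = frontier.astype(bool)
level = np.full(5, -1); level[0] = 0

d = 0
while frontier.any():
    d += 1
    reached = (A @ frontier) > 0            # one SpMV expands the frontier
    new = reached & ~visited                # drop already-visited vertices
    level[new] = d
    visited |= new
    frontier = new.astype(float)

print(level)   # BFS level of each vertex from source 0: [0 1 1 2 3]
```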
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Adeshina, A M; Hashim, R
2017-03-01
Diagnostic radiology is a core and integral part of modern medicine, paving the way for primary care physicians in disease diagnosis, treatment and therapy management. All recent standard healthcare procedures have benefitted immensely from contemporary information technology revolutions, which have transformed the approaches to acquiring, storing and sharing diagnostic data for efficient and timely diagnosis of diseases. The connected health network was introduced as an alternative to the ageing traditional concept of the healthcare system, improving hospital-physician connectivity and clinical collaboration. Undoubtedly, this modern approach to medicine has drastically improved healthcare, but at the expense of high computational cost and possible breaches of diagnostic privacy. Consequently, a number of cryptographic techniques have recently been applied to clinical applications, but the challenge of not being able to successfully encrypt both image and textual data persists. Furthermore, reducing the processing time of encryption and decryption of medical datasets, at a considerably lower computational cost and without jeopardizing the required security strength of the encryption algorithm, remains an outstanding issue. This study proposes a secured radiology-diagnostic data framework for connected health networks using a high-performance GPU-accelerated Advanced Encryption Standard (AES). The study was evaluated with radiology image datasets consisting of brain MR and CT datasets obtained from the Department of Surgery, University of North Carolina, USA, and the Swedish National Infrastructure for Computing. Sample patients' notes from the University of North Carolina School of Medicine at Chapel Hill were also used to evaluate the framework's strength in encrypting and decrypting textual data in the form of medical reports. Significantly, the framework is not only able to accurately encrypt and decrypt medical image datasets, but it also successfully encrypts and decrypts textual data in Microsoft Word, Microsoft Excel and Portable Document Format files, which are the conventional formats for documenting medical records. Interestingly, the entire encryption and decryption procedure was achieved at a lower computational cost using regular hardware and software resources, without compromising either the quality of the decrypted data or the security level of the algorithms.
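The encrypt/decrypt round trip at the core of such a framework can be sketched on the CPU with a standard library (the framework described above offloads AES to the GPU). This hedged example uses the Python 'cryptography' package with AES-256 in CTR mode; the key and nonce are random placeholders for illustration:

```python
# CPU-only AES-256-CTR round trip; the same byte-oriented interface works
# uniformly for image bytes (e.g. a DICOM slice) and report documents.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)

def encrypt(data: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

def decrypt(data: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    return dec.update(data) + dec.finalize()

report = b"Patient shows no abnormality in the MR sequence."
assert decrypt(encrypt(report)) == report
```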
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], however at the cost of compromising ease of use. Distributed memory models such as message passing or one-sided communication offer performance and scalability, but they compromise ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, the capabilities of the toolkit, and discusses its evolution.
Virtopsy: postmortem imaging of laryngeal foreign bodies.
Oesterhelweg, Lars; Bolliger, Stephan A; Thali, Michael J; Ross, Steffen
2009-05-01
Death from corpora aliena in the larynx is a well-known entity in forensic pathology. The correct diagnosis of this cause of death is difficult without an autopsy, and misdiagnoses by external examination alone are common. To determine the postmortem usefulness of modern imaging techniques in the diagnosis of foreign bodies in the larynx, multislice computed tomography, magnetic resonance imaging, and postmortem full-body computed tomography-angiography were performed. Three decedents with a suspected foreign body in the larynx underwent the 3 different imaging techniques before medicolegal autopsy. Multislice computed tomography has a high diagnostic value in the noninvasive localization of a foreign body and abnormalities in the larynx. The differentiation between a neoplasm and soft foreign bodies (eg, food) is possible, but difficult, by unenhanced multislice computed tomography. By magnetic resonance imaging, the discrimination of soft tissue structures and soft foreign bodies is much easier. In addition to postmortem multislice computed tomography, the combination with postmortem angiography increases the diagnostic value. Postmortem cross-sectional imaging methods are highly valuable procedures for the noninvasive detection of corpora aliena in the larynx.
NASA Technical Reports Server (NTRS)
Garai, Anirban; Diosady, Laslo T.; Murman, Scott M.; Madavan, Nateri K.
2016-01-01
Recent progress towards developing a new computational capability for accurate and efficient high-fidelity direct numerical simulation (DNS) and large-eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy, and is implemented in a computationally efficient manner on a modern high performance computer architecture. An inflow turbulence generation procedure based on a linear forcing approach has been incorporated in this framework, and DNS were conducted to study the effect of inflow turbulence on the suction-side separation bubble in low-pressure turbine (LPT) cascades. The T106 series of airfoil cascades in both lightly (T106A) and highly loaded (T106C) configurations, at exit isentropic Reynolds numbers of 60,000 and 80,000, respectively, are considered. The numerical simulations are performed using 8th-order accurate spatial and 4th-order accurate temporal discretization. The changes in separation bubble topology due to elevated inflow turbulence are captured by the present method and the physical mechanisms leading to the changes are explained. The present results are in good agreement with prior numerical simulations, but some expected discrepancies with the experimental data for the T106C case are noted and discussed.
KAGLVis - On-line 3D Visualisation of Earth-observing-satellite Data
NASA Astrophysics Data System (ADS)
Szuba, Marek; Ameri, Parinaz; Grabowski, Udo; Maatouki, Ahmad; Meyer, Jörg
2015-04-01
One of the goals of the Large-Scale Data Management and Analysis project is to provide a high-performance framework facilitating management of data acquired by Earth-observing satellites such as Envisat. On the client-facing facet of this framework, we strive to provide a visualisation and basic analysis tool that can be used by scientists with minimal to no knowledge of the underlying infrastructure. Our tool, KAGLVis, is a JavaScript client-server Web application which leverages modern Web technologies to provide three-dimensional visualisation of satellite observables on a wide range of client systems. It takes advantage of the WebGL API to employ locally available GPU power for 3D rendering; this approach has been demonstrated to perform well even on relatively weak hardware such as the integrated graphics chipsets found in modern laptop computers, and with some user-interface tuning could even be usable on embedded devices such as smartphones or tablets. Data is fetched from the database back-end using a ReST API and cached locally, both in memory and using HTML5 Web Storage, to minimise network use. Computations, such as the calculation of cloud altitude from cloud-index measurements, can be performed on either the client or the server side, depending on configuration. Keywords: satellite data, Envisat, visualisation, 3D graphics, Web application, WebGL, MEAN stack.
NASA Astrophysics Data System (ADS)
Zeitler, T.; Kirchner, T. B.; Hammond, G. E.; Park, H.
2014-12-01
The Waste Isolation Pilot Plant (WIPP) has been developed by the U.S. Department of Energy (DOE) for the geologic (deep underground) disposal of transuranic (TRU) waste. Containment of TRU waste at the WIPP is regulated by the U.S. Environmental Protection Agency (EPA). The DOE demonstrates compliance with the containment requirements by means of performance assessment (PA) calculations. WIPP PA calculations estimate the probability and consequence of potential radionuclide releases from the repository to the accessible environment for a regulatory period of 10,000 years after facility closure. The long-term performance of the repository is assessed using a suite of sophisticated computational codes. In a broad modernization effort, the DOE has overseen the transfer of these codes to modern hardware and software platforms. Additionally, there is a current effort to establish new performance assessment capabilities through the further development of the PFLOTRAN software, a state-of-the-art massively parallel subsurface flow and reactive transport code. Improvements to the current computational environment will result in greater detail in the final models due to the parallelization afforded by the modern code. Parallelization will allow for relatively faster calculations, as well as a move from a two-dimensional calculation grid to a three-dimensional grid. The result of the modernization effort will be a state-of-the-art subsurface flow and transport capability that will serve WIPP PA into the future. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. This research is funded by WIPP programs administered by the Office of Environmental Management (EM) of the U.S Department of Energy.
High Performance Radiation Transport Simulations on TITAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Christopher G; Davidson, Gregory G; Evans, Thomas M
2012-01-01
In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for inverting the transport operator, a new multilevel energy decomposition method scaling to hundreds of thousands of processing cores, and a modern, novel code architecture that supports straightforward integration of new features. In this paper we discuss the performance of Denovo on the 10-20 petaflop ORNL GPU-based system, Titan. We describe algorithms and techniques used to exploit the capabilities of Titan's heterogeneous compute node architecture and the challenges of obtaining good parallel performance for this sparse hyperbolic PDE solver containing inherently sequential computations. Numerical results demonstrating Denovo performance on early Titan hardware are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Y M; Bush, K; Han, B
Purpose: Accurate and fast dose calculation is a prerequisite of precision radiation therapy in modern photon and particle therapy. While Monte Carlo (MC) dose calculation provides high dosimetric accuracy, the drastically increased computational time hinders its routine use. Deterministic dose calculation methods are fast, but problematic in the presence of tissue density inhomogeneity. We leverage the useful features of deterministic methods and MC to develop a hybrid dose calculation platform with autonomous utilization of MC and deterministic calculation depending on the local geometry, for optimal accuracy and speed. Methods: Our platform utilizes a Geant4 based “localized Monte Carlo” (LMC) method that isolates MC dose calculations only to volumes that have potential for dosimetric inaccuracy. In our approach, additional structures are created encompassing heterogeneous volumes. Deterministic methods calculate dose and energy fluence up to the volume surfaces, where the energy fluence distribution is sampled into discrete histories and transported using MC. Histories exiting the volume are converted back into energy fluence and transported deterministically. By matching boundary conditions at both interfaces, the deterministic dose calculation accounts for dose perturbations “downstream” of localized heterogeneities. Hybrid dose calculation was performed for water and anthropomorphic phantoms. Results: We achieved <1% agreement between deterministic and MC calculations in the water benchmark for photon and proton beams, and dose differences of 2%-15% could be observed in heterogeneous phantoms. The saving in computational time (a factor of ~4-7 compared to a full Monte Carlo dose calculation) was found to be approximately proportional to the volume of the heterogeneous region. Conclusion: Our hybrid dose calculation approach takes advantage of the computational efficiency of deterministic methods and the accuracy of MC, providing a practical tool for high performance dose calculation in modern RT. The approach is generalizable to all modalities where heterogeneities play a large role, notably particle therapy.
A high-speed DAQ framework for future high-level trigger and event building clusters
NASA Astrophysics Data System (ADS)
Caselle, M.; Ardila Perez, L. E.; Balzer, M.; Dritschler, T.; Kopmann, A.; Mohr, H.; Rota, L.; Vogelgesang, M.; Weber, M.
2017-03-01
Modern data acquisition and trigger systems require a throughput of several GB/s and latencies of the order of microseconds. To satisfy such requirements, a heterogeneous readout system based on FPGA readout cards and GPU-based computing nodes coupled by InfiniBand has been developed. The incoming data from the back-end electronics is delivered directly into the internal memory of GPUs through a dedicated peer-to-peer PCIe communication. High performance DMA engines have been developed for direct communication between FPGAs and GPUs using "DirectGMA (AMD)" and "GPUDirect (NVIDIA)" technologies. The proposed infrastructure is a candidate for future generations of event building clusters, high-level trigger filter farms and low-level trigger system. In this paper the heterogeneous FPGA-GPU architecture will be presented and its performance be discussed.
NASA Technical Reports Server (NTRS)
Chen, Shu-cheng, S.
2009-01-01
For the preliminary design and off-design performance analysis of axial flow turbines, a pair of intermediate level-of-fidelity computer codes, TD2-2 (design; reference 1) and AXOD (off-design; reference 2), are being evaluated for use in turbine design and performance prediction for modern high-performance aircraft engines. TD2-2 employs a streamline curvature method for design, while AXOD approaches the flow analysis with an equal radius-height domain decomposition strategy. Both methods resolve only the flows in the annulus region while modeling the impact introduced by the blade rows. The mathematical formulations and derivations involved in both methods are documented in references 3 and 4 (for TD2-2) and in reference 5 (for AXOD). The focus of this paper is to discuss the fundamental issues of applicability and compatibility of the two codes as a pair of companion pieces for performing preliminary design and off-design analysis of modern aircraft engine turbines. Two validation cases for design and off-design prediction using TD2-2 and AXOD, conducted on two existing high efficiency turbines developed and tested in the NASA/GE Energy Efficient Engine (GE-E3) Program - the High Pressure Turbine (HPT; two stages, air cooled) and the Low Pressure Turbine (LPT; five stages, uncooled) - are provided in support of the analysis and discussion presented in this paper.
Multicore Architecture-aware Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinivasa, Avinash
Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, like having strategies in place to override some of the default operating system policies, the main objective being to improve the computational performance of the application. The current work proposes two such capabilities with respect to multi-threaded scientific applications, in particular a large-scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilities, when included, were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times.
NASA Astrophysics Data System (ADS)
Pei, Zongrui; Eisenbach, Markus
2017-06-01
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the methods for optimizing dislocation core structures, we choose the Particle Swarm Optimization algorithm, which simulates the social behavior of organisms. By employing more particles (a bigger swarm) and more iterative steps (allowing them to explore for a longer time), local minima can be effectively avoided, at the price of more computational cost. The advantage of this algorithm is that it is readily parallelized on modern high-performance computing architectures. We demonstrate that the performance of our parallelized algorithm scales linearly with the number of employed cores.
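A minimal sketch of the global-best PSO variant follows; in the paper each particle would encode a dislocation core configuration and the objective would be the core energy, replaced here by a cheap test function. Note that the per-particle fitness loop is embarrassingly parallel, which is what makes the method scale:

```python
# Minimal global-best PSO; the energy function is a cheap stand-in for the
# dislocation core energy optimized in the paper.
import numpy as np

rng = np.random.default_rng(0)

def energy(x):                         # stand-in for the core energy
    return np.sum(x**2)

def pso(dim=10, n_particles=40, iters=300, w=0.7, c1=1.5, c2=1.5):
    x = rng.uniform(-5, 5, (n_particles, dim))       # positions
    v = np.zeros_like(x)                             # velocities
    pbest, pbest_f = x.copy(), np.array([energy(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([energy(p) for p in x])         # parallelizable loop
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

print(pso()[1])
```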
YASS: A System Simulator for Operating System and Computer Architecture Teaching and Learning
ERIC Educational Resources Information Center
Mustafa, Besim
2013-01-01
A highly interactive, integrated and multi-level simulator has been developed specifically to support both the teachers and the learners of modern computer technologies at undergraduate level. The simulator provides a highly visual and user configurable environment with many pedagogical features aimed at facilitating deep understanding of concepts…
Application of Game Theory to Improve the Defense of the Smart Grid
2012-03-01
Computer Systems and Networks ... 2.4.2 Trust Models ... systems. In this environment, developers assumed deterministic communications mediums rather than the “best effort” models provided in most modern ... models or computational models to validate the SPSs design. Finally, the study reveals concerns about the performance of load rejection schemes
cudaMap: a GPU accelerated program for gene expression connectivity mapping.
McArt, Darragh G; Bankhead, Peter; Dunne, Philip D; Salto-Tellez, Manuel; Hamilton, Peter; Zhang, Shu-Dong
2013-10-11
Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for a connectivity mapping task with a single gene signature to take more than 2 hours to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating high-throughput candidate therapeutics discovery. We demonstrate dramatic speed differentials between GPU-assisted and CPU executions as the computational load increases for high-accuracy evaluation of statistical significance. Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution to the discovery of candidate therapeutics by enabling speedy execution of heavy-duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
On the Impact of Execution Models: A Case Study in Computational Chemistry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Halappanavar, Mahantesh; Krishnamoorthy, Sriram
2015-05-25
Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore the impact of execution models on the utilization of an HPC system. We demonstrate a 50 percent improvement in performance by using work stealing relative to a more traditional static scheduling approach. We also use a novel semi-matching technique for load balancing that has comparable performance to a traditional hypergraph-based partitioning implementation, which is computationally expensive. Using this study, we found that execution model design choices and assumptions can limit critical optimizations such as global, dynamic load balancing and finding the correct balance between available work units and different system and runtime overheads. With the emergence of multi- and many-core architectures and the consequent growth in the complexity of HPC platforms, we believe that these lessons will be beneficial to researchers tuning diverse applications on modern HPC platforms, especially on emerging dynamic platforms with energy-induced performance variability.
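The scheduling contrast can be sketched in a few lines of Python. The shared task queue below is a simplification: true work stealing gives each worker its own deque and lets idle workers steal from the tail of others, but the load-balancing effect on irregular work units is the same in spirit.

    import queue
    import random
    import threading
    import time

    def run_dynamic(costs, n_workers=4):
        tasks = queue.Queue()
        for c in costs:
            tasks.put(c)

        def worker():
            while True:
                try:
                    c = tasks.get_nowait()   # idle workers pull the next unit
                except queue.Empty:
                    return
                time.sleep(c)                # stand-in for an irregular work unit

        threads = [threading.Thread(target=worker) for _ in range(n_workers)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    costs = [random.random() * 0.01 for _ in range(200)]
    print("dynamic schedule:", run_dynamic(costs), "s")

With a static split, one unlucky worker can receive most of the expensive units and dominate the runtime; the dynamic queue keeps all workers busy until the pool is drained.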
QCDLoop: A comprehensive framework for one-loop scalar integrals
NASA Astrophysics Data System (ADS)
Carrazza, Stefano; Ellis, R. Keith; Zanderighi, Giulia
2016-12-01
We present a new release of the QCDLoop library based on a modern object-oriented framework. We discuss the new features available, such as the extension to complex masses, the ability to perform computations in double and quadruple precision simultaneously, and useful caching mechanisms that improve the computational speed. We benchmark the performance of the new library and provide practical examples of phenomenological implementations by interfacing this new library to Monte Carlo programs.
VLSI Implementation of Fault Tolerance Multiplier based on Reversible Logic Gate
NASA Astrophysics Data System (ADS)
Ahmad, Nabihah; Hakimi Mokhtar, Ahmad; Othman, Nurmiza binti; Fhong Soon, Chin; Rahman, Ab Al Hadi Ab
2017-08-01
The multiplier is one of the essential components in the digital world, used in digital signal processing, microprocessors, quantum computing, and arithmetic units generally. Due to the complexity of the multiplier, the tendency for errors is very high. This paper presents the design of a 2×2-bit fault-tolerant multiplier based on reversible logic gates with low power consumption and high performance. The design has been implemented using 90 nm Complementary Metal Oxide Semiconductor (CMOS) technology in Synopsys Electronic Design Automation (EDA) tools. The multiplier architecture is implemented using reversible logic gates: a combination of three gates, the Double Feynman gate (F2G), the New Fault Tolerant (NFT) gate, and the Islam Gate (IG), with an area of 160 μm × 420.3 μm (approximately 0.067 mm²). The design achieves a low power consumption of 122.85 μW and a propagation delay of 16.99 ns. The proposed fault-tolerant multiplier thus achieves low power consumption and high performance, with fault-tolerance capabilities that make it suitable for modern computing applications.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
A full 3D-navigation system in a suitcase.
Freysinger, W; Truppe, M J; Gunkel, A R; Thumfart, W F
2001-01-01
To reduce the impact of contemporary 3D-navigation systems on the environment of typical otorhinolaryngologic operating rooms, we demonstrate that a transfer of navigation software to modern high-power notebook computers is feasible and results in a practicable way to provide positional information to a surgeon intraoperatively. The ARTMA Virtual Patient System has been implemented on a Macintosh PowerBook G3 and, in connection with the Polhemus FASTRAK digitizer, provides intraoperative positional information during endoscopic endonasal surgery. Satisfactory intraoperative navigation has been realized in two- and three-dimensional medical image data sets (i.e., X-ray, ultrasound images, CT, and MR) and live video. This proof-of-concept study demonstrates that acceptable ergonomics and excellent performance of the system can be achieved with contemporary high-end notebook computers. Copyright 2001 Wiley-Liss, Inc.
Rosenthal, L E
1986-10-01
Software is the component in a computer system that permits the hardware to perform the various functions that a computer system is capable of doing. The history of software and its development can be traced to the early nineteenth century. All computer systems are designed to utilize the "stored program concept" as first developed by Charles Babbage in the 1850s. The concept was lost until the mid-1940s, when modern computers made their appearance. Today, because of the complex and myriad tasks that a computer system can perform, there has been a differentiation of types of software. There is software designed to perform specific business applications. There is software that controls the overall operation of a computer system. And there is software that is designed to carry out specialized tasks. Regardless of type, software is the most critical component of any computer system. Without it, all one has is a collection of circuits, transistors, and silicon chips.
The efficiency of geophysical adjoint codes generated by automatic differentiation tools
NASA Astrophysics Data System (ADS)
Vlasenko, A. V.; Köhl, A.; Stammer, D.
2016-02-01
The accuracy of numerical models that describe complex physical or chemical processes depends on the choice of model parameters. Estimating an optimal set of parameters by optimization algorithms requires knowledge of the sensitivity of the process of interest to model parameters. Typically the sensitivity computation involves differentiation of the model, which can be performed by applying algorithmic differentiation (AD) tools to the underlying numerical code. However, existing AD tools differ substantially in design, legibility and computational efficiency. In this study we show that, for geophysical data assimilation problems of varying complexity, the performance of adjoint codes generated by the existing AD tools (i) Open_AD, (ii) Tapenade, (iii) NAGWare and (iv) Transformation of Algorithms in Fortran (TAF) can be vastly different. Based on simple test problems, we evaluate the efficiency of each AD tool with respect to computational speed, accuracy of the adjoint, efficiency of memory usage, and the capability of each AD tool to handle modern FORTRAN 90-95 elements such as structures and pointers, which respectively combine groups of variables or provide aliases to memory addresses. We show that, while operator overloading tools are the only ones suitable for modern codes written in object-oriented programming languages, their computational efficiency lags behind source transformation by orders of magnitude, rendering the application of these modern tools to practical assimilation problems prohibitive. In contrast, the application of source transformation tools appears to be the most efficient choice, allowing one to handle even large geophysical data assimilation problems. However, they can only be applied to numerical models written in earlier generations of programming languages. Our study indicates that applying existing AD tools to realistic geophysical problems faces limitations that urgently need to be solved to allow the continued use of AD tools for solving geophysical problems on modern computer architectures.
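The operator-overloading approach that the study benchmarks can be illustrated with forward-mode dual numbers. The Python sketch below is language-agnostic pseudocode for the idea, not any of the four tools; real AD tools overload the full operator set of Fortran.

    class Dual:
        # Carries a value together with its derivative through every operation.
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1

    d = f(Dual(2.0, 1.0))     # seed dx/dx = 1
    print(d.val, d.dot)       # 17.0 14.0, i.e. f(2) and f'(2)

Every arithmetic operation carries this extra bookkeeping at runtime, which gives one intuition for why operator-overloading tools lag source-transformation tools (which generate dedicated derivative code ahead of time) by orders of magnitude.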
Fast neuromimetic object recognition using FPGA outperforms GPU implementations.
Orchard, Garrick; Martin, Jacob G; Vogelstein, R Jacob; Etienne-Cummings, Ralph
2013-08-01
Recognition of objects in still images has traditionally been regarded as a difficult computational problem. Although modern automated methods for visual object recognition have achieved steadily increasing recognition accuracy, even the most advanced computational vision approaches are unable to obtain performance equal to that of humans. This has led to the creation of many biologically inspired models of visual object recognition, among them the hierarchical model and X (HMAX) model. HMAX is traditionally known to achieve high accuracy in visual object recognition tasks at the expense of significant computational complexity. Increasing complexity, in turn, increases computation time, reducing the number of images that can be processed per unit time. In this paper we describe how the computationally intensive and biologically inspired HMAX model for visual object recognition can be modified for implementation on a commercial field-programmable gate array (FPGA), specifically the Xilinx Virtex 6 ML605 evaluation board with XC6VLX240T FPGA. We show that with minor modifications to the traditional HMAX model we can perform recognition on images of size 128 × 128 pixels at a rate of 190 images per second with a less than 1% loss in recognition accuracy in both binary and multiclass visual object recognition tasks.
Wronkiewicz, Mark; Larson, Eric; Lee, Adrian Kc
2016-10-01
Brain-computer interface (BCI) technology allows users to generate actions based solely on their brain signals. However, current non-invasive BCIs generally classify brain activity recorded from surface electroencephalography (EEG) electrodes, which can hinder the application of findings from modern neuroscience research. In this study, we use source imaging (a neuroimaging technique that projects EEG signals onto the surface of the brain) in a BCI classification framework. This allowed us to incorporate prior research from functional neuroimaging to target activity from a cortical region involved in auditory attention. Classifiers trained to detect attention switches performed better with source imaging projections than with EEG sensor signals. Within source imaging, including subject-specific anatomical MRI information (instead of using a generic head model) further improved classification performance. This source-based strategy also reduced accuracy variability across three dimensionality reduction techniques, a major design choice in most BCIs. Our work shows that source imaging provides clear quantitative and qualitative advantages to BCIs and highlights the value of incorporating modern neuroscience knowledge and methods into BCI systems.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine the highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits the adjoint-sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT response (single-site and/or inter-site), and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set-up. To parameterize the inverse domain, a mask approach is implemented, meaning that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
Petascale computation of multi-physics seismic simulations
NASA Astrophysics Data System (ADS)
Gabriel, Alice-Agnes; Madden, Elizabeth H.; Ulrich, Thomas; Wollherr, Stephanie; Duru, Kenneth C.
2017-04-01
Capturing the observed complexity of earthquake sources in concurrence with seismic wave propagation simulations is an inherently multi-scale, multi-physics problem. Here we present simulations of earthquake scenarios resolving high-detail dynamic rupture evolution and high frequency ground motion. The simulations combine a multitude of representations of model complexity: non-linear fault friction, thermal and fluid effects, heterogeneous fault stress and fault strength initial conditions, fault curvature and roughness, and on- and off-fault non-elastic failure to capture dynamic rupture behavior at the source; and seismic wave attenuation, 3D subsurface structure and bathymetry impacting seismic wave propagation. Performing such scenarios at the necessary spatio-temporal resolution requires highly optimized and massively parallel simulation tools which can efficiently exploit HPC facilities. Our up to multi-PetaFLOP simulations are performed with SeisSol (www.seissol.org), an open-source software package based on an ADER-Discontinuous Galerkin (DG) scheme solving the seismic wave equations in velocity-stress formulation in elastic, viscoelastic, and viscoplastic media with high-order accuracy in time and space. Our flux-based implementation of frictional failure remains free of spurious oscillations. Tetrahedral unstructured meshes allow for complicated model geometry. SeisSol has been optimized on all software levels, including: assembler-level DG kernels which obtain 50% peak performance on some of the largest supercomputers worldwide; an overlapping MPI-OpenMP parallelization shadowing the multiphysics computations; usage of local time stepping; parallel input and output schemes; and direct interfaces to community standard data formats. All these factors help minimize the time-to-solution. The results presented highlight the fact that modern numerical methods and hardware-aware optimization for modern supercomputers are essential to further our understanding of earthquake source physics, and complement both physics-based ground motion research and empirical approaches in seismic hazard analysis. Lastly, we conclude with an outlook on future exascale ADER-DG solvers for seismological applications.
Cosmological neutrino simulations at extreme scale
Emberson, J. D.; Yu, Hao-Ran; Inman, Derek; ...
2017-08-01
Constraining neutrino mass remains an elusive challenge in modern physics. Precision measurements are expected from several upcoming cosmological probes of large-scale structure. Achieving this goal relies on an equal level of precision from theoretical predictions of neutrino clustering. Numerical simulations of the non-linear evolution of cold dark matter and neutrinos play a pivotal role in this process. We incorporate neutrinos into the cosmological N-body code CUBEP3M and discuss the challenges associated with pushing to the extreme scales demanded by the neutrino problem. We highlight code optimizations made to exploit modern high performance computing architectures and present a novel method of data compression that reduces the phase-space particle footprint from 24 bytes in single precision to roughly 9 bytes. We scale the neutrino problem to the Tianhe-2 supercomputer and provide details of our production run, named TianNu, which uses 86% of the machine (13,824 compute nodes). With a total of 2.97 trillion particles, TianNu is currently the world's largest cosmological N-body simulation and improves upon previous neutrino simulations by two orders of magnitude in scale. We finish with a discussion of the unanticipated computational challenges that were encountered during the TianNu runtime.
Workflows for Full Waveform Inversions
NASA Astrophysics Data System (ADS)
Boehm, Christian; Krischer, Lion; Afanasiev, Michael; van Driel, Martin; May, Dave A.; Rietmann, Max; Fichtner, Andreas
2017-04-01
Despite many theoretical advances and the increasing availability of high-performance computing clusters, full seismic waveform inversions still face considerable challenges regarding data and workflow management. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these often manpower-bounded issues that need to be overcome to facilitate further progress. Modern inversions involve huge amounts of data and require a tight integration between numerical PDE solvers, data acquisition and processing systems, nonlinear optimization libraries, and job orchestration frameworks. To this end we created a set of libraries and applications revolving around Salvus (http://salvus.io), a novel software package designed to solve large-scale full waveform inverse problems. This presentation focuses on solving passive source seismic full waveform inversions from local to global scales with Salvus. We discuss (i) design choices for the aforementioned components required for full waveform modeling and inversion, (ii) their implementation in the Salvus framework, and (iii) how it is all tied together by a usable workflow system. We combine state-of-the-art algorithms ranging from high-order finite-element solutions of the wave equation to quasi-Newton optimization algorithms using trust-region methods that can handle inexact derivatives. All is steered by an automated interactive graph-based workflow framework capable of orchestrating all necessary pieces. This naturally facilitates the creation of new Earth models and hopefully sparks new scientific insights. Additionally, and even more importantly, it enhances reproducibility and reliability of the final results.
Accelerating gravitational microlensing simulations using the Xeon Phi coprocessor
NASA Astrophysics Data System (ADS)
Chen, B.; Kantowski, R.; Dai, X.; Baron, E.; Van der Mark, P.
2017-04-01
Recently Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a microlensing code with that of NVIDIA's GPUs. For the selected set of parameters evaluated in our experiment, we find that the speedup by Intel's Knights Corner coprocessor is comparable to that by NVIDIA's Fermi family of GPUs with compute capability 2.0, but less significant than GPUs with higher compute capabilities such as the Kepler. However, the very recently released second generation Xeon Phi, Knights Landing, is about 5.8 times faster than the Knights Corner, and about 2.9 times faster than the Kepler GPU used in our simulations. We conclude that the Xeon Phi is a very promising alternative to GPUs for modern high performance microlensing simulations.
NASA Astrophysics Data System (ADS)
Hadade, Ioan; di Mare, Luca
2016-08-01
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel Sandy Bridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices, including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques, together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assess their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
Method and computer program product for maintenance and modernization backlogging
Mattimore, Bernard G; Reynolds, Paul E; Farrell, Jill M
2013-02-19
According to one embodiment, a computer program product for determining future facility conditions includes a computer readable medium having computer readable program code stored therein. The computer readable program code includes computer readable program code for calculating a time period specific maintenance cost, for calculating a time period specific modernization factor, and for calculating a time period specific backlog factor. Future facility conditions equal the time period specific maintenance cost plus the time period specific modernization factor plus the time period specific backlog factor. In another embodiment, a computer-implemented method for calculating future facility conditions includes calculating a time period specific maintenance cost, calculating a time period specific modernization factor, and calculating a time period specific backlog factor. Future facility conditions equal the time period specific maintenance cost plus the time period specific modernization factor plus the time period specific backlog factor. Other embodiments are also presented.
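The claimed relation reduces to a three-term sum. A one-line restatement in Python (the argument names are ours; the patent abstract does not specify how each factor is itself computed):

    def future_facility_conditions(maintenance_cost, modernization_factor, backlog_factor):
        # Each term is specific to the same time period.
        return maintenance_cost + modernization_factor + backlog_factor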
MPACT Standard Input User's Manual, Version 2.2.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, Benjamin S.; Downar, Thomas; Fitzgerald, Andrew
The MPACT (Michigan PArallel Characteristics based Transport) code is designed to perform high-fidelity light water reactor (LWR) analysis using whole-core pin-resolved neutron transport calculations on modern parallel-computing hardware. The code consists of several libraries which provide the functionality necessary to solve steady-state eigenvalue problems. Several transport capabilities are available within MPACT, including both 2-D and 3-D Method of Characteristics (MOC). A three-dimensional whole-core solution based on the 2D-1D solution method provides the capability for full-core depletion calculations.
Virtual file system on NoSQL for processing high volumes of HL7 messages.
Kimura, Eizen; Ishihara, Ken
2015-01-01
The Standardized Structured Medical Information Exchange (SS-MIX) is intended to be the standard repository for HL7 messages that depend on a local file system. However, its scalability is limited. We implemented a virtual file system using NoSQL to incorporate modern computing technology into SS-MIX and allow the system to integrate local patient IDs from different healthcare systems into a universal system. We discuss its implementation using the database MongoDB and describe its performance in a case study.
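A minimal sketch of the idea in Python with pymongo and GridFS. The field names and layout are our assumptions for illustration, not the actual SS-MIX or paper schema:

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["ssmix_virtual_fs"]
    fs = gridfs.GridFS(db)   # NoSQL-backed stand-in for the local file system

    def store_message(universal_pid, local_pid, facility, raw_hl7: bytes):
        # One GridFS file per HL7 message, tagged with a universal patient ID
        # that ties together the local IDs of different healthcare systems.
        return fs.put(raw_hl7, universal_pid=universal_pid,
                      local_pid=local_pid, facility=facility)

    def messages_for_patient(universal_pid):
        return [f.read() for f in fs.find({"universal_pid": universal_pid})]

Because the store is a database rather than a directory tree, it can be sharded and replicated, which is where the scalability beyond a local file system comes from.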
Pienaar, Rudolph; Rannou, Nicolas; Bernal, Jorge; Hahn, Daniel; Grant, P Ellen
2015-01-01
The utility of web browsers for general purpose computing, long anticipated, is only now coming to fruition. In this paper we present a web-based medical image data and information management software platform called ChRIS ([Boston] Children's Research Integration System). ChRIS' deep functionality allows for easy retrieval of medical image data from resources typically found in hospitals, organizes and presents information in a modern feed-like interface, provides access to a growing library of plugins that process these data (typically on a connected High Performance Compute Cluster), allows for easy data sharing between users and instances of ChRIS, and provides powerful 3D visualization and real-time collaboration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Tony; Sutherland, James C.
2016-05-04
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment, an effort that spans an interdisciplinary team of engineers and computer scientists.
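The runtime-algorithm-generation idea can be sketched as a topological executor over a task graph. This Python stand-in is ours; in the actual approach the graph is derived from the DSL and its nodes are dispatched in parallel by the framework:

    from collections import defaultdict, deque

    def run_dag(tasks, deps):
        # tasks: name -> callable; deps: name -> list of prerequisite names.
        indegree = {t: len(deps.get(t, [])) for t in tasks}
        dependents = defaultdict(list)
        for t, prereqs in deps.items():
            for p in prereqs:
                dependents[p].append(t)
        ready = deque(t for t, d in indegree.items() if d == 0)
        while ready:
            t = ready.popleft()
            tasks[t]()   # a parallel framework would dispatch ready nodes concurrently
            for u in dependents[t]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    ready.append(u)

    run_dag({"density": lambda: print("compute density"),
             "momentum": lambda: print("compute momentum"),
             "flux": lambda: print("compute flux")},
            {"flux": ["density", "momentum"]})

Because dependencies are explicit, the schedule (and hence the algorithm) can be regenerated at runtime when the set of active physics models changes.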
Solving lattice QCD systems of equations using mixed precision solvers on GPUs
NASA Astrophysics Data System (ADS)
Clark, M. A.; Babich, R.; Barros, K.; Brower, R. C.; Rebbi, C.
2010-09-01
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40, 135 and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
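The precision split can be illustrated with classical iterative refinement: do the bulk of the arithmetic in low precision and correct with a high-precision residual. The NumPy sketch below uses a direct single-precision inner solve for brevity, whereas the paper uses Krylov solvers (CG/BiCGstab) with reliable updates:

    import numpy as np

    def mixed_precision_solve(A, b, tol=1e-12, max_refinements=50):
        A32 = A.astype(np.float32)           # bulk arithmetic in low precision
        x = np.zeros_like(b)
        for _ in range(max_refinements):
            r = b - A @ x                    # residual in full double precision
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break
            e = np.linalg.solve(A32, r.astype(np.float32))
            x = x + e.astype(np.float64)     # high-precision correction
        return x

For reasonably conditioned systems this recovers full double-precision accuracy while the expensive inner work runs at single (or half) precision throughput, the same trade the GPU solvers exploit.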
NASA Astrophysics Data System (ADS)
Heilmann, B. Z.; Vallenilla Ferrara, A. M.
2009-04-01
The constant growth of contaminated sites, the unsustainable use of natural resources, and, last but not least, the hydrological risk related to extreme meteorological events and increased climate variability are major environmental issues of today. Finding solutions for these complex problems requires an integrated cross-disciplinary approach, providing a unified basis for environmental science and engineering. In computer science, grid computing is emerging worldwide as a formidable tool allowing distributed computation and data management with administratively-distant resources. Utilizing these modern High Performance Computing (HPC) technologies, the GRIDA3 project bundles several applications from different fields of geoscience aiming to support decision making for reasonable and responsible land use and resource management. In this abstract we present a geophysical application called EIAGRID that uses grid computing facilities to perform real-time subsurface imaging by on-the-fly processing of seismic field data and fast optimization of the processing workflow. Even though seismic reflection profiling has a broad application range, spanning from shallow targets at a few meters depth to targets at a depth of several kilometers, it is primarily used by the hydrocarbon industry and hardly ever for environmental purposes. The complexity of data acquisition and processing poses severe problems for environmental and geotechnical engineering: professional seismic processing software is expensive to buy and demands extensive experience from the user. In-field processing equipment needed for real-time data Quality Control (QC) and immediate optimization of the acquisition parameters is often not available for this kind of study. As a result, the data quality will be suboptimal. In the worst case, a crucial parameter such as receiver spacing, maximum offset, or recording time turns out later to be inappropriate, and the complete acquisition campaign has to be repeated. The EIAGRID portal provides an innovative solution to this problem, combining state-of-the-art data processing methods and modern remote grid computing technology. In-field processing equipment is replaced by remote access to high performance grid computing facilities. The latter can be ubiquitously controlled by a user-friendly web-browser interface accessed from the field by any mobile computer using wireless data transmission technology such as UMTS (Universal Mobile Telecommunications System) or HSUPA/HSDPA (High-Speed Uplink/Downlink Packet Access). The complexity of data manipulation and processing, and thus also the time-demanding user interaction, is minimized by a data-driven and highly automated velocity analysis and imaging approach based on the Common-Reflection-Surface (CRS) stack. Furthermore, the huge computing power provided by the grid deployment allows parallel testing of alternative processing sequences and parameter settings, a feature which considerably reduces turn-around times. A shared data storage using georeferencing tools and data grid technology is under development. It will allow publishing already accomplished projects, making results, processing workflows, and parameter settings available in a transparent and reproducible way. Creating a unified database shared by all users will facilitate complex studies and enable the use of data-crossing techniques to incorporate results from other environmental applications hosted on the GRIDA3 portal.
Research on Influence of Cloud Environment on Traditional Network Security
NASA Astrophysics Data System (ADS)
Ming, Xiaobo; Guo, Jinhua
2018-02-01
Cloud computing is a symbol of the progress of modern information networks. It provides great convenience to Internet users, but it also brings considerable risks. One of the main reasons Internet users choose cloud computing is its strong network security performance, which is also the cornerstone of cloud computing applications. This paper briefly explores the impact of the cloud environment on traditional network security and puts forward corresponding solutions.
High resolution ultrasonic spectroscopy system for nondestructive evaluation
NASA Technical Reports Server (NTRS)
Chen, C. H.
1991-01-01
With increased demand for high resolution ultrasonic evaluation, computer based systems or work stations become essential. The ultrasonic spectroscopy method of nondestructive evaluation (NDE) was used to develop a high resolution ultrasonic inspection system supported by modern signal processing, pattern recognition, and neural network technologies. The basic system which was completed consists of a 386/20 MHz PC (IBM AT compatible), a pulser/receiver, a digital oscilloscope with serial and parallel communications to the computer, an immersion tank with motor control of X-Y axis movement, and the supporting software package, IUNDE, for interactive ultrasonic evaluation. Although the hardware components are commercially available, the software development is entirely original. By integrating signal processing, pattern recognition, maximum entropy spectral analysis, and artificial neural network functions into the system, many NDE tasks can be performed. The high resolution graphics capability provides visualization of complex NDE problems. The phase 3 efforts involve intensive marketing of the software package and collaborative work with industrial sectors.
Qu, Hai-bin; Cheng, Yi-yu; Wang, Yue-sheng
2003-10-01
Based on a review of engineering problems in developing a modern production industry for Traditional Chinese Medicine (TCM), the differences between the TCM production industries in China and abroad are pointed out. Accelerating the application and extension of high technology and computer integrated manufacturing systems (CIMS) is suggested as a way to promote the technological advancement of the TCM industry.
Real-time dose computation: GPU-accelerated source modeling and superposition/convolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacques, Robert; Wong, John; Taylor, Russell
Purpose: To accelerate dose calculation to interactive rates using highly parallel graphics processing units (GPUs). Methods: The authors have extended their prior work in GPU-accelerated superposition/convolution with a modern dual-source model and have enhanced performance. The primary source algorithm supports both focused leaf ends and asymmetric rounded leaf ends. The extra-focal algorithm uses a discretized, isotropic area source and models multileaf collimator leaf height effects. The spectral and attenuation effects of static beam modifiers were integrated into each source's spectral function. The authors introduce the concepts of arc superposition and delta superposition. Arc superposition utilizes separate angular sampling for the total energy released per unit mass (TERMA) and superposition computations to increase accuracy and performance. Delta superposition allows single beamlet changes to be computed efficiently. The authors extended their concept of multi-resolution superposition to include kernel tilting. Multi-resolution superposition approximates solid angle ray-tracing, improving performance and scalability with a minor loss in accuracy. Superposition/convolution was implemented using the inverse cumulative-cumulative kernel and exact radiological path ray-tracing. The accuracy analyses were performed using multiple kernel ray samplings, both with and without kernel tilting and multi-resolution superposition. Results: Source model performance was <9 ms (data dependent) for a high resolution (400²) field using an NVIDIA (Santa Clara, CA) GeForce GTX 280. Computation of the physically correct multispectral TERMA attenuation was improved by a material centric approach, which increased performance by over 80%. Superposition performance was improved by ~24% to 0.058 and 0.94 s for 64³ and 128³ water phantoms; a speed-up of 101-144x over the highly optimized Pinnacle³ (Philips, Madison, WI) implementation. Pinnacle³ times were 8.3 and 94 s, respectively, on an AMD (Sunnyvale, CA) Opteron 254 (two cores, 2.8 GHz). Conclusions: The authors have completed a comprehensive, GPU-accelerated dose engine in order to provide a substantial performance gain over CPU based implementations. Real-time dose computation is feasible with the accuracy levels of the superposition/convolution algorithm.
ERIC Educational Resources Information Center
Nagasinghe, Iranga
2010-01-01
This thesis investigates and develops a few acceleration techniques for the search engine algorithms used in PageRank and HITS computations. PageRank and HITS methods are two highly successful applications of modern Linear Algebra in computer science and engineering. They constitute the essential technologies accounting for the immense growth and…
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
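The assignment step dominates k-means and is trivially data-parallel across cores. A Python sketch using a process pool (the paper's algorithms are Java implementations built on transactional-memory design principles; this shows only the shape of the parallelization):

    import numpy as np
    from multiprocessing import Pool

    def nearest(args):
        chunk, centers = args
        # Squared distance of every point in the chunk to every center.
        d = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return d.argmin(1)

    def kmeans_multicore(X, k, n_iter=20, n_cores=4, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        chunks = np.array_split(X, n_cores)
        with Pool(n_cores) as pool:          # call under `if __name__ == "__main__":`
            for _ in range(n_iter):
                labels = np.concatenate(
                    pool.map(nearest, [(c, centers) for c in chunks]))
                centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
        return labels, centers

Repeated runs with perturbed parameters, as used for cluster stability analysis, simply wrap this loop and are themselves independent, giving a second level of parallelism.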
Dynamic provisioning of local and remote compute resources with OpenStack
NASA Astrophysics Data System (ADS)
Giffels, M.; Hauth, T.; Polgart, F.; Quast, G.
2015-12-01
Modern high-energy physics experiments rely on the extensive usage of computing resources, both for the reconstruction of measured events as well as for Monte-Carlo simulation. The Institut für Experimentelle Kernphysik (EKP) at KIT is participating in both the CMS and Belle experiments with computing and storage resources. In the upcoming years, these requirements are expected to increase due to the growing amount of recorded data and the rise in complexity of the simulated events. It is therefore essential to increase the available computing capabilities by tapping into all resource pools. At the EKP institute, powerful desktop machines are available to users. Due to the multi-core nature of modern CPUs, vast amounts of CPU time are not utilized by common desktop usage patterns. Other important providers of compute capabilities are classical HPC data centers at universities or national research centers. Due to the shared nature of these installations, the standardized software stack required by HEP applications cannot be installed. A viable way to overcome this constraint and offer a standardized software environment in a transparent manner is the usage of virtualization technologies. The OpenStack project has become a widely adopted solution to virtualize hardware and offer additional services like storage and virtual machine management. This contribution reports on the incorporation of the institute's desktop machines into a private OpenStack Cloud. The additional compute resources provisioned via the virtual machines have been used for Monte-Carlo simulation and data analysis. Furthermore, a concept to integrate shared, remote HPC centers into regular HEP job workflows is presented. In this approach, local and remote resources are merged to form a uniform, virtual compute cluster with a single point-of-entry for the user. Evaluations of the performance and stability of this setup and operational experiences are discussed.
Wroe, Stephen; Parr, William C H; Ledogar, Justin A; Bourke, Jason; Evans, Samuel P; Fiorenza, Luca; Benazzi, Stefano; Hublin, Jean-Jacques; Stringer, Chris; Kullmer, Ottmar; Curry, Michael; Rae, Todd C; Yokley, Todd R
2018-04-11
Three adaptive hypotheses have been forwarded to explain the distinctive Neanderthal face: (i) an improved ability to accommodate high anterior bite forces, (ii) more effective conditioning of cold and/or dry air and, (iii) adaptation to facilitate greater ventilatory demands. We test these hypotheses using three-dimensional models of Neanderthals, modern humans, and a close outgroup (Homo heidelbergensis), applying finite-element analysis (FEA) and computational fluid dynamics (CFD). This is the most comprehensive application of either approach to date and the first to include both. FEA reveals few differences between H. heidelbergensis, modern humans, and Neanderthals in their capacities to sustain high anterior tooth loadings. CFD shows that the nasal cavities of Neanderthals and especially modern humans condition air more efficiently than does that of H. heidelbergensis, suggesting that both evolved to better withstand cold and/or dry climates than less derived Homo. We further find that Neanderthals could move considerably more air through the nasal pathway than could H. heidelbergensis or modern humans, consistent with the propositions that, relative to our outgroup Homo, Neanderthal facial morphology evolved to reflect improved capacities to better condition cold, dry air and to move greater air volumes in response to higher energetic requirements. © 2018 The Author(s).
An Adaptable Seismic Data Format for Modern Scientific Workflows
NASA Astrophysics Data System (ADS)
Smith, J. A.; Bozdag, E.; Krischer, L.; Lefebvre, M.; Lei, W.; Podhorszki, N.; Tromp, J.
2013-12-01
Data storage, exchange, and access play a critical role in modern seismology. Current seismic data formats, such as SEED, SAC, and SEG-Y, were designed with specific applications in mind and are frequently a major bottleneck in implementing efficient workflows. We propose a new modern parallel format that can be adapted for a variety of seismic workflows. The Adaptable Seismic Data Format (ASDF) features high-performance parallel read and write support and the ability to store an arbitrary number of traces of varying sizes. Provenance information is stored inside the file so that users know the origin of the data as well as the precise operations that have been applied to the waveforms. The design of the new format is based on several real-world use cases, including earthquake seismology and seismic interferometry. The metadata is based on the proven XML schemas StationXML and QuakeML. Existing time-series analysis tool-kits are easily interfaced with this new format so that seismologists can use robust, previously developed software packages, such as ObsPy and the SAC library. ADIOS, netCDF4, and HDF5 can be used as the underlying container format. At Princeton University, we have chosen to use ADIOS as the container format because it has shown superior scalability for certain applications, such as dealing with big data on HPC systems. In the context of high-performance computing, we have implemented ASDF into the global adjoint tomography workflow on Oak Ridge National Laboratory's supercomputer Titan.
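To make the container idea concrete, here is a hypothetical layout in Python with h5py. The group names and attributes are our invention for illustration and do not follow the actual ASDF specification:

    import numpy as np
    import h5py

    with h5py.File("waveforms.h5", "w") as f:
        trace = f.create_dataset("Waveforms/IU.ANMO/BHZ",
                                 data=np.random.randn(72000),
                                 compression="gzip")
        trace.attrs["sampling_rate_hz"] = 20.0
        trace.attrs["starttime"] = "2013-01-01T00:00:00"
        # Provenance travels inside the file with the data it describes.
        f.create_dataset("Provenance/0001",
                         data="detrend -> taper -> bandpass 0.01-0.1 Hz")

Storing an arbitrary number of variable-length traces plus their provenance in one parallel-writable file is what removes the many-small-files bottleneck of the older formats.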
NASA Astrophysics Data System (ADS)
Wang, Zhen-yu; Yu, Jian-cheng; Zhang, Ai-qun; Wang, Ya-xing; Zhao, Wen-tao
2017-12-01
Combining high precision numerical analysis methods with optimization algorithms to make a systematic exploration of a design space has become an important topic in modern design methods. During the design process of an underwater glider's flying-wing structure, a surrogate model is introduced to decrease the computation time of the high-precision analysis. By these means, the contradiction between precision and efficiency is resolved effectively. Based on parametric geometry modeling, mesh generation, and computational fluid dynamics analysis, a surrogate model is constructed by adopting design of experiment (DOE) theory to solve the multi-objective design optimization problem of the underwater glider. The procedure of surrogate model construction is presented, and the Gaussian kernel function is specifically discussed. The Particle Swarm Optimization (PSO) algorithm is applied to the hydrodynamic design optimization. The hydrodynamic performance of the optimized flying-wing structure of the underwater glider increases by 9.1%.
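A minimal Gaussian-kernel surrogate in Python/NumPy shows the core mechanic: fit an interpolant to a handful of expensive CFD samples, then let the optimizer query the cheap interpolant. Hyperparameter selection, the DOE sampling plan, and the CFD objective itself are all elided here:

    import numpy as np

    def fit_rbf(X, y, length_scale=1.0):
        # Solve K w = y on the DOE sample points (Gaussian kernel matrix K).
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * length_scale ** 2))
        return np.linalg.solve(K + 1e-10 * np.eye(len(X)), y)

    def predict_rbf(X, w, Xq, length_scale=1.0):
        d2 = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * length_scale ** 2)) @ w

    X = np.random.rand(30, 4)        # DOE samples of the design variables
    y = np.sin(X.sum(1))             # stand-in for CFD drag/lift results
    w = fit_rbf(X, y)
    estimates = predict_rbf(X, w, np.random.rand(5, 4))

PSO can then evaluate thousands of candidate designs against predict_rbf at negligible cost, resolving the precision-versus-efficiency tension described above.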
Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machines are able to provide, in a few hours, large input streams of data, which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search for, and the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. In this paper, we discuss the implementation of the Aho-Corasick algorithm for GPU-accelerated high performance systems. We present an optimized implementation of Aho-Corasick for GPUs and discuss its tradeoffs on the Tesla T10 and the new Tesla T20 (codename Fermi) GPUs. We then integrate the optimized GPU code, respectively, in an MPI-based and in a pthreads-based load balancer to enable execution of the algorithm on clusters and large shared-memory multiprocessors (SMPs) accelerated with multiple GPUs.
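For reference, a compact CPU implementation of the Aho-Corasick automaton in Python (the GPU versions discussed in the paper instead store the automaton as flat state-transition tables in device memory):

    from collections import deque

    def build_automaton(patterns):
        # Trie as a list of dicts; out[s] holds the patterns ending at state s.
        trie, out, fail = [{}], [set()], [0]
        for p in patterns:
            s = 0
            for ch in p:
                if ch not in trie[s]:
                    trie.append({}); out.append(set()); fail.append(0)
                    trie[s][ch] = len(trie) - 1
                s = trie[s][ch]
            out[s].add(p)
        q = deque(trie[0].values())              # depth-1 states fail to the root
        while q:
            s = q.popleft()
            for ch, t in trie[s].items():
                q.append(t)
                f = fail[s]
                while f and ch not in trie[f]:
                    f = fail[f]
                nxt = trie[f].get(ch, 0)
                fail[t] = nxt if nxt != t else 0
                out[t] |= out[fail[t]]
        return trie, out, fail

    def search(text, automaton):
        trie, out, fail = automaton
        s, hits = 0, []
        for i, ch in enumerate(text):
            while s and ch not in trie[s]:
                s = fail[s]
            s = trie[s].get(ch, 0)
            for p in out[s]:
                hits.append((i - len(p) + 1, p))
        return hits

    # Finds 'she', 'he' and 'hers' in "ushers".
    print(search("ushers", build_automaton(["he", "she", "his", "hers"])))

The performance variability the paper describes follows directly from this structure: match-heavy inputs spend their time in the inner reporting loop, while pattern-heavy dictionaries grow the automaton beyond fast memory.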
Energy Systems Integration Facility (ESIF): Golden, CO - Energy Integration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheppy, Michael; VanGeet, Otto; Pless, Shanti
2015-03-01
At NREL's Energy Systems Integration Facility (ESIF) in Golden, Colo., scientists and engineers work to overcome challenges related to how the nation generates, delivers and uses energy by modernizing the interplay between energy sources, infrastructure, and data. Test facilities include a megawatt-scale ac electric grid, photovoltaic simulators and a load bank. Additionally, a high performance computing data center (HPCDC) is dedicated to advancing renewable energy and energy efficient technologies. A key design strategy is to use waste heat from the HPCDC to heat parts of the building. The ESIF boasts an annual EUI of 168.3 kBtu/ft². This article describes the building's procurement, design and first year of performance.
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.
Agulleiro, J I; Garzón, E M; García, I; Fernández, J J
2010-06-01
Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable, and so high performance computing techniques have traditionally been used. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors, in combination with other single-processor optimization techniques. This approach succeeds in producing full resolution tomograms with an important reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach runs on standard computers without the need for specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with processors' SIMD extensions in the field of 3D electron microscopy.
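The vector approach can be seen in miniature below: each backprojection update is written as one whole-array expression, exactly the kind of loop body that maps onto SIMD units. This NumPy sketch is an unfiltered nearest-neighbour backprojection, far simpler than production WBP or SIRT kernels:

    import numpy as np

    def backproject(sinogram, angles, size):
        recon = np.zeros((size, size))
        ys, xs = np.mgrid[:size, :size] - size / 2.0
        for proj, theta in zip(sinogram, angles):
            # Detector coordinate of every voxel for this view, computed in
            # a single vectorized expression over the whole image.
            t = xs * np.cos(theta) + ys * np.sin(theta) + len(proj) / 2.0
            idx = np.clip(t.astype(int), 0, len(proj) - 1)
            recon += proj[idx]
        return recon / len(angles)

Because each update streams regularly over the image, the same code pattern vectorizes well whether expressed through SIMD intrinsics, compiler auto-vectorization, or array libraries.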
Parallelization of the Coupled Earthquake Model
NASA Technical Reports Server (NTRS)
Block, Gary; Li, P. Peggy; Song, Yuhe T.
2007-01-01
This Web-based tsunami simulation system allows users to remotely run a model on JPL's supercomputers for a given undersea earthquake. At the time of this reporting, predicting tsunamis on the Internet had never been done before. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern, high-performance computers and constrained by remotely sensed data, they are able to provide early warnings for those coastal regions at risk. The software is capable of testing NASA's satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.
Tempest: GPU-CPU computing for high-throughput database spectral matching.
Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A
2012-07-06
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphics processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
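The dot-product idea behind an accelerated score can be sketched in a few lines (our simplification, not Tempest's actual scoring function): bin both spectra onto a fixed m/z grid, after which a single multiply-add pass per candidate gives the similarity.

    import numpy as np

    def binned(mz, intensity, bin_width=1.0, n_bins=2000):
        # Accumulate peak intensities into fixed-width m/z bins, then normalize.
        vec = np.zeros(n_bins)
        idx = np.clip((np.asarray(mz) / bin_width).astype(int), 0, n_bins - 1)
        np.add.at(vec, idx, intensity)
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    def dot_score(experimental_vec, theoretical_vec):
        # Regular, branch-free arithmetic: ideal for thousands of GPU threads,
        # each scoring one or more candidate peptides.
        return float(experimental_vec @ theoretical_vec)

The fixed memory-access pattern is the point: candidates can be scored in lockstep on the GPU, which is hard to arrange for scores with heavier bookkeeping.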
Performance of the fusion code GYRO on four generations of Cray computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fahey, Mark R
2014-01-01
GYRO is a code used for the direct numerical simulation of plasma microturbulence. It has been ported to a variety of modern MPP platforms including several modern commodity clusters, IBM SPs, and Cray XC, XT, and XE series machines. We briefly describe the mathematical structure of the equations, the data layout, and the redistribution scheme. Also, while the performance and scaling of GYRO on many of these systems has been shown before, here we show the comparative performance and scaling on four generations of Cray supercomputers including the newest addition, the Cray XC30. The more recently added hybrid OpenMP/MPI implementation also shows a great deal of promise on custom HPC systems that utilize fast CPUs and proprietary interconnects. Four machines of varying sizes were used in the experiment, all of which are located at the National Institute for Computational Sciences at the University of Tennessee at Knoxville and Oak Ridge National Laboratory. The advantages, limitations, and performance of using each system are discussed.
ERIC Educational Resources Information Center
Paraskevas, Michael; Zarouchas, Thomas; Angelopoulos, Panagiotis; Perikos, Isidoros
2013-01-01
Nowadays, the need for highly qualified computer science educators in modern educational environments is growing. This study examines the potential use of the Greek School Network (GSN) to provide a robust and comprehensive e-training course for computer science educators in order to efficiently exploit advanced IT services and establish a…
Architectural Techniques For Managing Non-volatile Caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM poses serious obstacles in its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making them a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.
Numerical characteristics of quantum computer simulation
NASA Astrophysics Data System (ADS)
Chernyavskiy, A.; Khamitov, K.; Teplov, A.; Voevodin, V.; Voevodin, Vl.
2016-12-01
The simulation of quantum circuits is significantly important for the implementation of quantum information technologies. The main difficulty of such modeling is the exponential growth of dimensionality; thus the use of modern high-performance parallel computing is relevant. As is well known, arbitrary quantum computation in the circuit model can be performed using only single- and two-qubit gates, and we analyze the computational structure and properties of the simulation of such gates. We investigate how the unique properties of quantum mechanics shape the computational properties of the considered algorithms: quantum parallelism makes the simulation of quantum gates highly parallel, while quantum entanglement leads to a problem of computational locality during simulation. We use the methodology of the AlgoWiki project (algowiki-project.org) to analyze the algorithm. This methodology consists of theoretical (sequential and parallel complexity, macro structure, and visual informational graph) and experimental (locality and memory access, scalability, and more specific dynamic characteristics) parts. The experimental part was carried out on the petascale Lomonosov supercomputer (Moscow State University, Russia). We show that the simulation of quantum gates is a good basis for researching and testing development methods for data-intensive parallel software, and the considered analysis methodology can be successfully used for the improvement of algorithms in quantum information science.
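As a concrete illustration of the structure discussed above, the following sketch applies a single-qubit gate to an n-qubit state vector: every amplitude pair can be updated independently (the parallelism), while large target-qubit strides produce non-local memory access (the locality problem). Function and variable names here are illustrative.

import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    # Each amplitude pair (i, i + 2^target) is updated independently, so the
    # loop is trivially parallel; large `target` means large strides in memory.
    stride = 1 << target
    for base in range(0, 1 << n_qubits, stride << 1):
        for off in range(stride):
            i0, i1 = base + off, base + off + stride
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0, 0] * a0 + gate[0, 1] * a1
            state[i1] = gate[1, 0] * a0 + gate[1, 1] * a1

# Hadamard on qubit 2 of a 3-qubit register initialized to |000>.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = np.zeros(8, dtype=complex); psi[0] = 1.0
apply_single_qubit_gate(psi, H, target=2, n_qubits=3)
print(psi)  # amplitude 1/sqrt(2) on indices 0 and 4, i.e. |000> and |100>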
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
The Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high-performance execution and the flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and by analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model, and to monitor the SNN's activity. Our contribution is a tool that allows SNNs to be prototyped faster than on CPU/GPU architectures and significantly more cheaply than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities.
Human performance in the modern cockpit
NASA Technical Reports Server (NTRS)
Dismukes, R. K.; Cohen, M. M.
1992-01-01
This panel was organized by the Aerospace Human Factors Committee to illustrate behavioral research on the perceptual, cognitive, and group processes that determine crew effectiveness in modern cockpits. Crew reactions to the introduction of highly automated systems in the cockpit will be reported on. Automation can improve operational capabilities and efficiency and can reduce some types of human error, but may also introduce entirely new opportunities for error. The problem solving and decision making strategies used by crews led by captains with various personality profiles will be discussed. Also presented will be computational approaches to modeling the cognitive demands of cockpit operations and the cognitive capabilities and limitations of crew members. Factors contributing to aircrew deviations from standard operating procedures and misuse of checklists, often leading to violations, incidents, or accidents, will be examined. The mechanisms of visual perception pilots use in aircraft control and the implications of these mechanisms for effective design of visual displays will be discussed.
Application of laminar flow control to high-bypass-ratio turbofan engine nacelles
NASA Technical Reports Server (NTRS)
Wie, Y. S.; Collier, F. S., Jr.; Wagner, R. D.
1991-01-01
Recently, the concept of the application of hybrid laminar flow to modern commercial transport aircraft was successfully flight tested on a Boeing 757 aircraft. In this limited demonstration, in which only part of the upper surface of the swept wing was designed for the attainment of laminar flow, significant local drag reduction was measured. This paper addresses the potential application of this technology to laminarize the external surface of large, modern turbofan engine nacelles which may comprise as much as 5-10 percent of the total wetted area of future commercial transports. A hybrid-laminar-flow-control (HLFC) pressure distribution is specified and the corresponding nacelle geometry is computed utilizing a predictor/corrector design method. Linear stability calculations are conducted to provide predictions of the extent of the laminar boundary layer. Performance studies are presented to determine potential benefits in terms of reduced fuel consumption.
Cyberdyn supercomputer - a tool for imaging geodynamic processes
NASA Astrophysics Data System (ADS)
Pomeran, Mihai; Manea, Vlad; Besutiu, Lucian; Zlagnean, Luminita
2014-05-01
More and more physical processes that develop within the deep interior of our planet, yet have significant impact on the Earth's shape and structure, are becoming subject to numerical modelling using high performance computing facilities. Nowadays, an increasing number of research centers worldwide decide to make use of such powerful and fast computers for simulating complex phenomena involving fluid dynamics and to gain deeper insight into intricate problems of Earth's evolution. With the CYBERDYN cybernetic infrastructure (CCI), the Solid Earth Dynamics Department in the Institute of Geodynamics of the Romanian Academy boldly steps into the 21st century by entering the research area of computational geodynamics. The project that made this advancement possible was jointly supported by the EU and the Romanian Government through the Structural and Cohesion Funds. It lasted for about three years, ending in October 2013. CCI is basically a modern high performance Beowulf-type supercomputer (HPCC), combined with a high performance visualization cluster (HPVC) and a GeoWall. The infrastructure is mainly structured around 1344 cores and 3 TB of RAM. The high speed interconnect is provided by a QLogic InfiniBand switch, able to transfer up to 40 Gbps. The CCI storage component is a 40 TB Panasas NAS. The operating system is Linux (CentOS). For control and maintenance, the Bright Cluster Manager package is used. The SGE job scheduler manages the job queues. CCI has been designed for a theoretical peak performance of up to 11.2 TFlops. Speed tests showed that a high resolution numerical model (256 × 256 × 128 FEM elements) could be resolved at a mean computational speed of one time step per 30 seconds, employing only a fraction (20%) of the computing power. After passing the mandatory tests, the CCI has been involved in numerical modelling of various scenarios related to the East Carpathians tectonic and geodynamic evolution, including the Neogene magmatic activity and the intriguing intermediate-depth seismicity within the so-called Vrancea zone. The CFD code for numerical modelling is CitcomS, a widely employed open source package specifically developed for earth sciences. Several preliminary 3D geodynamic models simulating an assumed subduction or the effect of a mantle plume will be presented and discussed.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for the GPU hardware architecture is of great significance. In order to address the high computational complexity and poor real-time performance of blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm with data intensiveness, data-parallel computing, and the GPU execution model of single instruction, multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for GPU stream computing. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. For better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimization of data access and of the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to sustain the transmission rate and get around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
Validation of a RANS transition model using a high-order weighted compact nonlinear scheme
NASA Astrophysics Data System (ADS)
Tu, GuoHua; Deng, XiaoGang; Mao, MeiLiang
2013-04-01
A modified transition model is given based on the shear stress transport (SST) turbulence model and an intermittency transport equation. The energy gradient term in the original model is replaced by the flow strain rate to save computational cost. The model employs local variables only, so it can be conveniently implemented in modern computational fluid dynamics codes. The fifth-order weighted compact nonlinear scheme and the fourth-order staggered scheme are applied to discretize the governing equations in order to minimize discretization errors, so as to mitigate the confusion between numerical errors and transition model errors. The high-order package is compared with a second-order TVD method on simulating the transitional flow over a flat plate. Numerical results indicate that the high-order package gives better grid convergence than the second-order method. Validation of the transition model is performed for transitional flows ranging from low speed to hypersonic speed.
Pei, Zongrui; Max-Planck-Inst. für Eisenforschung, Düsseldorf; Eisenbach, Markus
2017-02-06
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the methods for optimizing dislocation core structures, we choose Particle Swarm Optimization, an algorithm that simulates the social behavior of organisms. By employing more particles (a bigger swarm) and more iterative steps (allowing them to explore for a longer time), the local minima can be effectively avoided, but at greater computational cost. The advantage of this algorithm is that it is readily parallelized on modern high-performance computing architectures. We demonstrate that the performance of our parallelized algorithm scales linearly with the number of employed cores.
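A minimal sketch of the Particle Swarm Optimization loop described above, with the dislocation-core energy functional replaced by a toy multi-minima objective; all parameter values are illustrative. Because each particle's objective evaluation is independent, the evaluation step is exactly the part that parallelizes across cores.

import numpy as np

def pso(objective, dim, n_particles=64, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))        # particle positions
    v = np.zeros_like(x)                              # particle velocities
    pbest = x.copy()                                  # per-particle best positions
    pbest_val = np.apply_along_axis(objective, 1, x)
    g = pbest[np.argmin(pbest_val)].copy()            # global best
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.apply_along_axis(objective, 1, x)   # independent -> parallelizable
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, objective(g)

# Rastrigin function: many local minima, global minimum 0 at the origin --
# a stand-in for a rugged dislocation-core energy landscape.
rastrigin = lambda z: 10 * len(z) + np.sum(z**2 - 10 * np.cos(2 * np.pi * z))
print(pso(rastrigin, dim=4))  # should approach (near-zero vector, ~0.0)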
Identification of key ancestors of modern germplasm in a breeding program of maize.
Technow, F; Schrag, T A; Schipprack, W; Melchinger, A E
2014-12-01
Probabilities of gene origin computed from the genomic kinship matrix can accurately identify key ancestors of modern germplasm. Identifying the key ancestors of modern plant breeding populations can provide valuable insights into the history of a breeding program and provide reference genomes for next-generation whole-genome sequencing. In an animal breeding context, a method was developed that employs probabilities of gene origin, computed from the pedigree-based additive kinship matrix, for identifying key ancestors. Because reliable and complete pedigree information is often not available in plant breeding, we replaced the additive kinship matrix with the genomic kinship matrix. As a proof of concept, we applied this approach to simulated data sets with known ancestries. The relative contribution of the ancestral lines to later generations could be determined with high accuracy, with and without selection. Our method was subsequently used for identifying the key ancestors of the modern Dent germplasm of the public maize breeding program of the University of Hohenheim. We found that the modern germplasm can be traced back to six or seven key ancestors, with one or two of them having a disproportionately large contribution. These results largely corroborated conjectures based on early records of the breeding program. We conclude that probabilities of gene origin computed from the genomic kinship matrix can be used for identifying key ancestors in breeding programs and estimating the proportion of genes contributed by them.
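One building block of this approach is the genomic kinship matrix itself. The sketch below computes a VanRaden-type genomic kinship matrix from simulated marker data; the paper's subsequent gene-origin probability computation is not reproduced here, and all data are simulated.

import numpy as np

rng = np.random.default_rng(1)
n_lines, n_markers = 20, 500
freqs = rng.uniform(0.1, 0.9, n_markers)                  # allele frequencies
M = rng.binomial(2, freqs, size=(n_lines, n_markers)).astype(float)  # 0/1/2 counts

p = M.mean(axis=0) / 2.0                    # estimated allele frequencies
Z = M - 2.0 * p                             # center by twice the frequency
G = (Z @ Z.T) / (2.0 * np.sum(p * (1.0 - p)))   # genomic kinship matrix

print(G.shape, float(np.diag(G).mean()))    # diagonal averages roughly 1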
Overview of computational structural methods for modern military aircraft
NASA Technical Reports Server (NTRS)
Kudva, J. N.
1992-01-01
Computational structural methods are essential for designing modern military aircraft. This briefing deals with computational structural methods (CSM) currently used. First a brief summary of modern day aircraft structural design procedures is presented. Following this, several ongoing CSM related projects at Northrop are discussed. Finally, shortcomings in this area, future requirements, and summary remarks are given.
SeqMule: automated pipeline for analysis of human exome/genome sequencing data.
Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai
2015-09-18
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and lack of access to a high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations, all via a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and allows quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
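The consensus step can be illustrated in miniature: keep variants reported by at least a minimum number of callers. The sketch below shows the idea only; it is not SeqMule's code or file format, and real pipelines must normalize variant representations before intersecting.

from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Return variants supported by at least `min_support` callers."""
    counts = Counter(v for calls in callsets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_support}

# Variants reduced to (chrom, pos, ref, alt) tuples; data are made up.
caller_a = [("chr1", 12345, "A", "G"), ("chr2", 999, "C", "T")]
caller_b = [("chr1", 12345, "A", "G"), ("chr3", 555, "G", "A")]
caller_c = [("chr1", 12345, "A", "G"), ("chr2", 999, "C", "T")]

print(consensus_calls([caller_a, caller_b, caller_c]))
# {('chr1', 12345, 'A', 'G'), ('chr2', 999, 'C', 'T')}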
High Resolution Nature Runs and the Big Data Challenge
NASA Technical Reports Server (NTRS)
Webster, W. Phillip; Duffy, Daniel Q.
2015-01-01
NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive Nature Runs and a downscaled reanalysis. The Nature Runs use GEOS-5 as an Atmospheric General Circulation Model (AGCM), while the reanalysis uses GEOS-5 in data assimilation mode. This paper will present computational challenges from three runs, two of which are AGCM runs and one a downscaled reanalysis using the full DAS. The Nature Runs will be completed at two surface grid resolutions, 7 and 3 kilometers, with 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data, while the 3 km run will span one year and generate 4 PB of data. The downscaled reanalysis (MERRA-II, Modern-Era Reanalysis for Research and Applications) will cover 15 years and generate 1 PB of data. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we will describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. MERRA/AS enables MapReduce analytics over the MERRA reanalysis data collection by bringing together high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC comprises a high-speed InfiniBand network, high-performance file systems and object storage, and virtual system environments specific to data-intensive science applications. These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs.
NASA Technical Reports Server (NTRS)
Schmidt, Phillip; Garg, Sanjay; Holowecky, Brian
1992-01-01
A parameter optimization framework is presented to solve the problem of partitioning a centralized controller into a decentralized hierarchical structure suitable for integrated flight/propulsion control implementation. The controller partitioning problem is briefly discussed and a cost function to be minimized is formulated, such that the resulting 'optimal' partitioned subsystem controllers will closely match the performance (including robustness) properties of the closed-loop system with the centralized controller while maintaining the desired controller partitioning structure. The cost function is written in terms of parameters in a state-space representation of the partitioned sub-controllers. Analytical expressions are obtained for the gradient of this cost function with respect to parameters, and an optimization algorithm is developed using modern computer-aided control design and analysis software. The capabilities of the algorithm are demonstrated by application to partitioned integrated flight/propulsion control design for a modern fighter aircraft in the short approach to landing task. The partitioning optimization is shown to lead to reduced-order subcontrollers that match the closed-loop command tracking and decoupling performance achieved by a high-order centralized controller.
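A scalar toy version of the matching idea: fit the parameters of a low-order controller so that its frequency response tracks a given higher-order one. The paper's cost function also covers robustness properties and the MIMO partitioning structure; this sketch shows only a response-matching term, and all transfer functions and numbers are illustrative.

import numpy as np
from scipy import signal, optimize

w = np.logspace(-1, 2, 200)                       # frequency grid (rad/s)
H_full = signal.TransferFunction([10, 120], [1, 12, 30, 20])  # "centralized" controller
_, target = signal.freqresp(H_full, w)

def cost(theta):
    k, a = theta                                  # reduced controller k / (s + a)
    _, resp = signal.freqresp(signal.TransferFunction([k], [1, a]), w)
    return np.sum(np.abs(resp - target) ** 2)     # response-matching cost

res = optimize.minimize(cost, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x, res.fun)                             # fitted (k, a) and residual mismatch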
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodriguez, Salvador B.
Smart grids are a crucial component for enabling the nation's future energy needs, as part of a modernization effort led by the Department of Energy. Smart grids and smart microgrids are being considered in niche applications, and as part of a comprehensive energy strategy to help manage the nation's growing energy demands, for critical infrastructures, military installations, small rural communities, and large populations with limited water supplies. As part of a far-reaching strategic initiative, Sandia National Laboratories (SNL) presents herein a unique, three-pronged approach to integrate small modular reactors (SMRs) into microgrids, with the goal of providing economically-competitive, reliable, and secure energy to meet the nation's needs. SNL's triad methodology involves an innovative blend of smart microgrid technology, high performance computing (HPC), and advanced manufacturing (AM). In this report, Sandia's current capabilities in those areas are summarized, as well as paths forward that will enable DOE to achieve its energy goals. In the area of smart grid/microgrid technology, Sandia's current computational capabilities can model the entire grid, including temporal aspects and cyber security issues. Our tools include system development, integration, testing and evaluation, monitoring, and sustainment.
Aerodynamic Characterization of a Modern Launch Vehicle
NASA Technical Reports Server (NTRS)
Hall, Robert M.; Holland, Scott D.; Blevins, John A.
2011-01-01
A modern launch vehicle is by necessity an extremely integrated design. The accurate characterization of its aerodynamic characteristics is essential to determine design loads, to design flight control laws, and to establish performance. The NASA Ares Aerodynamics Panel has been responsible for technical planning, execution, and vetting of the aerodynamic characterization of the Ares I vehicle. An aerodynamics team supporting the Panel consists of wind tunnel engineers, computational engineers, database engineers, and other analysts that address topics such as uncertainty quantification. The team resides at three NASA centers: Langley Research Center, Marshall Space Flight Center, and Ames Research Center. The Panel has developed strategies to synergistically combine both the wind tunnel efforts and the computational efforts with the goal of validating the computations. Selected examples highlight key flow physics and, where possible, the fidelity of the comparisons between wind tunnel results and the computations. Lessons learned summarize what has been gleaned during the project and can be useful for other vehicle development projects.
SiMon: Simulation Monitor for Computational Astrophysics
NASA Astrophysics Data System (ADS)
Xuran Qian, Penny; Cai, Maxwell Xu; Portegies Zwart, Simon; Zhu, Ming
2017-09-01
Scientific discovery via numerical simulation is important in modern astrophysics. This relatively new branch of astrophysics has become possible due to the development of reliable numerical algorithms and the high performance of modern computing technologies. These enable the analysis of large collections of observational data and the acquisition of new data via simulations at unprecedented accuracy and resolution. Ideally, simulations run until they reach some pre-determined termination condition, but often other factors cause extensive numerical approaches to break down at an earlier stage. In those cases, processes tend to be interrupted due to unexpected events in the software or the hardware, and the scientist handles the interrupt manually, which is time-consuming and prone to errors. We present the Simulation Monitor (SiMon) to automate the farming of large and extensive simulation processes. Our method is lightweight, fully automates the entire workflow management, operates concurrently across multiple platforms, and can be installed in user space. Inspired by the process of crop farming, we perceive each simulation as a crop in the field, and running a simulation becomes analogous to growing a crop. With the development of SiMon we relax the technical aspects of simulation management. The initial package was developed for extensive parameter searches in numerical simulations, but it turns out to work equally well for automating the processing and reduction of observational data.
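The farming concept can be illustrated with a minimal monitor loop: launch a batch of runs, poll them, and restart any that crash. This sketches the idea only; SiMon's actual interface differs, and the run_sim.py command below is hypothetical.

import subprocess, time

# Hypothetical simulation commands; each is one "crop" in the field.
commands = [["python", "run_sim.py", f"--param={i}"] for i in range(4)]
procs = {i: subprocess.Popen(cmd) for i, cmd in enumerate(commands)}
retries = {i: 0 for i in procs}

while procs:
    for i, p in list(procs.items()):
        code = p.poll()
        if code is None:
            continue                      # still running
        if code == 0:
            print(f"run {i}: finished")
            del procs[i]
        elif retries[i] < 3:
            retries[i] += 1               # crashed: restart automatically
            print(f"run {i}: restarting (attempt {retries[i]})")
            procs[i] = subprocess.Popen(commands[i])
        else:
            print(f"run {i}: giving up")
            del procs[i]
    time.sleep(10)                        # poll interval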
[Economic efficiency of computer monitoring of health].
Il'icheva, N P; Stazhadze, L L
2001-01-01
Presents a method of computer-based health monitoring that draws on modern information technologies in public health. The method helps an outpatient clinic organize its preventive activities at a high level and substantially reduces losses of time and money. The efficiency of such preventive measures and the growing number of computer and Internet users suggest that these methods are promising and that further studies in this field are needed.
Numerical and experimental analyses of lighting columns in terms of passive safety
NASA Astrophysics Data System (ADS)
Jedliński, Tomasz Ireneusz; Buśkiewicz, Jacek
2018-01-01
Modern lighting columns have a very beneficial influence on road safety. Currently, columns are designed to keep the driver safe in the event of a car collision. The following work compares experimental results of vehicle impact on a lighting column with FEM simulations performed using the Ansys LS-DYNA program. Given the high cost of experiments and the time-consuming research process, such software is a very useful tool in the development of pole structures that are to absorb the kinetic energy of the vehicle in a precisely prescribed way.
NASA Technical Reports Server (NTRS)
Kim, Jonnathan H.
1995-01-01
Humans can perform many complicated tasks without explicit rules. This inherent and advantageous capability becomes a hurdle when a task is to be automated. Modern computers and numerical calculations require explicit rules and discrete numerical values. In order to bridge the gap between human knowledge and automating tools, a knowledge model is proposed. Knowledge modeling techniques are discussed and utilized to automate a labor and time intensive task of detecting anomalous bearing wear patterns in the Space Shuttle Main Engine (SSME) High Pressure Oxygen Turbopump (HPOTP).
Rare earth element and rare metal inventory of central Asia
Mihalasky, Mark J.; Tucker, Robert D.; Renaud, Karine; Verstraeten, Ingrid M.
2018-03-06
Rare earth elements (REE), with their unique physical and chemical properties, are an essential part of modern living. REE have enabled development and manufacture of high-performance materials, processes, and electronic technologies commonly used today in computing and communications, clean energy and transportation, medical treatment and health care, glass and ceramics, aerospace and defense, and metallurgy and chemical refining. Central Asia is an emerging REE and rare metals (RM) producing region. A newly compiled inventory of REE-RM-bearing mineral occurrences and delineation of areas-of-interest indicate this region may have considerable undiscovered resources.
Bayesian Software Health Management for Aircraft Guidance, Navigation, and Control
NASA Technical Reports Server (NTRS)
Schumann, Johann; Mbaya, Timmy; Mengshoel, Ole
2011-01-01
Modern aircraft, both piloted fly-by-wire commercial aircraft and UAVs, increasingly depend on highly complex, safety-critical software systems with many sensors and computer-controlled actuators. Despite careful design and V&V of the software, severe incidents have happened due to malfunctioning software. In this paper, we discuss the use of Bayesian networks (BNs) to monitor the health of the on-board software and sensor system, and to perform advanced on-board diagnostic reasoning. We focus on the approach to developing reliable and robust health models for the combined software and sensor systems.
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data is set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to the details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and the Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities, and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield speed-ups of more than an order of magnitude and which problems are unlikely to parallelize efficiently enough to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Battlefield awareness computers: the engine of battlefield digitization
NASA Astrophysics Data System (ADS)
Ho, Jackson; Chamseddine, Ahmad
1997-06-01
To modernize the army for the 21st century, the U.S. Army Digitization Office (ADO) initiated in 1995 the Force XXI Battle Command Brigade-and-Below (FBCB2) Applique program, which became a centerpiece in the U.S. Army's master plan to win future information wars. The Applique team led by TRW fielded a 'tactical Internet' for brigade-and-below command to demonstrate the advantages of 'shared situation awareness' and battlefield digitization in advanced war-fighting experiments (AWE) conducted in March 1997 at the Army's National Training Center in California. Computing Devices is designated the primary hardware developer for the militarized version of the battlefield awareness computers. The first-generation militarized battlefield awareness computer, designated the V3 computer, was an integration of off-the-shelf components developed to meet the aggressive delivery requirements of the Task Force XXI AWE. The design efficiency and cost effectiveness of the computer hardware were secondary in importance to the delivery deadlines imposed by the March 1997 AWE. However, declining defense budgets will impose cost constraints on the Force XXI production hardware that can only be met by rigorous value engineering to further improve design optimization for battlefield awareness without compromising the level of reliability the military has come to expect in modern military hardened vetronics. To answer the Army's needs for a more cost effective computing solution, Computing Devices developed a second-generation 'combat ready' battlefield awareness computer, designated the V3+, which is designed specifically to meet the upcoming demands of Force XXI (FBCB2) and beyond. The primary design objective is to achieve a technologically superior design, value engineered to strike an optimal balance between reliability, life cycle cost, and procurement cost. Recognizing that the diverse digitization demands of Force XXI cannot be adequately met by any one computer hardware solution, Computing Devices is planning to develop a notebook-sized military computer designed for space-limited vehicle-mounted applications, as well as a high-performance portable workstation equipped with a 19-inch, full-color, ultra-high-resolution, high-brightness active matrix liquid crystal display (AMLCD) targeting command post and tactical operations center (TOC) applications. Together with the wearable computers Computing Devices developed at its Minneapolis facility for dismounted soldiers, Computing Devices will have a complete suite of interoperable battlefield awareness computers spanning the entire spectrum of battle digitization operating environments. Although this paper's primary focus is the second-generation 'combat ready' battlefield awareness computer, the V3+, it also briefly discusses the extension of the V3+ architecture to address the needs of the embedded and command post applications.
Parallelization of the preconditioned IDR solver for modern multicore computer systems
NASA Astrophysics Data System (ADS)
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper presents the analysis, parallelization, and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for the coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, and kinetic and quantum problems. It employs an iterative IDR algorithm with ILU preconditioning (of user-chosen order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there has been a dramatic change in processor architectures and computer system organization in recent years. Because of this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of a successful and efficient parallelization are presented for recent computer systems (Intel Core i7-9xx and two-processor Xeon 55xx/56xx).
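The solver structure described here, a Krylov iteration wrapped around an ILU preconditioner, can be sketched with SciPy. SciPy ships no IDR implementation, so GMRES stands in for the IDR iteration below; the ILU preconditioning setup is the point of the example, and the test matrix is illustrative.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)                 # incomplete LU factorization
M = spla.LinearOperator((n, n), matvec=ilu.solve)  # wrap it as a preconditioner

x, info = spla.gmres(A, b, M=M)                    # preconditioned Krylov solve
print(info, np.linalg.norm(A @ x - b))             # info == 0 means converged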
Intelligent redundant actuation system requirements and preliminary system design
NASA Technical Reports Server (NTRS)
Defeo, P.; Geiger, L. J.; Harris, J.
1985-01-01
Several redundant actuation system configurations were designed and demonstrated to satisfy the stringent operational requirements of advanced flight control systems. However, this has been accomplished largely through brute force hardware redundancy, resulting in significantly increased computational requirements on the flight control computers which perform the failure analysis and reconfiguration management. Modern technology now provides powerful, low-cost microprocessors which are effective in performing failure isolation and configuration management at the local actuator level. One such concept, called an Intelligent Redundant Actuation System (IRAS), significantly reduces the flight control computer requirements and performs the local tasks more comprehensively than previously feasible. The requirements and preliminary design of an experimental laboratory system capable of demonstrating the concept and sufficiently flexible to explore a variety of configurations are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aldridge, David Franklin; Collier, Sandra L.; Marlin, David H.
2005-05-01
This document is intended to serve as a user's guide for the time-domain atmospheric acoustic propagation suite (TDAAPS) program developed as part of the Department of Defense High-Performance Modernization Office (HPCMP) Common High-Performance Computing Scalable Software Initiative (CHSSI). TDAAPS performs staggered-grid finite-difference modeling of the acoustic velocity-pressure system with the incorporation of spatially inhomogeneous winds. Wherever practical, the control structure of the code is written in C++ using an object-oriented design. Sections of code where a large number of calculations are required are written in C or F77 in order to enable better compiler optimization of these sections. The TDAAPS program conforms to a UNIX-style calling interface. Most of the actions of the code are controlled by adding flags to the invoking command line. This document presents a large number of examples and provides new users with the necessary background to perform acoustic modeling with TDAAPS.
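A one-dimensional sketch of the staggered-grid velocity-pressure scheme that TDAAPS builds on (the real code is three-dimensional and incorporates inhomogeneous winds): pressure lives on integer grid points, velocity on half points, and the two fields are advanced alternately. All parameter values are illustrative.

import numpy as np

nx, dx, dt, nt = 400, 1.0, 2e-3, 250
rho, c = 1.2, 340.0                        # air density, sound speed
K = rho * c * c                            # bulk modulus; CFL = c*dt/dx = 0.68

p = np.exp(-0.01 * (np.arange(nx) - nx // 2) ** 2)   # initial pressure pulse
v = np.zeros(nx - 1)                       # velocity at half points i + 1/2

for _ in range(nt):
    v -= (dt / (rho * dx)) * (p[1:] - p[:-1])     # momentum equation update
    p[1:-1] -= (dt * K / dx) * (v[1:] - v[:-1])   # continuity/pressure update

print(np.argmax(np.abs(p)), "(pulse started at", nx // 2, ")")  # pulse has moved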
Accelerating epistasis analysis in human genetics with consumer graphics hardware.
Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H
2009-07-24
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics-hardware-based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
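The core of MDR can be sketched in a few lines: pool each multi-locus genotype cell into high or low risk by its case/control ratio, then score how well that binary rule classifies. The data below are simulated, and real MDR adds cross-validation and permutation testing around this kernel.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n = 2000
snp1, snp2 = rng.integers(0, 3, n), rng.integers(0, 3, n)       # genotypes 0/1/2
status = ((snp1 == snp2) ^ (rng.random(n) < 0.2)).astype(int)   # epistatic signal + noise

cells = defaultdict(lambda: [0, 0])          # (g1, g2) -> [controls, cases]
for g1, g2, s in zip(snp1, snp2, status):
    cells[(g1, g2)][s] += 1

n_cases = status.sum(); n_controls = n - n_cases
high_risk = {c for c, (ctrl, case) in cells.items()
             if case * n_controls >= ctrl * n_cases}   # exceeds overall case ratio

pred = np.array([(g1, g2) in high_risk for g1, g2 in zip(snp1, snp2)])
print("classification accuracy:", (pred == status).mean())

On a GPU, the counting over genotype cells for each candidate SNP pair is an independent task, which is what makes the exhaustive pairwise search amenable to massive parallelism.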
Development of STOLAND, a versatile navigation, guidance and control system
NASA Technical Reports Server (NTRS)
Young, L. S.; Hansen, Q. M.; Rouse, W. E.; Osder, S. S.
1972-01-01
STOLAND has been developed to perform navigation, guidance, control, and flight management experiments in advanced V/STOL aircraft. The experiments have broad requirements and have dictated that STOLAND be capable of providing performance that would be realistic and equivalent to a wide range of current and future avionics systems. An integrated digital concept using modern avionics components was selected as the simplest approach to maximizing versatility and growth potential. Unique flexibility has been obtained by use of a single, general-purpose digital computer for all navigation, guidance, control, and displays computation.
cudaMap: a GPU accelerated program for gene expression connectivity mapping
2013-01-01
Background: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. Results: cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Conclusion: Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap. PMID:24112435
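A simplified signed-rank connection score conveys the flavor of the computation that cudaMap accelerates: each signature gene contributes its up/down sign times its (centered) rank in the reference expression profile. The published sscMap scoring has additional details, so this is an illustration, not the actual algorithm.

def connection_score(reference_ranks, signature):
    """reference_ranks: gene -> rank (1 = most up-regulated, N = most down).
    signature: list of (gene, +1 for up / -1 for down)."""
    n = len(reference_ranks)
    m = len(signature)
    # Centered rank value: rank 1 -> n-1, rank n -> -(n-1).
    score = sum(sign * (n + 1 - 2 * reference_ranks[gene]) for gene, sign in signature)
    # Normalize by the best achievable score for any m-gene signature.
    abs_vals = sorted((abs(n + 1 - 2 * r) for r in range(1, n + 1)), reverse=True)
    return score / sum(abs_vals[:m])

genes = [f"g{i}" for i in range(1, 11)]
ranks = {g: i + 1 for i, g in enumerate(genes)}        # g1 most up, g10 most down
signature = [("g1", +1), ("g2", +1), ("g10", -1)]      # matches the profile well
print(connection_score(ranks, signature))              # 1.0 for this perfect match

Scoring one signature against each of thousands of reference profiles is embarrassingly parallel, which is why moving this loop to a GPU pays off so directly.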
Breast tumor malignancy modelling using evolutionary neural logic networks.
Tsakonas, Athanasios; Dounias, Georgios; Panagi, Georgia; Panourgias, Evangelia
2006-01-01
The present work proposes a computer-assisted methodology for the effective modelling of the diagnostic decision for breast tumor malignancy. The suggested approach is based on innovative hybrid computational intelligence algorithms applied to related cytological data contained in past medical records. The experimental data used in this study were gathered in the early 1990s at the University of Wisconsin, based on post-diagnostic cytological observations performed by expert medical staff. Data were encoded in a computer database, and various alternative modelling techniques were applied to them in an attempt to form diagnostic models. Previous methods included standard optimisation techniques as well as artificial intelligence approaches, and a variety of related publications exists in the modern literature on the subject. In this report, a hybrid computational intelligence approach is suggested, which effectively combines modern mathematical logic principles, neural computation, and genetic programming. The approach proves promising both in terms of diagnostic accuracy and generalization capabilities, and in terms of comprehensibility and practical importance for the related medical staff.
High Throughput Genotoxicity Profiling of the US EPA ToxCast Chemical Library
A key aim of the ToxCast project is to investigate modern molecular and genetic high content and high throughput screening (HTS) assays, along with various computational tools to supplement and perhaps replace traditional assays for evaluating chemical toxicity. Genotoxicity is a...
CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research
Sherif, Tarek; Rioux, Pierre; Rousseau, Marc-Etienne; Kassis, Nicolas; Beck, Natacha; Adalat, Reza; Das, Samir; Glatard, Tristan; Evans, Alan C.
2014-01-01
The Canadian Brain Imaging Research Platform (CBRAIN) is a web-based collaborative research platform developed in response to the challenges raised by data-heavy, compute-intensive neuroimaging research. CBRAIN offers transparent access to remote data sources, distributed computing sites, and an array of processing and visualization tools within a controlled, secure environment. Its web interface is accessible through any modern browser and uses graphical interface idioms to reduce the technical expertise required to perform large-scale computational analyses. CBRAIN's flexible meta-scheduling has allowed the incorporation of a wide range of heterogeneous computing sites, currently including nine national research High Performance Computing (HPC) centers in Canada, one in Korea, one in Germany, and several local research servers. CBRAIN leverages remote computing cycles and facilitates resource-interoperability in a transparent manner for the end-user. Compared with typical grid solutions available, our architecture was designed to be easily extendable and deployed on existing remote computing sites with no tool modification, administrative intervention, or special software/hardware configuration. As of October 2013, CBRAIN serves over 200 users spread across 53 cities in 17 countries. The platform is built as a generic framework that can accept data and analysis tools from any discipline. However, its current focus is primarily on neuroimaging research and studies of neurological diseases such as Autism, Parkinson's and Alzheimer's diseases, and Multiple Sclerosis, as well as on normal brain structure and development. This technical report presents the CBRAIN Platform, its current deployment and usage, and future direction. PMID:24904400
Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation.
Ziegenhein, Peter; Pirner, Sven; Kamerling, Cornelis Ph; Oelfke, Uwe
2015-08-07
Monte Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes that conventional implementations of MC algorithms require to deliver sufficiently accurate results on high-resolution imaging data. In order to overcome this obstacle, we developed the software package PhiMC, which is capable of computing precise dose distributions in a sub-minute time frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well-verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions which are in excellent agreement with DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on an NVIDIA Tesla C2050. Since CPUs can work on several hundred gigabytes of RAM, the typical GPU memory limitation does not apply to our implementation, and high-resolution clinical plans can be calculated.
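The parallelization strategy, independent particle histories distributed across CPU cores, can be illustrated with a toy depth-dose tally. It has none of DPM/PhiMC's actual transport physics, and all values are illustrative.

import numpy as np
from multiprocessing import Pool

N_BINS, SLAB_CM, MU = 50, 10.0, 0.5      # depth bins, slab thickness, attenuation coeff.

def simulate(args):
    """Tally absorption depths for a batch of independent particle histories."""
    n_histories, seed = args
    rng = np.random.default_rng(seed)
    depths = rng.exponential(1.0 / MU, n_histories)   # free path before absorption
    hist, _ = np.histogram(depths, bins=N_BINS, range=(0.0, SLAB_CM))
    return hist

if __name__ == "__main__":
    n_workers, per_worker = 8, 250_000
    with Pool(n_workers) as pool:        # independent histories -> trivially parallel
        parts = pool.map(simulate, [(per_worker, s) for s in range(n_workers)])
    dose = np.sum(parts, axis=0)
    print(dose[:5], dose.sum())          # deposition falls off with depth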
Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations
Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.
2016-01-01
In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for the slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing, where parallelization techniques are gaining in importance due to the pressing issue of large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration of two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should aid the reader in choosing a technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability to selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But GPU performance is only superior above a certain problem size, due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for the CPU are mature, and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094
Mantle convection on modern supercomputers
NASA Astrophysics Data System (ADS)
Weismüller, Jens; Gmeiner, Björn; Mohr, Marcus; Waluga, Christian; Wohlmuth, Barbara; Rüde, Ulrich; Bunge, Hans-Peter
2015-04-01
Mantle convection is the cause of plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic of mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next-generation large-scale architectures demand an interdisciplinary co-design. Here we report on recent advances of the TERRA-NEO project, which is part of the high-visibility SPPEXA program and a joint effort of four research groups in computer science, mathematics and geophysical application under the leadership of FAU Erlangen. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next-generation mantle convection models. We present software that can resolve the Earth's mantle with up to 10^12 grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection, assessing the impact of small-scale processes on global mantle flow.
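For orientation, the governing system referred to above is commonly written, under the Boussinesq approximation with infinite Prandtl number (a standard textbook form; TERRA-NEO's exact formulation may differ):

    \nabla \cdot \mathbf{u} = 0
    -\nabla p + \nabla \cdot \left[ \eta \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{T} \right) \right] = \rho \alpha g \, T \, \mathbf{e}_r
    \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \kappa \nabla^{2} T + H

with velocity u, pressure p, viscosity η, temperature T, thermal diffusivity κ, and internal heating H. The first two equations (mass and momentum) form a Stokes problem, since inertia is negligible in the mantle; the third expresses conservation of energy.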
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and to provide a high degree of accessibility to them (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in unifying the description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
Using Computers in Fluids Engineering Education
NASA Technical Reports Server (NTRS)
Benson, Thomas J.
1998-01-01
Three approaches for using computers to improve basic fluids engineering education are presented. The use of computational fluid dynamics solutions to fundamental flow problems is discussed. The use of interactive, highly graphical software which operates on either a modern workstation or personal computer is highlighted. And finally, the development of 'textbooks' and teaching aids which are used and distributed on the World Wide Web is described. Arguments for and against this technology as applied to undergraduate education are also discussed.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The 3D rendering performance of the Graphics Processing Unit (GPU), which converts 3D vector streams into 2D frames with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle to improving overall rendering performance. 3D-stacked memory systems such as the Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUs for efficient 3D rendering.
Achieving Helicopter Modernization with Advanced Technology Turbine Engines
1999-04-01
…computer modeling of compressor and turbine aerodynamics. Modern directionally solidified and single… …full-authority digital engine control (FADEC) with manual backup… the RAH-66A… controlled by a dual-channel FADEC, and features a very simple installation… Improved Gross Weight and significantly reduced pilot… air separation efficiencies as high as 97.5% as an "advanced technology" engine. The FADEC improves acceleration… Technological measures include but…
Towards an Automated Full-Turbofan Engine Numerical Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Turner, Mark G.; Norris, Andrew; Veres, Joseph P.
2003-01-01
The objective of this study was to demonstrate the high-fidelity numerical simulation of a modern high-bypass turbofan engine. The simulation utilizes the Numerical Propulsion System Simulation (NPSS) thermodynamic cycle modeling system coupled to a high-fidelity full-engine model represented by a set of coupled three-dimensional computational fluid dynamic (CFD) component models. Boundary conditions from the balanced, steady-state cycle model are used to define component boundary conditions in the full-engine model. Operating characteristics of the three-dimensional component models are integrated into the cycle model via partial performance maps generated automatically from the CFD flow solutions using one-dimensional meanline turbomachinery programs. This paper reports on the progress made towards the full-engine simulation of the GE90-94B engine, highlighting the generation of the high-pressure compressor partial performance map. The ongoing work will provide a system to evaluate the steady and unsteady aerodynamic and mechanical interactions between engine components at design and off-design operating conditions.
Recent inner ear specialization for high-speed hunting in cheetahs.
Grohé, Camille; Lee, Beatrice; Flynn, John J
2018-02-02
The cheetah, Acinonyx jubatus, is the fastest living land mammal. Because of its specialized hunting strategy, this species evolved a series of specialized morphological and functional body features to increase its exceptional predatory performance during high-speed hunting. Using high-resolution X-ray computed micro-tomography (μCT), we provide the first analyses of the size and shape of the vestibular system of the inner ear in cats, an organ essential for maintaining body balance and adapting head posture and gaze direction during movement in most vertebrates. We demonstrate that the vestibular system of modern cheetahs is extremely different in shape and proportions relative to the other cats analysed (12 modern and two fossil felid species), including a closely related fossil cheetah species. These distinctive attributes (i.e., one of the greatest volumes of the vestibular system, dorsal extension of the anterior and posterior semicircular canals) correlate with a greater afferent sensitivity of the inner ear to head motions, facilitating postural and visual stability during high-speed prey pursuit and capture. These features are not present in the fossil cheetah A. pardinensis, which went extinct about 126,000 years ago, demonstrating that the unique and highly specialized inner ear of the sole living species of cheetah likely evolved extremely recently, possibly later than the middle Pleistocene.
Performance of GeantV EM Physics Models
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2017-10-01
The recent progress in parallel hardware architectures with deeper vector pipelines or many-core technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains from propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architectures. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.
ERIC Educational Resources Information Center
DeFulio, Anthony; Crone-Todd, Darlene E.; Long, Lauren V.; Nuzzo, Paul A.; Silverman, Kenneth
2011-01-01
Keyboarding skill is an important target for adult education programs due to the ubiquity of computers in modern work environments. A previous study showed that novice typists learned key locations quickly but that fluency took a relatively long time to develop. In the present study, novice typists achieved fluent performance in nearly half the…
Refined AFC-Enabled High-Lift System Integration Study
NASA Technical Reports Server (NTRS)
Hartwich, Peter M.; Shmilovich, Arvin; Lacy, Douglas S.; Dickey, Eric D.; Scalafani, Anthony J.; Sundaram, P.; Yadlin, Yoram
2016-01-01
A prior trade study established the effectiveness of using Active Flow Control (AFC) for reducing the mechanical complexities associated with a modern high-lift system without sacrificing aerodynamic performance at low-speed flight conditions representative of takeoff and landing. The current technical report expands on this prior work in two ways: (1) a refined conventional high-lift system based on the NASA Common Research Model (CRM) is presented that is more representative of modern commercial transport aircraft in terms of stall characteristics and maximum Lift/Drag (L/D) ratios at takeoff and landing-approach flight conditions; and (2) the design trade space for AFC-enabled high-lift systems is expanded to explore a wider range of options for improving their efficiency. The refined conventional high-lift CRM (HL-CRM) concept features leading edge slats and slotted trailing edge flaps with Fowler motion. For the current AFC-enhanced high-lift system trade study, the refined conventional high-lift system is simplified by substituting simply-hinged trailing edge flaps for the slotted single-element flaps with Fowler motion. The high-lift performance of these two high-lift CRM variants is established using Computational Fluid Dynamics (CFD) solutions to the Reynolds-Averaged Navier-Stokes (RANS) equations. These CFD assessments identify the high-lift performance that needs to be recovered through AFC to have the CRM variant with the lighter and mechanically simpler high-lift system match the performance of the conventional high-lift system. In parallel to the conventional high-lift concept development, parametric studies using CFD guided the development of an effective and efficient AFC-enabled simplified high-lift system. This included parametric trailing edge flap geometry studies addressing the effects of flap chord length and flap deflection. As for the AFC implementation, scaling effects (i.e., wind-tunnel versus full-scale flight conditions) are addressed, as are AFC architecture aspects such as AFC unit placement, number of AFC units, operating pressures, mass flow rates, and steady versus unsteady AFC applications. These efforts led to the development of a novel traversing AFC actuation concept, which is efficient in that it reduces the AFC mass flow requirements by as much as an order of magnitude compared to previous AFC technologies, and is predicted to be effective in driving the aerodynamic performance of a mechanically simplified high-lift system close to that of the reference conventional high-lift system. Conceptual system integration studies were conducted for the AFC-enhanced high-lift concept applied to a NASA Environmentally Responsible Aircraft (ERA) reference configuration, the so-called ERA-0003 concept. The results from these design integration assessments identify overall system performance improvement opportunities over conventional high-lift systems that suggest the viability of further technology maturation efforts for AFC-enabled high-lift flap systems. To that end, technical challenges are identified associated with the application of AFC-enabled high-lift systems to modern transonic commercial transports for future technology maturation efforts.
Lecture notes in economics and mathematical systems. Volume 150: Supercritical wing sections 3
NASA Technical Reports Server (NTRS)
Bauer, F.; Garabedian, P.; Korn, D.
1977-01-01
Application of computational fluid dynamics to the design and analysis of supercritical wing sections is discussed. Computer programs used to study the flight of modern aircraft at high subsonic speeds are listed and described. The cascades of shockless transonic airfoils that are expected to increase the efficiency of compressors and turbines are included.
EvoGraph: On-The-Fly Efficient Mining of Evolving Graphs on GPU
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen
With the prevalence of the World Wide Web and social networks, there has been a growing interest in high performance analytics for constantly-evolving dynamic graphs. Modern GPUs provide a massive amount of parallelism for efficient graph processing, but challenges remain due to their lack of support for the near real-time streaming nature of dynamic graphs. Specifically, due to the current high volume and velocity of graph data combined with the complexity of user queries, traditional processing methods that first store the updates and then repeatedly run static graph analytics on a sequence of versions or snapshots are deemed undesirable and computationally infeasible on GPUs. We present EvoGraph, a highly efficient and scalable GPU-based dynamic graph analytics framework.
NASA Astrophysics Data System (ADS)
Gerjuoy, Edward
2005-06-01
The security of messages encoded via the widely used RSA public key encryption system rests on the enormous computational effort required to find the prime factors of a large number N using classical (conventional) computers. In 1994 Peter Shor showed that for sufficiently large N, a quantum computer could perform the factoring with much less computational effort. This paper endeavors to explain, in a fashion comprehensible to the nonexpert, the RSA encryption protocol; the various quantum computer manipulations constituting the Shor algorithm; how the Shor algorithm performs the factoring; and the precise sense in which a quantum computer employing Shor's algorithm can be said to accomplish the factoring of very large numbers with less computational effort than a classical computer. It is made apparent that factoring N generally requires many successive runs of the algorithm. Our analysis reveals that the probability of achieving a successful factorization on a single run is about twice as large as commonly quoted in the literature.
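The classical post-processing that turns the quantum order-finding result into factors is compact enough to sketch. In the toy run below (C++17), the period r of a modulo N is hard-coded to the values the quantum subroutine would return for N = 15 and a = 7; a failed run (odd r, or a^(r/2) ≡ -1 mod N) would simply trigger another run, in line with the repeated runs noted above.

    // Classical post-processing of Shor's algorithm (toy sketch, C++17 for std::gcd).
    #include <cstdio>
    #include <numeric>   // std::gcd

    // Modular exponentiation by repeated squaring.
    long long mod_pow(long long b, long long e, long long m) {
        long long r = 1; b %= m;
        for (; e > 0; e >>= 1, b = b * b % m)
            if (e & 1) r = r * b % m;
        return r;
    }

    int main() {
        const long long N = 15, a = 7, r = 4;  // r: order of a mod N, from the quantum step
        long long x = mod_pow(a, r / 2, N);    // a^(r/2) mod N, here 7^2 mod 15 = 4
        // gcd(x - 1, N) and gcd(x + 1, N) yield the nontrivial factors 3 and 5.
        std::printf("factors: %lld and %lld\n", std::gcd(x - 1, N), std::gcd(x + 1, N));
        return 0;
    }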
Overview of the NASA Glenn Flux Reconstruction Based High-Order Unstructured Grid Code
NASA Technical Reports Server (NTRS)
Spiegel, Seth C.; DeBonis, James R.; Huynh, H. T.
2016-01-01
A computational fluid dynamics code based on the flux reconstruction (FR) method is currently being developed at NASA Glenn Research Center to ultimately provide a large-eddy simulation capability that is both accurate and efficient for complex aeropropulsion flows. The FR approach offers a simple and efficient method that is easy to implement and accurate to an arbitrary order on common grid cell geometries. The governing compressible Navier-Stokes equations are discretized in time using various explicit Runge-Kutta schemes, with the default being the 3-stage/3rd-order strong stability preserving scheme. The code is written in modern Fortran (i.e., Fortran 2008) and parallelization is attained through MPI for execution on distributed-memory high-performance computing systems. An h-refinement study of the isentropic Euler vortex problem is able to empirically demonstrate the capability of the FR method to achieve super-accuracy for inviscid flows. Additionally, the code is applied to the Taylor-Green vortex problem, performing numerous implicit large-eddy simulations across a range of grid resolutions and solution orders. The solution found by a pseudo-spectral code is commonly used as a reference solution to this problem, and the FR code is able to reproduce this solution using approximately the same grid resolution. Finally, an examination of the code's performance demonstrates good parallel scaling, as well as an implementation of the FR method with a computational cost/degree-of-freedom/time-step that is essentially independent of the solution order of accuracy for structured geometries.
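For reference, the default 3-stage/3rd-order strong-stability-preserving scheme is usually the Shu-Osher form (assuming the code follows the standard convention), with L the spatial FR discretization operator:

    u^{(1)} = u^{n} + \Delta t \, L(u^{n})
    u^{(2)} = \tfrac{3}{4} u^{n} + \tfrac{1}{4} \left[ u^{(1)} + \Delta t \, L(u^{(1)}) \right]
    u^{n+1} = \tfrac{1}{3} u^{n} + \tfrac{2}{3} \left[ u^{(2)} + \Delta t \, L(u^{(2)}) \right]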
The Development of Sociocultural Competence with the Help of Computer Technology
ERIC Educational Resources Information Center
Rakhimova, Alina E.; Yashina, Marianna E.; Mukhamadiarova, Albina F.; Sharipova, Astrid V.
2017-01-01
The article describes the process of developing sociocultural knowledge and competences using computer technologies. On the whole, the development of modern computer technologies allows teachers to broaden trainees' sociocultural outlook and trace their progress online. Observation of modern computer technologies and estimation…
Functional Laser Trimming Of Thin Film Resistors On Silicon ICs
NASA Astrophysics Data System (ADS)
Mueller, Michael J.; Mickanin, Wes
1986-07-01
Modern Laser Wafer Trimming (LWT) technology achieves exceptional analog circuit performance and precision while maintaining the advantages of high production throughput and yield. Microprocessor-driven instrumentation has both emphasized the role of data conversion circuits and demanded sophisticated signal conditioning functions. Advanced analog semiconductor circuits with bandwidths over 1 GHz, and high-precision, trimmable, thin-film resistors meet many of today's emerging circuit requirements. Critical to meeting these requirements are optimum choices of laser characteristics, proper materials, trimming process control, accurate modeling of trimmed resistor performance, and appropriate circuit design. Once limited exclusively to hand-crafted, custom integrated circuits, designs are now available in semi-custom circuit configurations. These are similar to those provided for digital designs and supported by computer-aided design (CAD) tools. Integrated with fully automated measurement and trimming systems, these quality circuits can now be produced in quantity to meet the requirements of the communications, instrumentation, and signal processing markets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCarthy, J.M.
The theory and methodology of design of general-purpose machines that may be controlled by a computer to perform all the tasks of a set of special-purpose machines is the focus of modern machine design research. These seventeen contributions chronicle recent activity in the analysis and design of robot manipulators that are the prototype of these general-purpose machines. They focus particularly on kinematics, the geometry of rigid-body motion, which is an integral part of machine design theory. The challenges to kinematics researchers presented by general-purpose machines such as the manipulator are leading to new perspectives in the design and control of simpler machines with two, three, and more degrees of freedom. Researchers are rethinking the uses of gear trains, planar mechanisms, adjustable mechanisms, and computer controlled actuators in the design of modern machines.
Comparing an FPGA to a Cell for an Image Processing Application
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.
2010-12-01
Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, iris recognition systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
Physical Realization of a Supervised Learning System Built with Organic Memristive Synapses
NASA Astrophysics Data System (ADS)
Lin, Yu-Pu; Bennett, Christopher H.; Cabaret, Théo; Vodenicarevic, Damir; Chabi, Djaafar; Querlioz, Damien; Jousselme, Bruno; Derycke, Vincent; Klein, Jacques-Olivier
2016-09-01
Multiple modern applications of electronics call for inexpensive chips that can perform complex operations on natural data with limited energy. A vision for accomplishing this is implementing hardware neural networks, which fuse computation and memory, with low-cost organic electronics. A challenge, however, is the implementation of synapses (analog memories) composed of such materials. In this work, we introduce robust, rapidly programmable, nonvolatile organic memristive nanodevices based on electrografted redox complexes that implement synapses thanks to a wide range of accessible intermediate conductivity states. We demonstrate experimentally an elementary neural network, capable of learning functions, which combines four pairs of organic memristors as synapses and conventional electronics as neurons. Our architecture is highly resilient to issues caused by imperfect devices. It tolerates inter-device variability and an adaptable learning rule offers immunity against asymmetries in device switching. Highly compliant with conventional fabrication processes, the system can be extended to larger computing systems capable of complex cognitive tasks, as demonstrated in complementary simulations.
NASA Astrophysics Data System (ADS)
Roussel, Marc R.
1999-10-01
One of the traditional obstacles to learning quantum mechanics is the relatively high level of mathematical proficiency required to solve even routine problems. Modern computer algebra systems are now sufficiently reliable that they can be used as mathematical assistants to alleviate this difficulty. In the quantum mechanics course at the University of Lethbridge, the traditional three lecture hours per week have been replaced by two lecture hours and a one-hour computer-aided problem solving session using a computer algebra system (Maple). While this somewhat reduces the number of topics that can be tackled during the term, students have a better opportunity to familiarize themselves with the underlying theory with this course design. Maple is also available to students during examinations. The use of a computer algebra system expands the class of feasible problems during a time-limited exercise such as a midterm or final examination. A modern computer algebra system is a complex piece of software, so some time needs to be devoted to teaching the students its proper use. However, the advantages to the teaching of quantum mechanics appear to outweigh the disadvantages.
An S_N Algorithm for Modern Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Randal Scott
2016-08-29
LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
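The wavefront structure KBA exploits can be illustrated in two dimensions: for a sweep direction with positive x and y components, cells on the diagonal i + j = d depend only on diagonals already swept, so all cells on a diagonal may be updated concurrently. The mesh size and the toy update rule below are illustrative assumptions, not LANL's transport kernel.

    // Schematic 2D sweep in KBA wavefront order (illustrative stand-in).
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    int main() {
        const int NX = 8, NY = 8;                 // mesh size (assumption)
        std::vector<double> psi(NX * NY, 0.0);    // angular flux, one sweep direction
        std::vector<double> q(NX * NY, 1.0);      // source term (illustrative)

        for (int d = 0; d <= NX + NY - 2; ++d) {  // march the diagonal wavefront
            int ilo = std::max(0, d - (NY - 1));
            int ihi = std::min(d, NX - 1);
            for (int i = ilo; i <= ihi; ++i) {    // cells here are mutually independent
                int j = d - i;
                double in_x = (i > 0) ? psi[(i - 1) * NY + j] : 0.0;  // upwind neighbors
                double in_y = (j > 0) ? psi[i * NY + (j - 1)] : 0.0;
                psi[i * NY + j] = 0.25 * (in_x + in_y) + q[i * NY + j]; // toy update
            }
        }
        std::printf("psi at far corner = %f\n", psi[NX * NY - 1]);
        return 0;
    }

KBA pipelines these wavefronts across compute nodes, which is why scaling eventually saturates: the first and last diagonals contain too few cells to keep every node busy.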
Analyzing large-scale spiking neural data with HRLAnalysis™
Thibeault, Corey M.; O'Brien, Michael J.; Srinivasa, Narayan
2014-01-01
The additional capabilities provided by high-performance neural simulation environments and modern computing hardware have allowed for the modeling of increasingly larger spiking neural networks. This is important for exploring more anatomically detailed networks, but the corresponding accumulation of data can make analyzing the results of these simulations difficult. This is further compounded by the fact that many existing analysis packages were not developed with large spiking data sets in mind. Presented here is a software suite developed to not only process the increased amount of spike-train data in a reasonable amount of time, but also provide a user-friendly Python interface. We describe the design considerations, implementation and features of the HRLAnalysis™ suite. In addition, performance benchmarks demonstrating the speedup of this design compared to a published Python implementation are also presented. The result is a high-performance analysis toolkit that is not only usable and readily extensible, but also straightforward to interface with existing Python modules. PMID:24634655
Manycore Performance-Portability: Kokkos Multidimensional Array Library
Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; ...
2012-01-01
Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. The optimal data access pattern can be different for different manycore devices – potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
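A minimal sketch of the model in today's Kokkos spelling (the API has evolved since the Kokkos Array version described above; the dot-product kernel is an illustrative choice):

    // Kokkos sketch: Views hide the device-specific data layout, so the same
    // kernels compile for CPU threads or a GPU backend without source changes.
    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
        Kokkos::initialize(argc, argv);
        {
            const int N = 1 << 20;
            Kokkos::View<double*> x("x", N), y("y", N);  // device-resident arrays

            // Data-parallel kernel: the mapping of i to threads is device-specific.
            Kokkos::parallel_for("init", N, KOKKOS_LAMBDA(const int i) {
                x(i) = 1.0;
                y(i) = 2.0;
            });

            double dot = 0.0;                            // data-parallel reduction
            Kokkos::parallel_reduce("dot", N, KOKKOS_LAMBDA(const int i, double& s) {
                s += x(i) * y(i);
            }, dot);

            std::printf("dot = %f\n", dot);              // expect 2 * N
        }
        Kokkos::finalize();
        return 0;
    }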
High-Performance Computing and Visualization | Energy Systems Integration Facility | NREL
High-performance computing (HPC) and visualization at NREL propel technology innovation. NREL is home to Peregrine, the largest high-performance computing system dedicated exclusively to advancing renewable energy and energy efficiency research.
Almeida, Jonas S.; Iriabho, Egiebade E.; Gorrepati, Vijaya L.; Wilkinson, Sean R.; Grüneberg, Alexander; Robbins, David E.; Hackney, James R.
2012-01-01
Background: Image bioinformatics infrastructure typically relies on a combination of server-side high-performance computing and client desktop applications tailored for graphic rendering. On the server side, matrix manipulation environments are often used as the back-end where deployment of specialized analytical workflows takes place. However, neither the server-side nor the client-side desktop solution, by themselves or combined, is conducive to the emergence of open, collaborative, computational ecosystems for image analysis that are both self-sustained and user driven. Materials and Methods: ImageJS was developed as a browser-based webApp, untethered from a server-side backend, by making use of recent advances in the modern web browser such as a very efficient compiler, high-end graphical rendering capabilities, and I/O tailored for code migration. Results: Multiple versioned code hosting services were used to develop distinct ImageJS modules to illustrate its amenability to collaborative deployment without compromise of reproducibility or provenance. The illustrative examples include modules for image segmentation, feature extraction, and filtering. The deployment of image analysis by code migration is in sharp contrast with the more conventional, heavier, and less safe reliance on data transfer. Accordingly, code and data are loaded into the browser by exactly the same script tag loading mechanism, which offers a number of interesting applications that would be hard to attain with more conventional platforms, such as NIH's popular ImageJ application. Conclusions: The modern web browser was found to be advantageous for image bioinformatics in both the research and clinical environments. This conclusion reflects advantages in deployment scalability and analysis reproducibility, as well as the critical ability to deliver advanced computational statistical procedures to machines where access to sensitive data is controlled, that is, without local "download and installation". PMID:22934238
NASA Astrophysics Data System (ADS)
Ford, Eric B.
2009-05-01
We present the results of a highly parallel Kepler equation solver using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce GTX 280 and the "Compute Unified Device Architecture" (CUDA) programming environment. We apply this to evaluate a goodness-of-fit statistic (e.g., χ²) for Doppler observations of stars potentially harboring multiple planetary companions (assuming negligible planet-planet interactions). Given the high dimensionality of the model parameter space (at least five dimensions per planet), a global search is extremely computationally demanding. We expect that the underlying Kepler solver and model evaluator will be combined with a wide variety of more sophisticated algorithms to provide efficient global search, parameter estimation, model comparison, and adaptive experimental design for radial velocity and/or astrometric planet searches. We tested multiple implementations using single precision, double precision, pairs of single precision, and mixed precision arithmetic. We find that the vast majority of computations can be performed using single precision arithmetic, with selective use of compensated summation for increased precision. However, standard single precision is not adequate for calculating the mean anomaly from the time of observation and orbital period when evaluating the goodness-of-fit for real planetary systems and observational data sets. Using all double precision, our GPU code outperforms a similar code using a modern CPU by a factor of over 60. Using mixed precision, our GPU code provides a speed-up factor of over 600 when evaluating nsys > 1024 model planetary systems, each containing npl = 4 planets and assuming nobs = 256 observations of each system. We conclude that modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's equation and a goodness-of-fit statistic for orbital models when presented with a large parameter space.
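The kernel at the heart of this work is the solution of Kepler's equation M = E - e sin(E) for the eccentric anomaly E, repeated for every system, planet, and observation. A scalar Newton iteration, the usual approach and a plausible stand-in for the GPU work item (the paper's exact iteration scheme is not reproduced here), looks like this:

    // Newton iteration for Kepler's equation M = E - e*sin(E) (scalar sketch of
    // the kernel a GPU evaluates for millions of system/observation pairs).
    #include <cmath>
    #include <cstdio>

    double solve_kepler(double M, double e) {
        double E = (e < 0.8) ? M : 3.141592653589793;  // standard starting guess
        for (int it = 0; it < 50; ++it) {
            double f  = E - e * std::sin(E) - M;       // residual
            double fp = 1.0 - e * std::cos(E);         // derivative, > 0 for e < 1
            double dE = f / fp;
            E -= dE;
            if (std::fabs(dE) < 1e-12) break;          // converged
        }
        return E;
    }

    int main() {
        double E = solve_kepler(1.0, 0.3);             // M = 1.0 rad, e = 0.3
        std::printf("E = %.12f, M check = %.12f\n", E, E - 0.3 * std::sin(E));
        return 0;
    }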
Schwenke, Michael; Georgii, Joachim; Preusser, Tobias
2017-07-01
Focused ultrasound (FUS) is rapidly gaining clinical acceptance for several target tissues in the human body. Yet, treating liver targets is not clinically applied due to the high complexity of the procedure (noninvasiveness, target motion, complex anatomy, blood cooling effects, shielding by ribs, and limited image-based monitoring). To reduce the complexity, numerical FUS simulations can be utilized for both treatment planning and execution. These use cases demand highly accurate and computationally efficient simulations. We propose a numerical method for the simulation of abdominal FUS treatments during respiratory motion of the organs and target. In particular, a novel approach is proposed to simulate the heating during motion by solving Pennes' bioheat equation in a computational reference space, i.e., the equation is mathematically transformed to the reference. The approach allows for motion discontinuities, e.g., the sliding of the liver along the abdominal wall. Implementing the solver completely on the graphics processing unit and combining it with an atlas-based ultrasound simulation approach yields a simulation performance faster than real time (less than 50 s of computing time for 100 s of treatment time) on a modern off-the-shelf laptop. The simulation method is incorporated into a treatment planning demonstration application that allows real patient cases, including respiratory motion, to be simulated. The high performance of the presented simulation method opens the door to clinical applications. The methods bear the potential to enable the application of FUS for moving organs.
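Pennes' bioheat equation, which the method transforms into the reference space, has the standard form

    \rho c \, \frac{\partial T}{\partial t} = \nabla \cdot (k \nabla T) + \rho_b c_b \, \omega_b \, (T_a - T) + Q

with tissue density ρ, specific heat c, thermal conductivity k, blood density and specific heat ρ_b and c_b, perfusion rate ω_b, arterial temperature T_a, and heat source Q (here the absorbed ultrasound power); the perfusion term models the blood cooling effect mentioned above.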
ERIC Educational Resources Information Center
Berg, A. I.; And Others
Five articles which were selected from a Russian language book on cybernetics and then translated are presented here. They deal with the topics of: computer-developed computers, heuristics and modern sciences, linguistics and practice, cybernetics and moral-ethical considerations, and computer chess programs. (Author/JY)
Raster-Based Approach to Solar Pressure Modeling
NASA Technical Reports Server (NTRS)
Wright, Theodore W. II
2013-01-01
An algorithm has been developed to take advantage of the graphics processing hardware in modern computers to efficiently compute high-fidelity solar pressure forces and torques on spacecraft, taking into account the possibility of self-shading due to the articulation of spacecraft components such as solar arrays. The process is easily extended to compute other results that depend on three-dimensional attitude analysis, such as solar array power generation or free molecular flow drag. The impact of photons upon a spacecraft introduces small forces and moments. The magnitude and direction of the forces depend on the material properties of the spacecraft components being illuminated. The parts of the components that are lit depend on the orientation of the craft with respect to the Sun, as well as the gimbal angles for any significant moving external parts (solar arrays, typically). Some components may shield others from the Sun. The purpose of this innovation is to enable high-fidelity computation of solar pressure and power generation effects on illuminated portions of spacecraft, taking self-shading from spacecraft attitude and movable components into account. The key idea in this innovation is to compute results dependent upon complicated geometry by using an image to break the problem into thousands or millions of sub-problems with simple geometry; the results from the simpler problems are then combined to give high-fidelity results for the full geometry. This is done by constructing a 3D model of a spacecraft with an appropriate graphics interface (OpenGL) and running that model on a modern computer's 3D-accelerated video processor. This quickly and accurately generates a view of the model (as shown on a computer screen) that takes rotation and articulation of spacecraft components into account. When this view is interpreted as the spacecraft as seen by the Sun, then only the portions of the craft visible in the view are illuminated. The view as shown on the computer screen is composed of up to millions of pixels. Each of those pixels is associated with a small illuminated area of the spacecraft. For each pixel, it is possible to compute its position, angle (surface normal) from the view direction, and the spacecraft material (and therefore, optical coefficients) associated with that area. With this information, the area associated with each pixel can be modeled as a simple flat plate for calculating solar pressure. The vector sum of these individual flat-plate models is a high-fidelity approximation of the solar pressure forces and torques on the whole vehicle. In addition to using the optical coefficients associated with each spacecraft material to calculate solar pressure, a power generation coefficient is added for computing solar array power generation from the sum of the illuminated areas. Similarly, other area-based calculations, such as free molecular flow drag, are also enabled. Because the model rendering is separated from other calculations, it is relatively easy to add a new model to explore a new vehicle or mission configuration. Adding a new model is performed by adding OpenGL code, but a future version might read a mesh file exported from a computer-aided design (CAD) system to enable very rapid turnaround for new designs.
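The per-pixel accumulation described above reduces to a short loop once the render pass has produced, for each lit pixel, an area, a surface normal, and material coefficients. The sketch below uses a deliberately simplified flat-plate optical model and made-up pixel records; the OpenGL pass that would fill them is omitted.

    // Sum per-pixel flat-plate solar-pressure contributions over a rendered view.
    // Illustrative data and a simplified optical model, not the flight algorithm.
    #include <cstdio>
    #include <vector>

    struct Pixel {                      // one lit pixel from the render pass
        double nx, ny, nz;              // outward surface normal (Sun-view frame)
        double area;                    // spacecraft area seen by this pixel [m^2]
        double c_spec, c_diff, c_abs;   // material optical coefficients
    };

    int main() {
        const double P = 4.56e-6;               // solar pressure at 1 AU [N/m^2]
        const double s[3] = {0.0, 0.0, -1.0};   // sunlight propagation direction
        std::vector<Pixel> lit = {              // stand-in pixel records
            {0.0, 0.0, 1.0,   1e-4, 0.3, 0.4, 0.3},
            {0.1, 0.0, 0.995, 1e-4, 0.3, 0.4, 0.3},
        };

        double F[3] = {0.0, 0.0, 0.0};
        for (const Pixel& p : lit) {
            double cosT = -(s[0] * p.nx + s[1] * p.ny + s[2] * p.nz); // incidence cosine
            if (cosT <= 0.0) continue;                                // back-facing
            double fa = P * p.area * cosT;
            // Toy model: absorbed + diffuse momentum along the sunlight direction,
            // specular reflection pushing against the outward normal.
            for (int k = 0; k < 3; ++k) {
                double n_k = (k == 0 ? p.nx : k == 1 ? p.ny : p.nz);
                F[k] += fa * ((p.c_abs + p.c_diff) * s[k] - 2.0 * p.c_spec * cosT * n_k);
            }
        }
        std::printf("F = (%g, %g, %g) N\n", F[0], F[1], F[2]);
        return 0;
    }

Torques follow the same pattern, crossing each pixel's force with its position relative to the vehicle's center of mass.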
High-Performance Design Patterns for Modern Fortran
Haveraaen, Magne; Morris, Karla; Rouson, Damian; ...
2015-01-01
This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran's new coarray distributed data structure, the language's class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes' complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5–2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel's MPI library, with the result apparently being 2–2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays come in Fortran 2015, we expect this performance gap to narrow.
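For contrast, the explicit bookkeeping of hand-coded MPI shows up even in a trivial collective. The stand-alone sketch below (not taken from the paper's codes) performs the kind of global reduction a coarray program expresses with a single co_sum call, one of the collective subroutines arriving with the Fortran 2015 additions mentioned above.

    // Minimal MPI global sum in C++: communicator setup, explicit buffers, and
    // an explicit collective call, all of which the coarray formulation implies.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = rank + 1.0, global = 0.0;   // each rank contributes one value
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) std::printf("sum over %d ranks = %f\n", size, global);
        MPI_Finalize();
        return 0;
    }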
The Spectral Element Method for Geophysical Flows
NASA Astrophysics Data System (ADS)
Taylor, Mark
1998-11-01
We will describe SEAM, a Spectral Element Atmospheric Model. SEAM solves the 3D primitive equations used in climate modeling and medium range forecasting. SEAM uses a spectral element discretization for the surface of the globe and finite differences in the vertical direction. The model is spectrally accurate, as demonstrated by a variety of test cases. It is well suited for modern distributed-shared memory computers, sustaining over 24 GFLOPS on a 240 processor HP Exemplar. This performance has allowed us to run several interesting simulations in full spherical geometry at high resolution (over 22 million grid points).
Linear solver performance in elastoplastic problem solution on GPU cluster
NASA Astrophysics Data System (ADS)
Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.
2017-12-01
Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large-scale systems needs to be solved for each problem. When dealing with fine computational meshes, such as in simulations of three-dimensional metal matrix composite microvolume deformation, tens or hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The convergence of such a method depends strongly on the spectrum of the problem's stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
Atropos: specific, sensitive, and speedy trimming of sequencing reads.
Didion, John P; Martin, Marcel; Collins, Francis S
2017-01-01
A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability of downstream analyses while decreasing their computational requirements. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos. PMID:28875074
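To make the trimming task concrete, here is a deliberately naive 3'-adapter trimmer. Atropos itself (a fork of cutadapt) uses error-tolerant semi-global alignment, which this exact-match sketch does not attempt; the read, adapter, and minimum-overlap values are illustrative.

    // Naive exact-match 3'-adapter trimming (illustrative; real trimmers tolerate
    // sequencing errors and partial adapters via alignment with a score threshold).
    #include <algorithm>
    #include <iostream>
    #include <string>

    // Cut at the leftmost position where a prefix of the adapter matches the
    // read suffix exactly; a minimum overlap guards against chance matches.
    std::string trim3(const std::string& read, const std::string& adapter,
                      std::size_t min_overlap = 3) {
        for (std::size_t pos = 0; pos + min_overlap <= read.size(); ++pos) {
            std::size_t n = std::min(read.size() - pos, adapter.size());
            if (read.compare(pos, n, adapter, 0, n) == 0)
                return read.substr(0, pos);   // adapter (or its start) found
        }
        return read;                          // no adapter located
    }

    int main() {
        std::string read = "ACGTACGTAGATCGGAAG";   // toy read ending in adapter start
        std::string adapter = "AGATCGGAAGAGC";     // common Illumina adapter prefix
        std::cout << trim3(read, adapter) << "\n"; // prints ACGTACGT
        return 0;
    }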
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław; Rudnicki, Witold R. (Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw)
The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
ERIC Educational Resources Information Center
Ivanov, Anisoara; Neacsu, Andrei
2011-01-01
This study describes the possibility and advantages of utilizing simple computer codes to complement the teaching techniques for high school physics. The authors have begun working on a collection of open source programs which allow students to compare the results and graphics from classroom exercises with the correct solutions and furthermore to…
Performance comparison of plastic shopping bags in modern and traditional retail
NASA Astrophysics Data System (ADS)
Radini, F. A.; Wulandari, R.; Nasiri, S. J. A.; Winarto, D. A.
2017-07-01
Following the implementation of the paid plastic bag policy in Indonesia's modern and traditional retail sectors, community questions about plastic shopping bag performance have arisen, but little information about it is available. Therefore, a comparative assessment of the performance of plastic shopping bags from modern and traditional retail is of interest. The performance measures observed were weight-holding capacity, tear resistance, and elongation, tested using a Universal Testing Machine. Physical and physico-chemical properties were also identified to determine the factors affecting the performance of the plastic shopping bags. The physical properties were analysed visually and with a thickness gauge to assess the colour and measure the thickness. The analysis of physico-chemical properties was carried out using DSC (Differential Scanning Calorimetry), TGA (Thermal Gravimetry Analysis), a furnace, and FTIR (Fourier Transform Infra Red Spectroscopy) to identify the materials as well as their melting and decomposition temperatures. The results showed that the performance difference between modern retail and traditional retail plastic bags appears only in elongation. The elongation of modern retail plastic bags is 121-413%, while that of traditional retail bags is 170-609%. According to the physico-chemical test results, both modern and traditional retail plastic bags contain polyethylene as the main material and have melting temperatures in the range of High Density Polyethylene (HDPE). However, modern retail plastic bags have an inorganic filler percentage of 18.31-33.87%, whereas traditional retail plastic bags have 0.35-9.85%. This inorganic filler percentage is probably a contributing factor to the elongation difference between modern and traditional retail plastic bags.
On the Edge: Intelligent CALL in the 1990s.
ERIC Educational Resources Information Center
Underwood, John
1989-01-01
Examines the possibilities of developing computer-assisted language learning (CALL) based on the best of modern technology, arguing that artificial intelligence (AI) strategies will radically improve the kinds of exercises that can be performed. Recommends combining AI technology with other tools for delivering instruction, such as simulation and…
Developing and Assessing E-Learning Techniques for Teaching Forecasting
ERIC Educational Resources Information Center
Gel, Yulia R.; O'Hara Hines, R. Jeanette; Chen, He; Noguchi, Kimihiro; Schoner, Vivian
2014-01-01
In the modern business environment, managers are increasingly required to perform decision making and evaluate related risks based on quantitative information in the face of uncertainty, which in turn increases demand for business professionals with sound skills and hands-on experience with statistical data analysis. Computer-based training…
Measurement of flows around modern commercial ship models
NASA Astrophysics Data System (ADS)
Kim, W. J.; Van, S. H.; Kim, D. H.
To document the details of flow characteristics around modern commercial ships, global force, wave pattern, and local mean velocity components were measured in a towing tank. Three modern commercial hull models were selected for the test: a container ship (KRISO container ship, KCS) and two very large crude-oil carriers (VLCCs) with the same forebody and slightly different afterbodies (KVLCC and KVLCC2), all having bow and stern bulbs. Uncertainty analysis was performed for the measured data using the procedure recommended by the ITTC. The experimental data obtained will provide a good opportunity to explore integrated flow phenomena around practical hull forms of today. They can also be used as validation data for computational fluid dynamics (CFD) codes for both inviscid and viscous flow calculations.
Computational Chemistry Using Modern Electronic Structure Methods
ERIC Educational Resources Information Center
Bell, Stephen; Dines, Trevor J.; Chowdhry, Babur Z.; Withnall, Robert
2007-01-01
Various modern electronic structure methods are nowadays used to teach computational chemistry to undergraduate students. Such quantum calculations can now easily be performed even for large molecules.
Microelectromechanical reprogrammable logic device.
Hafiz, M A A; Kosuru, L; Younis, M I
2016-03-29
In modern computing, the Boolean logic operations are set by interconnect schemes between the transistors. As the miniaturization in the component level to enhance the computational power is rapidly approaching physical limits, alternative computing methods are vigorously pursued. One of the desired aspects in the future computing approaches is the provision for hardware reconfigurability at run time to allow enhanced functionality. Here we demonstrate a reprogrammable logic device based on the electrothermal frequency modulation scheme of a single microelectromechanical resonator, capable of performing all the fundamental 2-bit logic functions as well as n-bit logic operations. Logic functions are performed by actively tuning the linear resonance frequency of the resonator operated at room temperature and under modest vacuum conditions, reprogrammable by the a.c.-driving frequency. The device is fabricated using a complementary metal oxide semiconductor compatible mass fabrication process, suitable for on-chip integration, and promises an alternative electromechanical computing scheme. PMID:27021295
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional Steps methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitate calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional (ADI/LOD) schemes for an Opteron-based shared memory system is presented. Memory bandwidth usage, the main bottleneck on modern computer architectures, is specially addressed. High efficiency of above 2 GFlops per CPU is sustained for problems of 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to simply copying the data in memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000^3 unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far-field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions, and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
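The core of each ADI/LOD sweep is a set of independent tridiagonal solves along one coordinate direction. A minimal sketch of one such implicit 1D sweep (Thomas algorithm; the diffusion setup and array names are illustrative, not taken from the paper):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a=sub-, b=main, c=super-diagonal, d=rhs."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# One implicit sweep of u_t = kappa * u_xx (backward Euler in 1D):
n, dx, dt, kappa = 100, 1e-2, 1e-3, 1.0
r = kappa * dt / dx**2
u = np.random.rand(n)                   # previous time level
a = np.full(n, -r); c = np.full(n, -r)  # off-diagonals
b = np.full(n, 1 + 2 * r)               # main diagonal
a[0] = c[-1] = 0.0                      # simple homogeneous boundary closure
u_new = thomas(a, b, c, u)
```

In a 3D LOD scheme, this sweep is applied direction by direction, and the many independent lines parallelize naturally across CPUs.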
clubber: removing the bioinformatics bottleneck in big data analyses.
Miller, Maximilian; Zhu, Chengsheng; Bromberg, Yana
2017-06-13
With the advent of modern-day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate the use of cluster computing. The first goal is achieved by automatically balancing parallel submissions across available high performance computing (HPC) resources; notably, these resources can be added on demand, may be cloud-based, and may feature heterogeneous environments. The second goal, making HPCs user-friendly, is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
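A hedged sketch of the load-balancing idea only (this is not clubber's actual scheduler; the queue model and resource names are invented): each incoming chunk of work goes to whichever registered HPC resource currently has the shortest queue.

```python
import heapq

# Invented resource names; clubber's real scheduler, REST API, and
# monitoring are far richer than this toy least-loaded dispatcher.
clusters = {"local_hpc": 0, "cloud_a": 0, "cloud_b": 0}  # pending job counts

def submit(jobs):
    # Min-heap keyed by current queue length -> least-loaded dispatch
    heap = [(load, name) for name, load in clusters.items()]
    heapq.heapify(heap)
    placement = []
    for job in jobs:
        load, name = heapq.heappop(heap)
        placement.append((job, name))
        clusters[name] = load + 1
        heapq.heappush(heap, (load + 1, name))
    return placement

print(submit([f"chunk_{i}" for i in range(7)]))
```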
A Parallel Numerical Micromagnetic Code Using FEniCS
NASA Astrophysics Data System (ADS)
Nagy, L.; Williams, W.; Mitchell, L.
2013-12-01
Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However, the computational cost rises exceedingly quickly with the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently, to better exploit modern computational resources, our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, the Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software, in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/ParMetis, etc.) and the exposure of these tools so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users not only to run micromagnetic models in parallel, but also to perform pre/post-processing of data.
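As a flavor of the FEniCS approach, here is a generic Poisson example in the legacy dolfin API (not the authors' micromagnetic solver): the weak form is written almost verbatim in Python, and the same script runs in parallel under MPI without modification.

```python
# Generic FEniCS (legacy dolfin) example -- NOT the micromagnetic code
# described above, just an illustration of posing a PDE near the math.
from dolfin import *

mesh = UnitCubeMesh(16, 16, 16)
V = FunctionSpace(mesh, "Lagrange", 1)

u, v = TrialFunction(V), TestFunction(V)
f = Constant(1.0)
a = dot(grad(u), grad(v)) * dx   # bilinear form, as in the math
L = f * v * dx                   # linear form
bc = DirichletBC(V, Constant(0.0), "on_boundary")

uh = Function(V)
solve(a == L, uh, bc)            # PETSc solves the assembled system
# Run in parallel with e.g.: mpirun -n 8 python poisson.py
```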
NASA Astrophysics Data System (ADS)
Brodyn, M. S.; Starkov, V. N.
2007-07-01
It is shown that, in laser experiments performed with an 'imperfect' setup in which instrumental distortions are considerable, sufficiently accurate results can still be obtained with modern methods of computational physics. It is found, for the first time, that a new instrumental function, the 'cap' function, a close relative of the Gaussian curve, is precisely what such laser experiments require. A new mathematical model of the measurement path and a carefully performed computational experiment show that a light beam transmitted through a mesoporous film actually has a narrower intensity distribution than the detected beam, and that the amplitude of the real intensity distribution is twice that of the measured one.
Fast neural net simulation with a DSP processor array.
Muller, U A; Gunzinger, A; Guggenbuhl, W
1995-01-01
This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-bit floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system, with 3.8 Gflops peak performance, consumes less than 800 W of electrical power and fits into a 19-inch rack. While reaching the speed of modern supercomputers, MUSIC can still be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a single user computing performance that was previously unthinkable. The system's real-time interfaces make it especially useful for embedded applications.
Computer-assisted learning in critical care: from ENIAC to HAL.
Tegtmeyer, K; Ibsen, L; Goldstein, B
2001-08-01
Computers are commonly used to serve many functions in today's modern intensive care unit. One of the most intriguing, and perhaps most challenging, applications of computers has been the attempt to improve medical education. With the introduction of the first computers, medical educators began looking for ways to incorporate their use into the modern curriculum. The cost and complexity of computers have consistently decreased since their introduction, making it increasingly feasible to incorporate computers into medical education. Simultaneously, the capabilities and capacities of computers have increased. Combining the computer with other modern digital technology has allowed the development of more intricate and realistic educational tools. The purpose of this article is to briefly describe the history and use of computers in medical education, with special reference to critical care medicine. In addition, we examine the role of computers in teaching and learning and discuss the types of interaction between the computer user and the computer.
Extending a Flight Management Computer for Simulation and Flight Experiments
NASA Technical Reports Server (NTRS)
Madden, Michael M.; Sugden, Paul C.
2005-01-01
In modern transport aircraft, the flight management computer (FMC) has evolved from a flight planning aid to an important hub for pilot information and origin-to-destination optimization of flight performance. Current trends indicate increasing roles for the FMC in aviation safety, aviation security, increasing airport capacity, and reducing the environmental impact of aircraft. Related research conducted at the Langley Research Center (LaRC) often requires functional extension of a modern, full-featured FMC. Ideally, transport simulations would include an FMC simulation that could be tailored and extended for experiments. However, due to the complexity of a modern FMC, a large investment (millions of dollars over several years) and scarce domain knowledge would be needed to create such a simulation for transport aircraft. As an intermediate alternative, the Flight Research Services Directorate (FRSD) at LaRC created a set of reusable software products to extend flight management functionality upstream of a Boeing 757 FMC, transparently simulating or sharing its operator interfaces. The paper details the design of these products and highlights their use on NASA projects.
Computer Synthesis Approaches of Hyperboloid Gear Drives with Linear Contact
NASA Astrophysics Data System (ADS)
Abadjiev, Valentin; Kawasaki, Haruhisa
2014-09-01
Computer-aided design has advanced through different types of software for scientific research in the field of gearing theory, as well as through adequate scientific support of gear drive manufacture. Such computer programs are based on mathematical models resulting from scientific research. Modern gear transmissions require the construction of new mathematical approaches to their geometric, technological, and strength analysis. The process of optimization, synthesis, and design is based on adequate iteration procedures to find an optimal solution by varying definite parameters. The study is dedicated to the accepted methodology in the creation of software for the synthesis of a class of high-reduction hyperboloid gears, Spiroid and Helicon ones (Spiroid and Helicon are trademarks registered by the Illinois Tool Works, Chicago, Ill). The developed basic computer products are software based on original mathematical models. They rest on the two mathematical models for the synthesis: "upon a pitch contact point" and "upon a mesh region". Computer programs are worked out on the basis of the described mathematical models, and the relations between them are shown. The application of these approaches to the synthesis of the discussed gear drives is illustrated.
Bañares, Miguel A; Haase, Andrea; Tran, Lang; Lobaskin, Vladimir; Oberdörster, Günter; Rallo, Robert; Leszczynski, Jerzy; Hoet, Peter; Korenstein, Rafi; Hardy, Barry; Puzyn, Tomasz
2017-09-01
A first European Conference on Computational Nanotoxicology, CompNanoTox, was held in November 2015 in Benahavís, Spain with the objectives to disseminate and integrate results from the European modeling and database projects (NanoPUZZLES, ModENPTox, PreNanoTox, MembraneNanoPart, MODERN, eNanoMapper and EU COST TD1204 MODENA) as well as to create synergies within the European NanoSafety Cluster. This conference was supported by the COST Action TD1204 MODENA on developing computational methods for toxicological risk assessment of engineered nanoparticles and provided a unique opportunity for cross-fertilization among complementary disciplines. The efforts to develop and validate computational models crucially depend on high quality experimental data and relevant assays which will be the basis to identify relevant descriptors. The ambitious overarching goal of this conference was to promote predictive nanotoxicology, which can only be achieved by a close collaboration between the computational scientists (e.g. database experts, modeling experts for structure, (eco)toxicological effects, performance and interaction of nanomaterials) and experimentalists from different areas (in particular toxicologists, biologists, chemists and material scientists, among others). The main outcome and new perspectives of this conference are summarized here.
Cormode, Graham; Dasgupta, Anirban; Goyal, Amit; Lee, Chi Hoon
2018-01-01
Many modern AI applications, such as web search, mobile browsing, image processing, and natural language processing, rely on finding similar items in a large database of complex objects. Due to the very large scale of the data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance and are suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall compared with "vanilla" LSH, even when using the same amount of space.
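A minimal sketch of one classic LSH family (random-hyperplane hashing for cosine similarity; the abstract evaluates four variants on Hadoop, which this single-machine toy does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(x, planes):
    """Sign pattern of x against random hyperplanes -> compact hash key."""
    bits = (planes @ x) > 0
    return bits.astype(np.uint8).tobytes()

dim, n_bits = 64, 16
planes = rng.standard_normal((n_bits, dim))

# Bucket a database of vectors by signature; near-duplicates tend to collide.
database = rng.standard_normal((1000, dim))
buckets = {}
for i, v in enumerate(database):
    buckets.setdefault(lsh_signature(v, planes), []).append(i)

# Query: only candidates in the matching bucket are scored exactly,
# avoiding a full scan of the database.
q = database[42] + 0.01 * rng.standard_normal(dim)   # a near-duplicate
candidates = buckets.get(lsh_signature(q, planes), [])
print(42 in candidates, len(candidates))
```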
Modeling and Simulation of High Dimensional Stochastic Multiscale PDE Systems at the Exascale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kevrekidis, Ioannis
2017-03-22
The thrust of the proposal was to exploit modern data-mining tools in a way that will create a systematic, computer-assisted approach to the representation of random media -- and also to the representation of the solutions of an array of important physicochemical processes that take place in/on such media. A parsimonious representation/parametrization of the random media links directly (via uncertainty quantification tools) to good sampling of the distribution of random media realizations. It also links directly to modern multiscale computational algorithms (like the equation-free approach that has been developed in our group) and plays a crucial role in accelerating the scientific computation of solutions of nonlinear PDE models (deterministic or stochastic) in such media -- both solutions in particular realizations of the random media, and estimation of the statistics of the solutions over multiple realizations (e.g. expectations).
Execution environment for intelligent real-time control systems
NASA Technical Reports Server (NTRS)
Sztipanovits, Janos
1987-01-01
Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computation, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems, is described. The layered architecture includes specific computational models, an integrated execution environment, and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.
A study of workstation computational performance for real-time flight simulation
NASA Technical Reports Server (NTRS)
Maddalon, Jeffrey M.; Cleveland, Jeff I., II
1995-01-01
With recent advances in microprocessor technology, some have suggested that modern workstations provide enough computational power to properly operate a real-time simulation. This paper presents the results of a computational benchmark, based on actual real-time flight simulation code used at Langley Research Center, which was executed on various workstation-class machines. The benchmark was executed on different machines from several companies including: CONVEX Computer Corporation, Cray Research, Digital Equipment Corporation, Hewlett-Packard, Intel, International Business Machines, Silicon Graphics, and Sun Microsystems. The machines are compared by their execution speed, computational accuracy, and porting effort. The results of this study show that the raw computational power needed for real-time simulation is now offered by workstations.
ECG R-R peak detection on mobile phones.
Sufi, F; Fang, Q; Cosic, I
2007-01-01
Mobile phones have become an integral part of modern life. Due to their ever-increasing processing power, mobile phones are rapidly expanding their arena from a pure telecommunication device to organizer, calculator, gaming device, web browser, music player, audio/video recording device, navigator, and more. The processing power of modern mobile phones has been put to many innovative uses. In this paper, we propose utilizing mobile phones for the monitoring and analysis of biosignals; the computation performed inside the mobile phone's processor is thus exploited for healthcare delivery. We reviewed the literature on R-R interval detection from the ECG and selected a few PC-based algorithms. Three of these existing R-R interval detection algorithms were then implemented on the Java platform. Performance monitoring and comparison studies were carried out on three different mobile devices to determine their applicability in a real-time telemonitoring scenario.
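A hedged sketch of the kind of R-peak detector such algorithms implement (a simple squared-derivative threshold detector in Python rather than the paper's Java; the sampling rate, threshold fraction, and refractory period are illustrative):

```python
import numpy as np

def detect_r_peaks(ecg, fs, thresh_frac=0.6, refractory_s=0.25):
    """Crude R-peak detector: differentiate, square, threshold, and
    enforce a refractory period. Parameters are illustrative only."""
    energy = np.diff(ecg) ** 2                 # emphasize steep QRS slopes
    thresh = thresh_frac * energy.max()
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i, e in enumerate(energy):
        if e > thresh and i - last >= refractory:
            peaks.append(i)
            last = i
    return np.array(peaks)

# Synthetic test: 1 Hz spikes on a noisy baseline sampled at 250 Hz
fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = 0.05 * np.random.randn(t.size)
ecg[::fs] += 1.0                               # fake R waves every second
rr = np.diff(detect_r_peaks(ecg, fs)) / fs     # R-R intervals in seconds
print(rr)
```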
Petousis, Ioannis; Mrdjenovich, David; Ballouz, Eric; Liu, Miao; Winston, Donald; Chen, Wei; Graf, Tanja; Schladt, Thomas D.; Persson, Kristin A.; Prinz, Fritz B.
2017-01-01
Dielectrics are an important class of materials that are ubiquitous in modern electronic applications. Even though their properties are important for the performance of devices, the number of compounds with known dielectric constant is on the order of a few hundred. Here, we use Density Functional Perturbation Theory as a way to screen for the dielectric constant and refractive index of materials in a fast and computationally efficient way. Our results constitute the largest dielectric tensors database to date, containing 1,056 compounds. Details regarding the computational methodology and technical validation are presented along with the format of our publicly available data. In addition, we integrate our dataset with the Materials Project allowing users easy access to material properties. Finally, we explain how our dataset and calculation methodology can be used in the search for novel dielectric compounds. PMID:28140408
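A small worked sketch of the connection such a dataset exposes (the example tensor is invented, not from the database, and the isotropic-average convention is one common choice): principal refractive indices follow from the eigenvalues of the electronic dielectric tensor as n_i = sqrt(eps_i).

```python
import numpy as np

# Invented electronic (high-frequency) dielectric tensor for illustration;
# real entries come from DFPT calculations such as those in the dataset.
eps_electronic = np.array([
    [5.2, 0.1, 0.0],
    [0.1, 4.8, 0.0],
    [0.0, 0.0, 6.0],
])

# Principal dielectric constants are the eigenvalues of the tensor
eps_i = np.linalg.eigvalsh(eps_electronic)
n_i = np.sqrt(eps_i)                  # principal refractive indices
n_avg = np.sqrt(eps_i.mean())         # one common isotropic estimate

print("principal n:", n_i, "average n:", round(float(n_avg), 3))
```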
Computational fluid dynamics: Transition to design applications
NASA Technical Reports Server (NTRS)
Bradley, R. G.; Bhateley, I. C.; Howell, G. A.
1987-01-01
The development of aerospace vehicles has, over the years, been an evolutionary process in which engineering progress in the aerospace community was based, generally, on prior experience and on databases obtained through wind tunnel and flight testing. Advances in the fundamental understanding of flow physics, in wind tunnel and flight test capability, and in mathematical insights into the governing flow equations were translated into improved air vehicle design. The modern-day field of Computational Fluid Dynamics (CFD) is a continuation of this growth in analytical capability and of the digital mathematics needed to solve the more rigorous forms of the flow equations. Some of the technical and managerial challenges that result from rapidly developing CFD capabilities, some of the steps being taken by the Fort Worth Division of General Dynamics to meet these challenges, and some of the specific areas of application for high-performance air vehicles are presented.
Experimental validation of docking and capture using space robotics testbeds
NASA Technical Reports Server (NTRS)
Spofford, John; Schmitz, Eric; Hoff, William
1991-01-01
This presentation describes the application of robotic and computer vision systems to validate docking and capture operations for space cargo transfer vehicles. Three applications are discussed: (1) air bearing systems in two dimensions that yield high-quality free-flying, flexible, and contact dynamics; (2) validation of docking mechanisms with misalignment and target dynamics; and (3) computer vision technology for target location and real-time tracking. All the testbeds are supported by a network of engineering workstations for dynamics and controls analyses. Dynamic simulation of multibody rigid and elastic systems is performed with the TREETOPS code. MATRIXx/System-Build and PRO-MATLAB/Simulab are the tools for control design and analysis using classical and modern techniques such as H-infinity and LQG/LTR. SANDY is a general design tool for numerically optimizing a multivariable robust compensator with a user-defined structure. Mathematica and Macsyma are used to derive dynamic and kinematic equations symbolically.
The markup is the model: reasoning about systems biology models in the Semantic Web era.
Kell, Douglas B; Mendes, Pedro
2008-06-07
Metabolic control analysis, co-invented by Reinhart Heinrich, is a formalism for the analysis of biochemical networks, and is a highly important intellectual forerunner of modern systems biology. Exchanging ideas and exchanging models are part of the international activities of science and scientists, and the Systems Biology Markup Language (SBML) allows one to perform the latter with great facility. Encoding such models in SBML allows their distributed analysis using loosely coupled workflows, and with the advent of the Internet the various software modules that one might use to analyze biochemical models can reside on entirely different computers and even on different continents. Optimization is at the core of many scientific and biotechnological activities, and Reinhart made many major contributions in this area, stimulating our own activities in the use of the methods of evolutionary computing for optimization.
A Survey Of Techniques for Managing and Leveraging Caches in GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
2014-09-01
Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general-purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs, the rise of CPU-GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
Alpha Control - A new Concept in SPM Control
NASA Astrophysics Data System (ADS)
Spizig, P.; Sanchen, D.; Volswinkler, G.; Ibach, W.; Koenen, J.
2006-03-01
Controlling modern Scanning Probe Microscopes (SPMs) demands highly sophisticated electronics. While flexibility and raw computing power are of great importance in facilitating the variety of measurement modes, extremely low noise is also a necessity. Accordingly, modern SPM controller designs are based on digital electronics to overcome the drawbacks of analog designs. While today's SPM controllers are based on DSPs or microprocessors and often still incorporate analog parts, we are now introducing a completely new approach: using a Field Programmable Gate Array (FPGA) to implement the digital control tasks allows unrivalled data processing speed by computing all tasks in parallel within a single chip. Time-consuming task switching between data acquisition, digital filtering, scanning, and the computation of feedback signals can be completely avoided. Together with a star topology that avoids bus limitations in accessing the variety of ADCs and DACs, this design guarantees, for the first time, entirely deterministic timing in the nanosecond regime for all tasks. This becomes especially useful for external experiments that must be synchronized with the scan, or for high-speed scans that require not only closed-loop control of the scanner but also dynamic correction of the scan movement. Delicate samples additionally benefit from extremely high sample rates, allowing highly resolved signals and low noise levels.
NASA Technical Reports Server (NTRS)
Gillian, Ronnie E.; Lotts, Christine G.
1988-01-01
The Computational Structural Mechanics (CSM) Activity at Langley Research Center is developing methods for structural analysis on modern computers. To facilitate that research effort, an applications development environment has been constructed to insulate the researcher from the many computer operating systems of a widely distributed computer network. The CSM Testbed development system was ported to the Numerical Aerodynamic Simulator (NAS) Cray-2, at the Ames Research Center, to provide a high end computational capability. This paper describes the implementation experiences, the resulting capability, and the future directions for the Testbed on supercomputers.
Toward a reaction rate model of condensed-phase RDX decomposition under high temperatures
NASA Astrophysics Data System (ADS)
Schweigert, Igor
2014-03-01
Shock ignition of energetic molecular solids is driven by microstructural heterogeneities, at which even moderate stresses can result in sufficiently high temperatures to initiate material decomposition and the release of the chemical energy. Mesoscale modeling of these ``hot spots'' requires a chemical reaction rate model that describes the energy release with a sub-microsecond resolution and under a wide range of temperatures. No such model is available even for well-studied energetic materials such as RDX. In this presentation, I will describe an ongoing effort to develop a reaction rate model of condensed-phase RDX decomposition under high temperatures using first-principles molecular dynamics, transition-state theory, and reaction network analysis. This work was supported by the Naval Research Laboratory, by the Office of Naval Research, and by the DOD High Performance Computing Modernization Program Software Application Institute for Multiscale Reactive Modeling of Insensitive Munitions.
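Where such a model typically ends up, in mesoscale codes, is a set of temperature-dependent Arrhenius rates fitted to the first-principles data. A generic sketch (the prefactor and activation energy below are placeholders, not fitted RDX values):

```python
import numpy as np

R = 8.314462618  # J/(mol K), gas constant

def arrhenius(T, A, Ea):
    """Temperature-dependent rate constant k(T) = A * exp(-Ea / (R T))."""
    return A * np.exp(-Ea / (R * T))

# Placeholder parameters for illustration -- NOT fitted RDX kinetics.
A, Ea = 1.0e13, 1.5e5              # 1/s and J/mol, invented values

for T in (600.0, 900.0, 1200.0):   # K, spanning "hot spot" temperatures
    print(T, arrhenius(T, A, Ea))  # rate rises steeply with temperature
```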
Toward a reaction rate model of condensed-phase RDX decomposition under high temperatures
NASA Astrophysics Data System (ADS)
Schweigert, Igor
2015-06-01
Shock ignition of energetic molecular solids is driven by microstructural heterogeneities, at which even moderate stresses can result in sufficiently high temperatures to initiate material decomposition and chemical energy release. Mesoscale modeling of these ``hot spots'' requires a reaction rate model that describes the energy release with a sub-microsecond resolution and under a wide range of temperatures. No such model is available even for well-studied energetic materials such as RDX. In this presentation, I will describe an ongoing effort to develop a reaction rate model of condensed-phase RDX decomposition under high temperatures using first-principles molecular dynamics, transition-state theory, and reaction network analysis. This work was supported by the Naval Research Laboratory, by the Office of Naval Research, and by the DoD High Performance Computing Modernization Program Software Application Institute for Multiscale Reactive Modeling of Insensitive Munitions.
Mediaprocessors in medical imaging for high performance and flexibility
NASA Astrophysics Data System (ADS)
Managuli, Ravi; Kim, Yongmin
2002-05-01
New high-performance programmable processors, called mediaprocessors, have been emerging since the early 1990s for various digital media applications, such as digital TV, set-top boxes, desktop video conferencing, and digital camcorders. Modern mediaprocessors, e.g., TI's TMS320C64x and Hitachi/Equator Technologies' MAP-CA, can offer high performance by utilizing both instruction-level and data-level parallelism. During this decade, with continued performance improvement and cost reduction, we believe that mediaprocessors will become a preferred choice in designing imaging and video systems due to their flexibility in incorporating new algorithms and applications via programming and their faster time-to-market. In this paper, we evaluate the suitability of these mediaprocessors for medical imaging. We review the core routines of several medical imaging modalities, such as ultrasound and DR, and present how these routines can be mapped to mediaprocessors and their resultant performance. We analyze the architecture of several leading mediaprocessors. By carefully mapping key imaging routines, such as 2D convolution, unsharp masking, and 2D FFT, to the mediaprocessor, we have been able to achieve performance comparable (if not better) to that of traditional hardwired approaches. Thus, we believe that future medical imaging systems will benefit greatly from these advanced mediaprocessors, offering significantly increased flexibility and adaptability, reducing the time-to-market, and improving the cost/performance ratio compared to existing systems while meeting the high computing requirements.
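Of the key routines named, 2D convolution is the simplest to sketch. A reference implementation in Python/NumPy (the mediaprocessor versions would be hand-mapped to VLIW/SIMD units, which this scalar sketch does not attempt):

```python
import numpy as np

def conv2d(image, kernel):
    """Direct 2D convolution, 'valid' region only; a reference version
    of the kind of kernel that is hand-tuned on mediaprocessors."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    k = np.flip(kernel)                 # true convolution flips the kernel
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * k)
    return out

# Example: 3x3 sharpening kernel (unsharp-masking style) on a random image
img = np.random.rand(128, 128)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
print(conv2d(img, sharpen).shape)       # (126, 126)
```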
NASA Technical Reports Server (NTRS)
Liew, K. H.; Urip, E.; Yang, S. L.; Siow, Y. K.; Marek, C. J.
2005-01-01
Today's modern aircraft are based on air-breathing jet propulsion systems, which use moving fluids to transform the energy carried by the fluid into power. Throughout aero-vehicle evolution, improvements have been made to engine efficiency and pollutant reduction. This study focuses on a parametric (on-design) cycle analysis of a dual-spool, separate-flow turbofan engine with an Interstage Turbine Burner (ITB). The ITB, a relatively new concept in modern jet engine propulsion, serves as a secondary combustor and is located between the high- and the low-pressure turbine, i.e., in the transition duct. The major advantages associated with the addition of an ITB are an increase in thermal efficiency and a reduction in NOx emissions: a lower temperature peak in the main combustor results in lower thermal NOx emissions and a lower amount of cooling air required. The objective of this study is to use design parameters, such as flight Mach number, compressor pressure ratio, fan pressure ratio, fan bypass ratio, and high-pressure turbine inlet temperature, to obtain engine performance parameters, such as specific thrust and thrust specific fuel consumption. Results of this study can provide guidance in identifying the performance characteristics of various engine components, which can then be used to develop, analyze, integrate, and optimize the system performance of turbofan engines with an ITB. A Visual Basic program, Microsoft Excel macro code, and Microsoft Excel neuron code are used to make Microsoft Excel plot engine performance versus engine design parameters; this program computes and plots the data sequentially without forcing users to open other plotting programs. A user's manual on how to use the program is also included in this report. Furthermore, this stand-alone program is written in conjunction with an off-design program, which is an extension of this study. The computed result for a selected design-point engine will be exported to an engine reference data file that is required in the off-design calculation.
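For flavor, a heavily simplified on-design cycle sketch (an ideal turbojet without bypass or ITB, using standard textbook relations; this is not the report's dual-spool turbofan/ITB model, and all input numbers are arbitrary examples):

```python
import numpy as np

# Ideal turbojet on-design cycle (textbook relations), used here only to
# illustrate parametric cycle analysis; NOT the dual-spool turbofan + ITB
# model of the report. All inputs are arbitrary example values.
gamma, cp = 1.4, 1004.0      # air, J/(kg K)
T0, M0 = 216.7, 0.8          # ambient temperature (K), flight Mach number
pi_c, Tt4 = 20.0, 1600.0     # compressor pressure ratio, turbine inlet T (K)
h_pr = 42.8e6                # fuel heating value, J/kg

a0 = np.sqrt((gamma - 1) * cp * T0)          # ambient speed of sound
tau_r = 1 + 0.5 * (gamma - 1) * M0**2        # ram temperature ratio
tau_c = pi_c ** ((gamma - 1) / gamma)        # ideal compressor ratio
tau_lam = Tt4 / T0
tau_t = 1 - tau_r / tau_lam * (tau_c - 1)    # turbine just drives compressor

v9_a0 = np.sqrt(2 / (gamma - 1) * tau_lam / (tau_r * tau_c)
                * (tau_r * tau_c * tau_t - 1))
F_m = a0 * (v9_a0 - M0)                      # specific thrust, N/(kg/s)
f = cp * T0 * (tau_lam - tau_r * tau_c) / h_pr   # fuel-air ratio
S = f / F_m                                  # thrust specific fuel consumption
print(f"specific thrust = {F_m:.1f} N/(kg/s), TSFC = {S*1e6:.2f} mg/(N s)")
```

Sweeping inputs such as pi_c or Tt4 through ranges and plotting F_m and S against them is exactly the kind of parametric study the report automates in Excel.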
Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh
Optimizing applications simultaneously for energy and performance is a complex problem. High-performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality, and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics, and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy-efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
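The paper builds on OpenTuner; as a hedged stand-in, here is a generic random-search auto-tuner loop over a similar design space (the dimension names are loosely modeled on the paper's three axes, but the values and the measurement stub are invented):

```python
import random
import time

# Invented design space, loosely modeled on the paper's three dimensions
# (memory layout, compiler transformations, loop schedule). A real study
# would use OpenTuner's search techniques rather than random sampling.
SPACE = {
    "layout":   ["aos", "soa"],
    "unroll":   [1, 2, 4, 8],
    "schedule": ["static", "dynamic", "guided"],
}

def measure(cfg):
    """Stand-in for building, running, and timing the kernel with cfg;
    the fake cost below only makes configurations distinguishable."""
    t0 = time.perf_counter()
    random.random()                  # placeholder for the real kernel
    return time.perf_counter() - t0 + hash(tuple(cfg.values())) % 7 * 1e-4

best_cfg, best_t = None, float("inf")
for _ in range(50):
    cfg = {k: random.choice(v) for k, v in SPACE.items()}
    t = measure(cfg)
    if t < best_t:
        best_cfg, best_t = cfg, t
print(best_cfg, best_t)
```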
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore + SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells, and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.
Performance Studies on Distributed Virtual Screening
Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.
2014-01-01
Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures that may bind to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data, because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting that maximizes the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. The performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
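A hedged sketch of the trade-off such a study measures (the overhead model and all constants are invented): with per-chunk scheduling overhead, the speedup from splitting a dataset into n chunks across c cores saturates and can even regress.

```python
import math

# Toy model of chunked vHTS speedup: total work W (s), per-chunk
# overhead o (s) for staging/scheduling, c concurrent cores.
# All constants are invented for illustration.
def runtime(n_chunks, W=3600.0, o=20.0, c=500):
    per_chunk = W / n_chunks + o
    waves = math.ceil(n_chunks / c)      # chunks execute in waves of c
    return waves * per_chunk

base = runtime(1, c=1)                   # sequential reference
for n in (1, 50, 500, 1000, 5000):
    print(n, round(base / runtime(n), 1), "x speedup")
```

The printed speedups rise toward the core count and then fall once overhead and extra waves dominate, which is the splitting optimum the paper searches for empirically.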
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)
2000-01-01
HiMAP is a three-level parallel middleware that can be interfaced to a large-scale global design environment for code-independent, multidisciplinary analysis using high-fidelity equations. Aerospace technology needs are rapidly changing, and computational tools compatible with the requirements of national programs, such as space transportation, are needed. Conventional computational tools are inadequate for modern aerospace design needs; advanced, modular computational tools are required, such as those that incorporate the technology of massively parallel processors (MPP).
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is now a key tool in the geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extent of the investigated domain may vary strongly in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers, and therefore permit very high-resolution simulations. We propose an efficient approach to solving memory-bounded real-world applications on modern supercomputer architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problems: we have developed a Stokes solver for glacier flow and a poromechanical solver, including complex rheologies, for nonlinear waves in stressed porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated: the glacier flow solver achieves good accuracy in the relevant benchmarks, and the coupled poromechanical solver explains previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near-peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
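A minimal sketch of the solver pattern described (iterative finite differences with damped residual updates on a regular grid; a 2D Poisson toy rather than the glacier or poromechanics code, and the damping value is illustrative):

```python
import numpy as np

# Toy 2D Poisson solve (laplacian(u) = f) by pseudo-transient residual
# iteration on a regular Cartesian grid -- the solver pattern described
# above, not the actual glacier/poromechanics application. Plain damping
# converges slowly; the paper preconditions the residuals to accelerate.
n, dx = 128, 1.0 / 127
f = -np.ones((n, n))
u = np.zeros((n, n))
dtau = 0.20 * dx**2            # pseudo-time step (stability-limited)

for it in range(20000):
    res = np.zeros_like(u)
    res[1:-1, 1:-1] = (
        (u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dx**2
        + (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2
        - f[1:-1, 1:-1]
    )
    u += dtau * res            # damped residual update; boundaries stay u = 0
    if np.max(np.abs(res)) < 1e-6 * np.max(np.abs(f)):
        break
print("iterations:", it, "max u:", u.max())
```

Because each update touches every grid point with only nearest-neighbor stencils, the scheme is memory-bandwidth bound, which is exactly why near-peak bandwidth is the relevant performance metric above.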
High-Performance Computing Data Center | Energy Systems Integration Facility | NREL
The Energy Systems Integration Facility's High-Performance Computing Data Center is home to Peregrine, the largest high-performance computing system in the world exclusively dedicated to advancing renewable energy and energy efficiency technologies.
Wagner, Roland; Helin, Tapio; Obereder, Andreas; Ramlau, Ronny
2016-02-20
The imaging quality of modern ground-based telescopes such as the planned European Extremely Large Telescope is affected by atmospheric turbulence. In consequence, they heavily depend on stable and high-performance adaptive optics (AO) systems. Using measurements of incoming light from guide stars, an AO system compensates for the effects of turbulence by adjusting so-called deformable mirror(s) (DMs) in real time. In this paper, we introduce a novel reconstruction method for ground layer adaptive optics. In the literature, a common approach to this problem is to use Bayesian inference in order to model the specific noise structure appearing due to spot elongation. This approach leads to large coupled systems with high computational effort. Recently, fast solvers of linear order, i.e., with computational complexity O(n), where n is the number of DM actuators, have emerged. However, the quality of such methods typically degrades in low flux conditions. Our key contribution is to achieve the high quality of the standard Bayesian approach while at the same time maintaining the linear order speed of the recent solvers. Our method is based on performing a separate preprocessing step before applying the cumulative reconstructor (CuReD). The efficiency and performance of the new reconstructor are demonstrated using the OCTOPUS, the official end-to-end simulation environment of the ESO for extremely large telescopes. For more specific simulations we also use the MOST toolbox.
Mechanical Modeling and Computer Simulation of Protein Folding
ERIC Educational Resources Information Center
Prigozhin, Maxim B.; Scott, Gregory E.; Denos, Sharlene
2014-01-01
In this activity, science education and modern technology are bridged to teach students at the high school and undergraduate levels about protein folding and to strengthen their model building skills. Students are guided from a textbook picture of a protein as a rigid crystal structure to a more realistic view: proteins are highly dynamic…
First Encounters of the Close Kind: The Formation Process of Airline Flight Crews
1987-01-01
Examining the relation between the group formation process and aircrew performance, Foushee notes an interesting etymological parallel: Webster's New Collegiate Dictionary (1961) defines cockpit as 'a...
Bridging the Gap between Basic and Clinical Sciences: A Description of a Radiological Anatomy Course
ERIC Educational Resources Information Center
Torres, Anna; Staskiewicz, Grzegorz J.; Lisiecka, Justyna; Pietrzyk, Lukasz; Czekajlo, Michael; Arancibia, Carlos U.; Maciejewski, Ryszard; Torres, Kamil
2016-01-01
A wide variety of medical imaging techniques pervade modern medicine, and the changing portability and performance of tools like ultrasound imaging have brought these medical imaging techniques into the everyday practice of many specialties outside of radiology. However, proper interpretation of ultrasonographic and computed tomographic images…
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using the message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for the design of the parallel algorithm, and a theoretical scalability function for 3D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and is tested on up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity, across five orders of magnitude of simulation scale (number of particles) are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on the micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
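The ghost/border-layer exchange at the heart of such a parallelization can be sketched with mpi4py on a 1D chain of blocks (a toy scalar field rather than DEM particle objects; the real code serializes C++ particle objects and handles 3D block neighbors):

```python
# Toy 1D halo (ghost-layer) exchange with mpi4py -- illustrates the
# border/ghost-layer pattern, not the paper's 3D C++ particle migration.
# Run with: mpirun -n 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nloc = 8
u = np.full(nloc + 2, float(rank))       # interior + one ghost cell per side
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send my border cells; receive neighbors' borders into my ghost cells.
comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
print(rank, u[0], u[-1])                 # ghosts now hold neighbor ranks
```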
Evaluating Application Resilience with XRay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Sui; Bronevetsky, Greg; Li, Bin
2015-05-07
The rising count and shrinking feature size of transistors within modern computers is making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults to ensure that (i) efforts to improve application resilience are spent in the code regions most vulnerable to faults and (ii) the appropriate resilience technique is applied to each code region. This paper presents XRay, a tool to view application vulnerability to soft errors, and illustrates how XRay can be used in the context of a representative application. In addition to providing actionable insights into application behavior, XRay automatically selects the number of fault injection experiments required to provide an informative view of application behavior, ensuring that the information is statistically well-grounded without performing unnecessary experiments.
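A hedged sketch of the kind of fault-injection experiment such a tool automates (flipping a random bit in one intermediate floating-point value and observing the effect on the final result; the target computation is a toy, not XRay's actual mechanism):

```python
import random
import struct

def flip_random_bit(x: float) -> float:
    """Flip one random bit in the IEEE-754 representation of x."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits ^= 1 << random.randrange(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def kernel(inject_at=None):
    """Toy computation; optionally corrupt one intermediate value."""
    acc = 0.0
    for i in range(1, 1001):
        term = 1.0 / (i * i)
        if i == inject_at:
            term = flip_random_bit(term)    # simulated soft fault
        acc += term
    return acc

golden = kernel()
trials = [abs(kernel(inject_at=random.randrange(1, 1001)) - golden)
          for _ in range(100)]
# Many flips are benign (error ~ 0); a few corrupt the result badly,
# possibly to inf/NaN when sign or exponent bits are hit.
print(f"benign: {sum(e < 1e-9 for e in trials)}/100, worst: {max(trials):.3g}")
```

Deciding how many such trials are needed for statistically well-grounded conclusions is exactly the selection problem the paper says XRay automates.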
Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias
2015-01-01
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer, "Sequoia", at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained, as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher-order interaction studies on large modern GWAS.
High Performance Thin Layer Chromatography.
ERIC Educational Resources Information Center
Costanzo, Samuel J.
1984-01-01
Clarifies where in the scheme of modern chromatography high performance thin layer chromatography (TLC) fits and why in some situations it is a viable alternative to gas and high performance liquid chromatography. New TLC plates, sample applications, plate development, and instrumental techniques are considered. (JN)
Ocular Tolerance of Contemporary Electronic Display Devices.
Clark, Andrew J; Yang, Paul; Khaderi, Khizer R; Moshfeghi, Andrew A
2018-05-01
Electronic displays have become an integral part of life in the developed world since the revolution of mobile computing a decade ago. With the release of multiple consumer-grade virtual reality (VR) and augmented reality (AR) products in the past 2 years utilizing head-mounted displays (HMDs), as well as the development of low-cost, smartphone-based HMDs, the ability to intimately interact with electronic screens is greater than ever. VR/AR HMDs also place the display at much closer ocular proximity than traditional electronic devices while also isolating the user from the ambient environment to create a "closed" system between the user's eyes and the display. Whether the increased interaction with these devices places the user's retina at higher risk of damage is currently unclear. Herein, the authors review the discovery of photochemical damage of the retina from visible light as well as summarize relevant clinical and preclinical data regarding the influence of modern display devices on retinal health. Multiple preclinical studies have been performed with modern light-emitting diode technology demonstrating damage to the retina at modest exposure levels, particularly from blue-light wavelengths. Unfortunately, high-quality in-human studies are lacking, and the small clinical investigations performed to date have failed to keep pace with the rapid evolutions in display technology. Clinical investigations assessing the effect of HMDs on human retinal function are also yet to be performed. From the available data, modern consumer electronic displays do not appear to pose any acute risk to vision with average use; however, future studies with well-defined clinical outcomes and illuminance metrics are needed to better understand the long-term risks of cumulative exposure to electronic displays in general and with "closed" VR/AR HMDs in particular. [Ophthalmic Surg Lasers Imaging Retina. 2018;49:346-354.]. Copyright 2018, SLACK Incorporated.
The role of modern diagnostic imaging in diagnosing and differentiating kidney diseases in children.
Maliborski, Artur; Zegadło, Arkadiusz; Placzyńska, Małgorzata; Sopińska, Małgorzata; Lichosik, Marianna; Jobs, Katarzyna
2018-01-01
Urinary tract diseases are among the most commonly diagnosed medical conditions in pediatric patients. Many diseases with different etiologies are accompanied by pain, fever, hematuria, or urinary tract dysfunction. The most common in children are urinary tract infections and congenital malformations. Symptoms can also reflect tumors or changes caused by systemic diseases. Clinical tests, and increasingly often additional imaging studies, are required to make a proper diagnosis of urinary tract diseases. Just a few decades ago, urography, cystography and voiding cystourethrography were the main methods in diagnostic imaging of the urinary tract. Today's imaging methods, supported by digital radiographic and fluoroscopy systems, high-sensitivity detectors with quantum detection, advanced algorithms that eliminate motion artifacts, and modern medical imaging monitors with resolutions of three or even eight megapixels, differ significantly from conventional radiographic methods. The methods currently most often performed are computed tomography, magnetic resonance imaging, isotopic methods, and ultrasonography using elastography and new solutions in Doppler imaging. Modern techniques focus on reducing radiation exposure while offering better imaging capabilities. The development of these techniques has made them an essential diagnostic aid in nephrological and urological practice. The aim of this paper is to present the latest solutions currently used in the diagnostic imaging of urinary tract diseases.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuller, Ivan K.; Stevens, Rick; Pino, Robinson
2015-10-29
Computation in its many forms is the engine that fuels our modern civilization. Modern computation, based on the von Neumann architecture, has allowed, until now, the development of continuous improvements, as predicted by Moore's law. However, computation using current architectures and materials will inevitably, within the next 10 years, reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like ("neuromorphic") computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS-based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was: the development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully "neuromorphic" computer. To address this challenge, the following issues were considered: (1) the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; (2) new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; (3) device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; and (4) comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.
A High School Level Course On Robot Design And Construction
NASA Astrophysics Data System (ADS)
Sadler, Paul M.; Crandall, Jack L.
1984-02-01
The Robotics Design and Construction Class at Sehome High School was developed to offer gifted and/or highly motivated students an in-depth introduction to a modern engineering topic. The course includes instruction in basic electronics, digital and radio electronics, construction skills, robotics literacy, construction of the HERO 1 Heathkit Robot, computer/ robot programming, and voice synthesis. A key element which leads to the success of the course is the involvement of various community assets including manpower and financial assistance. The instructors included a physics/electronics teacher, a computer science teacher, two retired engineers, and an electronics technician.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jianhui
2015-09-01
Grid modernization is transforming the operation and management of electric distribution systems from manual, paper-driven business processes to electronic, computer-assisted decision-making. At the center of this business transformation is the distribution management system (DMS), which provides a foundation from which optimal levels of performance can be achieved in an increasingly complex business and operating environment. Electric distribution utilities are facing many new challenges that are dramatically increasing the complexity of operating and managing the electric distribution system: growing customer expectations for service reliability and power quality, pressure to achieve better efficiency and utilization of existing distribution system assets, and reduction of greenhouse gas emissions by accommodating high penetration levels of distributed generating resources powered by renewable energy sources (wind, solar, etc.). Recent "storm of the century" events in the northeastern United States and the lengthy power outages and customer hardships that followed have greatly elevated the need to make power delivery systems more resilient to major storm events and to provide a more effective electric utility response during such regional power grid emergencies. Despite these newly emerging challenges for electric distribution system operators, only a small percentage of electric utilities have actually implemented a DMS. This paper discusses reasons why a DMS is needed and why the DMS may emerge as a mission-critical system that will soon be considered essential as electric utilities roll out their grid modernization strategies.
2018-01-01
Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a. LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall than "vanilla" LSH, even when using the same amount of space. PMID:29346410
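For a concrete picture of the technique, here is a minimal random-hyperplane (SimHash) sketch for cosine similarity, one classic LSH family. The paper's four Hadoop-based variants and optimizations go well beyond this illustration, and the dimensions and bit counts below are arbitrary assumptions.

```python
# Minimal random-hyperplane LSH (SimHash) for cosine similarity.
import numpy as np

rng = np.random.default_rng(42)

def make_hasher(dim, n_bits):
    planes = rng.standard_normal((n_bits, dim))   # one random hyperplane per bit
    def signature(v):
        return tuple((planes @ v) > 0)            # n_bits-bit sketch of v
    return signature

sig = make_hasher(dim=100, n_bits=16)

# Vectors with high cosine similarity agree in most bits, so near-neighbor
# candidates are found by bucketing on (bands of) the signature instead of
# comparing a query against the whole database.
a = rng.standard_normal(100)
b = a + 0.1 * rng.standard_normal(100)            # a near-duplicate of a
print(sum(x == y for x, y in zip(sig(a), sig(b))), "of 16 bits agree")
```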
Extending Clause Learning of SAT Solvers with Boolean Gröbner Bases
NASA Astrophysics Data System (ADS)
Zengler, Christoph; Küchlin, Wolfgang
We extend clause learning as performed by most modern SAT solvers by integrating the computation of Boolean Gröbner bases into the conflict learning process. Instead of learning only one clause per conflict, we compute and learn additional binary clauses from a Gröbner basis of the current conflict. We used the Gröbner basis engine of the logic package Redlog, contained in the computer algebra system Reduce, to extend the SAT solver MiniSAT with Gröbner basis learning. Our approach shows a significant reduction of conflicts as well as a reduction of restarts and computation time on many hard problems from the SAT 2009 competition.
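As a toy illustration of the underlying idea (not the authors' Redlog/MiniSAT integration), the sketch below encodes two clauses as polynomials over GF(2), adds the Boolean field equations, and reads an implied unit clause off the Gröbner basis. It assumes SymPy is available and uses one common clause-to-polynomial encoding.

```python
from sympy import symbols, groebner

x, y = symbols('x y')

# Over GF(2) (TRUE = 1): a clause is violated exactly when the product of its
# "false literal" polynomials equals 1, so we require that product to be 0.
#   (x OR y)   ->  (1+x)(1+y) = x*y + x + y + 1
#   (~x OR y)  ->  x(1+y)     = x*y + x
clause_polys = [x*y + x + y + 1, x*y + x]
field_eqs = [x**2 + x, y**2 + y]   # v^2 = v forces Boolean (0/1) solutions

gb = groebner(clause_polys + field_eqs, x, y, modulus=2, order='lex')
print(list(gb))   # expected to contain y + 1, i.e. y = 1: the learned unit clause "y"
```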
Single-chip microprocessor that communicates directly using light
NASA Astrophysics Data System (ADS)
Sun, Chen; Wade, Mark T.; Lee, Yunsup; Orcutt, Jason S.; Alloatti, Luca; Georgas, Michael S.; Waterman, Andrew S.; Shainline, Jeffrey M.; Avizienis, Rimas R.; Lin, Sen; Moss, Benjamin R.; Kumar, Rajesh; Pavanello, Fabio; Atabaki, Amir H.; Cook, Henry M.; Ou, Albert J.; Leu, Jonathan C.; Chen, Yu-Hsin; Asanović, Krste; Ram, Rajeev J.; Popović, Miloš A.; Stojanović, Vladimir M.
2015-12-01
Data transport across short electrical wires is limited by both bandwidth and power density, which creates a performance bottleneck for semiconductor microchips in modern computer systems—from mobile phones to large-scale data centres. These limitations can be overcome by using optical communications based on chip-scale electronic-photonic systems enabled by silicon-based nanophotonic devices. However, combining electronics and photonics on the same chip has proved challenging, owing to microchip manufacturing conflicts between electronics and photonics. Consequently, current electronic-photonic chips are limited to niche manufacturing processes and include only a few optical devices alongside simple circuits. Here we report an electronic-photonic system on a single chip integrating over 70 million transistors and 850 photonic components that work together to provide logic, memory, and interconnect functions. This system is a realization of a microprocessor that uses on-chip photonic devices to directly communicate with other chips using light. To integrate electronics and photonics at the scale of a microprocessor chip, we adopt a ‘zero-change’ approach to the integration of photonics. Instead of developing a custom process to enable the fabrication of photonics, which would complicate or eliminate the possibility of integration with state-of-the-art transistors at large scale and at high yield, we design optical devices using a standard microelectronics foundry process that is used for modern microprocessors. This demonstration could represent the beginning of an era of chip-scale electronic-photonic systems with the potential to transform computing system architectures, enabling more powerful computers, from network infrastructure to data centres and supercomputers.
Hadoop for High-Performance Climate Analytics: Use Cases and Lessons Learned
NASA Technical Reports Server (NTRS)
Tamkin, Glenn
2013-01-01
Scientific data services are a critical aspect of the NASA Center for Climate Simulation (NCCS) mission. Hadoop, via MapReduce, provides an approach to high-performance analytics that is proving to be useful to data-intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. The NCCS is particularly interested in the potential of Hadoop to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we prototyped a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. The initial focus was on averaging operations over arbitrary spatial and temporal extents within Modern-Era Retrospective Analysis for Research and Applications (MERRA) data. After preliminary results suggested that this approach improves efficiencies within data-intensive analytic workflows, we invested in building a cyberinfrastructure resource for developing a new generation of climate data analysis capabilities using Hadoop. This resource is focused on reducing the time spent in the preparation of reanalysis data used in data-model intercomparison, a long-sought goal of the climate community. This paper summarizes the related use cases and lessons learned.
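As a concrete picture of such a canonical operation, here is a minimal pure-Python map/reduce sketch of averaging values over a temporal extent. The record layout and monthly keying are illustrative assumptions, not the NCCS/MERRA schema or actual Hadoop job code.

```python
# Averaging as map and reduce phases: the map emits (key, (sum, count))
# pairs; the reduce combines them into per-key means.
from collections import defaultdict
from datetime import datetime

def map_phase(records):
    # record: (time, lat, lon, value); key on the month to average monthly,
    # but any spatial/temporal bucketing works the same way.
    for t, lat, lon, value in records:
        yield t.month, (value, 1)

def reduce_phase(pairs):
    acc = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in pairs:
        acc[key][0] += s
        acc[key][1] += c
    return {key: s / c for key, (s, c) in acc.items()}

records = [
    (datetime(1985, 1, 1), 40.0, -75.0, 271.3),
    (datetime(1985, 1, 2), 40.0, -75.0, 270.9),
    (datetime(1985, 2, 1), 40.0, -75.0, 268.4),
]
print(reduce_phase(map_phase(records)))   # {1: ~271.1, 2: 268.4}
```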
The International Symposium on Grids and Clouds
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the tenth anniversary of the ISGC, which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region into a coherent community. With the continuous support and dedication of the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that have produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and the production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.
Astronomy and Computing: A new journal for the astronomical computing community
NASA Astrophysics Data System (ADS)
Accomazzi, Alberto; Budavári, Tamás; Fluke, Christopher; Gray, Norman; Mann, Robert G.; O'Mullane, William; Wicenec, Andreas; Wise, Michael
2013-02-01
We introduce Astronomy and Computing, a new journal for the growing population of people working in the domain where astronomy overlaps with computer science and information technology. The journal aims to provide a new communication channel within that community, which is not well served by current journals, and to help secure recognition of its true importance within modern astronomy. In this inaugural editorial, we describe the rationale for creating the journal, outline its scope and ambitions, and seek input from the community in defining in detail how the journal should work towards its high-level goals.
Improvements to the adaptive maneuvering logic program
NASA Technical Reports Server (NTRS)
Burgin, George H.
1986-01-01
The Adaptive Maneuvering Logic (AML) computer program simulates close-in, one-on-one air-to-air combat between two fighter aircraft. Three important improvements are described. First, the previously available versions of AML were examined for their suitability as a baseline program. The selected program was then revised to eliminate some programming bugs which were uncovered over the years. A listing of this baseline program is included. Second, the equations governing the motion of the aircraft were completely revised. This resulted in a model with substantially higher fidelity than the original equations of motion provided. It also completely eliminated the over-the-top problem, which occurred in the older versions when the AML-driven aircraft attempted a vertical or near vertical loop. Third, the requirements for a versatile generic, yet realistic, aircraft model were studied and implemented in the program. The report contains detailed tables which configure the generic model as either a modern high-performance aircraft, an older high-performance aircraft, or a previous-generation jet fighter.
Very high frame rate volumetric integration of depth images on mobile devices.
Kähler, Olaf; Adrian Prisacariu, Victor; Yuheng Ren, Carl; Sun, Xin; Torr, Philip; Murray, David
2015-11-01
Volumetric methods provide efficient, flexible and simple ways of integrating multiple depth images into a full 3D model. They provide dense and photorealistic 3D reconstructions, and parallelised implementations on GPUs achieve real-time performance on modern graphics hardware. However, running such methods on mobile devices, which would give users freedom of movement and instantaneous reconstruction feedback, remains challenging. In this paper we present a range of modifications to existing volumetric integration methods based on voxel block hashing, considerably improving their performance and making them applicable to tablet computer applications. We present (i) optimisations for the basic data structure, and its allocation and integration; (ii) a highly optimised raycasting pipeline; and (iii) extensions to the camera tracker to incorporate IMU data. In total, our system thus achieves frame rates of up to 47 Hz on a Nvidia Shield Tablet and 910 Hz on a Nvidia GTX Titan X GPU, or even beyond 1.1 kHz without visualisation.
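As a sketch of the core data structure, the snippet below shows the kind of spatial hash used in voxel block hashing: world positions map to small fixed-size voxel blocks whose integer coordinates are hashed into a table. The prime-multiplier hash is a widely used choice in this family of methods; the table size, voxel size, and block side are illustrative assumptions, not the paper's parameters.

```python
# Spatial hashing of voxel block coordinates.
TABLE_SIZE = 2 ** 20

def block_hash(bx, by, bz):
    # XOR of prime-multiplied integer block coordinates, folded into the table.
    return ((bx * 73856093) ^ (by * 19349669) ^ (bz * 83492791)) % TABLE_SIZE

def world_to_block(p, voxel_size=0.005, block_side=8):
    # A depth sample at world position p (metres) touches the block containing it.
    return tuple(int(c // (voxel_size * block_side)) for c in p)

bucket = block_hash(*world_to_block((0.42, -0.13, 1.07)))
print(bucket)  # index of the hash bucket whose entry/chain stores this block
```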
High Resolution Near Real Time Image Processing and Support for MSSS Modernization
2012-09-01
From Geocentrism to Allocentrism: Teaching the Phases of the Moon in a Digital Full-Dome Planetarium
ERIC Educational Resources Information Center
Chastenay, Pierre
2016-01-01
An increasing number of planetariums worldwide are turning digital, using ultra-fast computers, powerful graphics cards, and high-resolution video projectors to create highly realistic astronomical imagery in real time. This modern technology enables the audience to observe astronomical phenomena from a geocentric as well as an…
NASA Technical Reports Server (NTRS)
Hathaway, M. D.; Wood, J. R.; Wasserbauer, C. A.
1991-01-01
A low speed centrifugal compressor facility recently built by the NASA Lewis Research Center is described. The purpose of this facility is to obtain detailed flow field measurements for computational fluid dynamic code assessment and flow physics modeling in support of Army and NASA efforts to advance small gas turbine engine technology. The facility is heavily instrumented with pressure and temperature probes, both in the stationary and rotating frames of reference, and has provisions for flow visualization and laser velocimetry. The facility will accommodate rotational speeds to 2400 rpm and is rated at pressures to 1.25 atm. The initial compressor stage being tested is geometrically and dynamically representative of modern high-performance centrifugal compressor stages with the exception of Mach number levels. Preliminary experimental investigations of inlet and exit flow uniformity and measurement repeatability are presented. These results demonstrate the high quality of the data which may be expected from this facility. The significance of synergism between computational fluid dynamic analysis and experimentation throughout the development of the low speed centrifugal compressor facility is demonstrated.
Systems-on-chip approach for real-time simulation of wheel-rail contact laws
NASA Astrophysics Data System (ADS)
Mei, T. X.; Zhou, Y. J.
2013-04-01
This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and to enable real-time simulation for hardware-in-the-loop experimental studies of the latest vehicle dynamics and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed-architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact law algorithms are arranged in a parallel manner and multiple contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput, low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.
Robust optical flow using adaptive Lorentzian filter for image reconstruction under noisy condition
NASA Astrophysics Data System (ADS)
Kesrarat, Darun; Patanavijit, Vorapoj
2017-02-01
In optical flow for motion allocation, the reliability of the resulting Motion Vectors (MVs) is an important issue. Various noisy conditions can make the results of optical flow algorithms unreliable. We find that many classical optical flow algorithms perform better under noisy conditions when combined with a modern optimized model. This paper introduces robust models of optical flow that apply an adaptive Lorentzian-norm influence function to simple spatial-temporal optical flow algorithms, used as a filtering model in a simple frame-to-frame correlation technique. Experiments on the proposed models confirm better noise tolerance in the optical flow's MVs under noisy conditions. We illustrate the performance of our models on several typical sequences with different movement speeds of foreground and background, where the sequences are contaminated by additive white Gaussian noise (AWGN) at different noise levels in decibels (dB). The results, measured by peak signal-to-noise ratio (PSNR), show the high noise tolerance of the proposed models.
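To make the role of the Lorentzian norm concrete, the sketch below contrasts its influence function with the quadratic (L2) penalty: the Lorentzian influence saturates and decays for gross outliers, which is what confers the noise tolerance described above. The sigma value is an assumed tuning constant, not a parameter from the paper.

```python
# Lorentzian robust penalty and its influence function.
import numpy as np

def lorentzian_rho(x, sigma=1.0):
    return np.log1p(0.5 * (x / sigma) ** 2)      # penalty rho(x)

def lorentzian_psi(x, sigma=1.0):
    return 2.0 * x / (2.0 * sigma**2 + x**2)     # influence psi(x) = d(rho)/dx

residuals = np.array([0.1, 0.5, 2.0, 10.0, 100.0])
print(lorentzian_psi(residuals))  # decays toward 0 for gross outliers
print(residuals)                  # L2 influence (x itself) grows without bound
```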
TTEthernet for Integrated Spacecraft Networks
NASA Technical Reports Server (NTRS)
Loveless, Andrew
2015-01-01
Aerospace projects have traditionally employed federated avionics architectures, in which each computer system is designed to perform one specific function (e.g. navigation). There are obvious downsides to this approach, including excessive weight (from so much computing hardware), and inefficient processor utilization (since modern processors are capable of performing multiple tasks). There has therefore been a push for integrated modular avionics (IMA), in which common computing platforms can be leveraged for different purposes. This consolidation of multiple vehicle functions to shared computing platforms can significantly reduce spacecraft cost, weight, and design complexity. However, the application of IMA principles introduces significant challenges, as the data network must accommodate traffic of mixed criticality and performance levels - potentially all related to the same shared computer hardware. Because individual network technologies are rarely so competent, the development of truly integrated network architectures often proves unreasonable. Several different types of networks are utilized - each suited to support a specific vehicle function. Critical functions are typically driven by precise timing loops, requiring networks with strict guarantees regarding message latency (i.e. determinism) and fault-tolerance. Alternatively, non-critical systems generally employ data networks prioritizing flexibility and high performance over reliable operation. Switched Ethernet has seen widespread success filling this role in terrestrial applications. Its high speed, flexibility, and the availability of inexpensive commercial off-the-shelf (COTS) components make it desirable for inclusion in spacecraft platforms. Basic Ethernet configurations have been incorporated into several preexisting aerospace projects, including both the Space Shuttle and International Space Station (ISS). However, classical switched Ethernet cannot provide the high level of network determinism required by real-time spacecraft applications. Even with modern advancements, the uncoordinated (i.e. event-driven) nature of Ethernet communication unavoidably leads to message contention within network switches. The arbitration process used to resolve such conflicts introduces variation in the time it takes for messages to be forwarded. TTEthernet introduces decentralized clock synchronization to switched Ethernet, enabling message transmission according to a time-triggered (TT) paradigm. A network planning tool is used to allocate each device a finite amount of time in which it may transmit a frame. Each time slot is repeated sequentially to form a periodic communication schedule that is then loaded onto each TTEthernet device (e.g. switches and end systems). Each network participant references the synchronized time in order to dispatch messages at predetermined instances. This schedule guarantees that no contention exists between time-triggered Ethernet frames in the network switches, therefore eliminating the need for arbitration (and the timing variation it causes). Besides time-triggered messaging, TTEthernet networks may provide two additional traffic classes to support communication of different criticality levels. In the rate-constrained (RC) traffic class, the frame payload size and rate of transmission along each communication channel are limited to predetermined maximums. The network switches can therefore be configured to accommodate the known worst-case traffic pattern, and buffer overflows can be eliminated.
The best-effort (BE) traffic class behaves akin to classical Ethernet. No guarantees are provided regarding transmission latency or successful message delivery. TTEthernet coordinates transmission of all three traffic classes over the same physical connections, therefore accommodating the full spectrum of traffic criticality levels required in IMA architectures. Common computing platforms (e.g. LRUs) can share networking resources in such a way that failures in non-critical systems (using BE or RC communication modes) cannot impact flight-critical functions (using TT communication). Furthermore, TTEthernet hardware (e.g. switches, cabling) can be shared by both TTEthernet and classical Ethernet traffic.
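To make the scheduling idea concrete, here is a toy sketch of the planning step: each TT message receives a fixed, non-overlapping transmission window inside a repeating cluster cycle, so frames never contend in a switch. The message names, durations, and cycle length are hypothetical, and a real planning tool also accounts for topology, hops, and synchronization precision.

```python
# Toy time-triggered schedule: disjoint windows in a repeating cycle.
CYCLE_US = 10_000          # cluster cycle length in microseconds (assumed)

messages = [               # (name, frame transmission time in microseconds)
    ("nav_state", 120),
    ("guidance_cmd", 80),
    ("fault_report", 60),
]

schedule, cursor = [], 0
for name, duration in messages:
    schedule.append((name, cursor, cursor + duration))  # non-overlapping slots
    cursor += duration

for name, start, end in schedule:
    print(f"{name}: transmit at t = {start}..{end} us of every {CYCLE_US} us cycle")
```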
NASA Technical Reports Server (NTRS)
Cohen, Jarrett
1999-01-01
Parallel computers built out of mass-market parts are cost-effectively performing data processing and simulation tasks. The Supercomputing (now known as "SC") series of conferences celebrated its 10th anniversary last November. While vendors have come and gone, the dominant paradigm for tackling big problems still is a shared-resource, commercial supercomputer. Growing numbers of users needing a cheaper or dedicated-access alternative are building their own supercomputers out of mass-market parts. Such machines are generally called Beowulf-class systems after the 11th century epic. This modern-day Beowulf story began in 1994 at NASA's Goddard Space Flight Center. A laboratory for the Earth and space sciences, computing managers there threw down a gauntlet to develop a $50,000 gigaFLOPS workstation for processing satellite data sets. Soon, Thomas Sterling and Don Becker were working on the Beowulf concept at the University Space Research Association (USRA)-run Center of Excellence in Space Data and Information Sciences (CESDIS). Beowulf clusters mix three primary ingredients: commodity personal computers or workstations, low-cost Ethernet networks, and the open-source Linux operating system. One of the larger Beowulfs is Goddard's Highly-parallel Integrated Virtual Environment, or HIVE for short.
Renkawitz, Tobias; Tingart, Markus; Grifka, Joachim; Sendtner, Ernst; Kalteis, Thomas
2009-09-01
This article outlines the scientific basis and state-of-the-art application of computer-assisted orthopedic surgery in total hip arthroplasty (THA) and provides a future perspective on this technology. Computer-assisted orthopedic surgery in primary THA has the potential to couple 3D simulations with real-time evaluations of surgical performance, which has brought these developments from the research laboratory all the way to clinical use. Imageless navigation systems, which require no additional pre- or intraoperative image acquisition, have stood the test of significantly reducing the variability in positioning the acetabular component and have shown precise measurement of leg length and offset changes during THA. More recently, computer-assisted orthopedic surgery systems have opened a new frontier for accurate surgical practice in minimally invasive, tissue-preserving THA. The future generation of imageless navigation systems will switch from simple measurement tasks to real navigation tools. These software algorithms will consider the cup and stem as components of a coupled biomechanical system, navigating the orthopedic surgeon to find an optimized complementary component orientation rather than target values intraoperatively, and are expected to have a high impact on clinical practice and postoperative functionality in modern THA.
NASA Astrophysics Data System (ADS)
Kuznetsov, D. N.; Syryamkin, V. I.
2015-11-01
Modern technologies play a very important role in our lives. It is hard to imagine how people could get along without personal computers, or companies without powerful computer centers. Nowadays, many devices make modern medicine more effective. Medicine is developing constantly, so the introduction of robots in this sector is a very promising activity. Advances in technology have influenced medicine greatly, and robotic surgery is now actively developing worldwide. Scientists have been carrying out research and practical attempts to create robotic surgeons for more than 20 years, since the mid-1980s. Robotic assistants play an important role in modern medicine. The industry is new and at an early stage of development; despite this, some developments already have worldwide application, function successfully, and bring invaluable help to the staff of medical institutions. Today, doctors can perform operations that seemed impossible a few years ago. Such progress in medicine is due to many factors. First, modern operating rooms are equipped with up-to-date equipment, allowing doctors to operate more accurately and with less risk to the patient. Second, technology has improved the quality of doctors' training. Various types of robots now exist: assistants, military, space, household and, of course, medical. In what follows, we analyze the existing types of robots and their applications in detail. The purpose of the article is to illustrate the most popular types of robots used in medicine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grebennikov, A.N.; Zhitnik, A.K.; Zvenigorodskaya, O.A.
1995-12-31
In conformity with the protocol of the Workshop under Contract "Assessment of RBMK reactor safety using modern Western Codes", VNIIEF performed a series of neutronics computations to compare Western and VNIIEF codes and to assess whether the VNIIEF codes are suitable for safety assessment computations for RBMK-type reactors. The work was carried out in close collaboration with M.I. Rozhdestvensky and L.M. Podlazov, NIKIET employees. The effort involved: (1) cell computations with the WIMS, EKRAN codes (improved modification of the LOMA code) and the S-90 code (VNIIEF Monte Carlo), covering cell, polycell, and burnup computations; (2) 3D computation of static states with the KORAT-3D and NEU codes and comparison with results of computations with the NESTLE code (USA); these computations were performed in the geometry and using the neutron constants presented by the American party; (3) 3D computation of neutron kinetics with the KORAT-3D and NEU codes. These computations were performed in two formulations, both developed in collaboration with NIKIET. The formulation of the first problem agrees as closely as possible with one of the NESTLE problems and imitates gas bubble travel through a core. The second problem is a model of the RBMK as a whole with imitation of control and protection system (CPS) controls movement in a core.
AFC-Enabled Simplified High-Lift System Integration Study
NASA Technical Reports Server (NTRS)
Hartwich, Peter M.; Dickey, Eric D.; Sclafani, Anthony J.; Camacho, Peter; Gonzales, Antonio B.; Lawson, Edward L.; Mairs, Ron Y.; Shmilovich, Arvin
2014-01-01
The primary objective of this trade study report is to explore the potential of using Active Flow Control (AFC) for achieving lighter and mechanically simpler high-lift systems for transonic commercial transport aircraft. This assessment was conducted in four steps. First, based on the Common Research Model (CRM) outer mold line (OML) definition, two high-lift concepts were developed. One concept, representative of current production-type commercial transonic transports, features leading edge slats and slotted trailing edge flaps with Fowler motion. The other CRM-based design relies on drooped leading edges and simply hinged trailing edge flaps for high-lift generation. The relative high-lift performance of these two high-lift CRM variants is established using Computational Fluid Dynamics (CFD) solutions to the Reynolds-Averaged Navier-Stokes (RANS) equations for steady flow. These CFD assessments identify the high-lift performance that needs to be recovered through AFC to have the CRM variant with the lighter and mechanically simpler high-lift system match the performance of the conventional high-lift system. Conceptual design integration studies for the AFC-enhanced high-lift systems were conducted with a NASA Environmentally Responsible Aircraft (ERA) reference configuration, the so-called ERA-0003 concept. These design trades identify AFC performance targets that need to be met to produce economically feasible ERA-0003-like concepts with lighter and mechanically simpler high-lift designs that match the performance of conventional high-lift systems. Finally, technical challenges are identified associated with the application of AFC-enabled high-lift systems to modern transonic commercial transports for future technology maturation efforts.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0
NASA Technical Reports Server (NTRS)
Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine
2004-01-01
We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as the Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved significant speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs over both the sequential implementation and a parallelized OpenMP implementation. An implementation of OpenMP on the Intel MIC coprocessor provided speedups with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
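For a concrete sense of the computational kernel being accelerated, here is a minimal 2D wave-equation leapfrog stencil in NumPy, a stand-in for the cardiac action potential model (which adds reaction terms); the grid size, time step, and wave speed are illustrative assumptions. It is exactly this kind of nested stencil loop that OpenACC pragmas or an OpenCL kernel offload to an accelerator.

```python
# 2D wave-equation leapfrog update with a 5-point Laplacian stencil.
import numpy as np

n, c, dt, dx = 256, 1.0, 0.1, 1.0          # grid size, wave speed, steps (assumed)
u_prev = np.zeros((n, n))
u = np.zeros((n, n))
u[n // 2, n // 2] = 1.0                    # initial disturbance at the center

for _ in range(100):
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0 * u[1:-1, 1:-1])          # 5-point Laplacian on the interior
    u_next = u.copy()
    u_next[1:-1, 1:-1] = (2.0 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                          + (c * dt / dx) ** 2 * lap)
    u_prev, u = u, u_next                  # advance the two time levels

print(float(np.abs(u).max()))              # peak amplitude of the spreading wave
```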
Modern energy efficient technologies of high-rise construction
NASA Astrophysics Data System (ADS)
Lukmanova, Inessa; Golov, Roman
2018-03-01
The paper analyzes modern energy-efficient technologies, both those already applied and those only now being introduced in the construction of high-rise residential buildings. All technologies are systematized by the authors as part of a unified model of "Arrows of Energy-Efficient Technologies", which implies performing energy-saving measures in the design, construction and operation of buildings.
Compilation of Abstracts for SC12 Conference Proceedings
NASA Technical Reports Server (NTRS)
Morello, Gina Francine (Compiler)
2012-01-01
1 A Breakthrough in Rotorcraft Prediction Accuracy Using Detached Eddy Simulation; 2 Adjoint-Based Design for Complex Aerospace Configurations; 3 Simulating Hypersonic Turbulent Combustion for Future Aircraft; 4 From a Roar to a Whisper: Making Modern Aircraft Quieter; 5 Modeling of Extended Formation Flight on High-Performance Computers; 6 Supersonic Retropropulsion for Mars Entry; 7 Validating Water Spray Simulation Models for the SLS Launch Environment; 8 Simulating Moving Valves for Space Launch System Liquid Engines; 9 Innovative Simulations for Modeling the SLS Solid Rocket Booster Ignition; 10 Solid Rocket Booster Ignition Overpressure Simulations for the Space Launch System; 11 CFD Simulations to Support the Next Generation of Launch Pads; 12 Modeling and Simulation Support for NASA's Next-Generation Space Launch System; 13 Simulating Planetary Entry Environments for Space Exploration Vehicles; 14 NASA Center for Climate Simulation Highlights; 15 Ultrascale Climate Data Visualization and Analysis; 16 NASA Climate Simulations and Observations for the IPCC and Beyond; 17 Next-Generation Climate Data Services: MERRA Analytics; 18 Recent Advances in High-Resolution Global Atmospheric Modeling; 19 Causes and Consequences of Turbulence in the Earth's Protective Shield; 20 NASA Earth Exchange (NEX): A Collaborative Supercomputing Platform; 21 Powering Deep Space Missions: Thermoelectric Properties of Complex Materials; 22 Meeting NASA's High-End Computing Goals Through Innovation; 23 Continuous Enhancements to the Pleiades Supercomputer for Maximum Uptime; 24 Live Demonstrations of 100-Gbps File Transfers Across LANs and WANs; 25 Untangling the Computing Landscape for Climate Simulations; 26 Simulating Galaxies and the Universe; 27 The Mysterious Origin of Stellar Masses; 28 Hot-Plasma Geysers on the Sun; 29 Turbulent Life of Kepler Stars; 30 Modeling Weather on the Sun; 31 Weather on Mars: The Meteorology of Gale Crater; 32 Enhancing Performance of NASA's High-End Computing Applications; 33 Designing Curiosity's Perfect Landing on Mars; 34 The Search Continues: Kepler's Quest for Habitable Earth-Sized Planets.
High-performance finite-difference time-domain simulations of C-Mod and ITER RF antennas
NASA Astrophysics Data System (ADS)
Jenkins, Thomas G.; Smithe, David N.
2015-12-01
Finite-difference time-domain methods have, in recent years, developed powerful capabilities for modeling realistic ICRF behavior in fusion plasmas [1, 2, 3, 4]. When coupled with the power of modern high-performance computing platforms, such techniques allow the behavior of antenna near and far fields, and the flow of RF power, to be studied in realistic experimental scenarios at previously inaccessible levels of resolution. In this talk, we present results and 3D animations from high-performance FDTD simulations on the Titan Cray XK7 supercomputer, modeling both Alcator C-Mod's field-aligned ICRF antenna and the ITER antenna module. Much of this work focuses on scans over edge density, and tailored edge density profiles, to study dispersion and the physics of slow wave excitation in the immediate vicinity of the antenna hardware and SOL. An understanding of the role of the lower-hybrid resonance in low-density scenarios is emerging, and possible implications of this for the NSTX launcher and power balance are also discussed. In addition, we discuss ongoing work centered on using these simulations to estimate sputtering and impurity production, as driven by the self-consistent sheath potentials at antenna surfaces.
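As a minimal picture of the method's core, the sketch below runs a 1D FDTD (Yee) leapfrog update in normalized units. The production simulations described above do this in 3D with realistic antenna geometry and plasma response, which is far beyond this illustration; the grid size, step count, and source parameters here are arbitrary.

```python
# 1D FDTD (Yee) scheme: E and H live on staggered grids and leapfrog in time.
import numpy as np

n_cells, n_steps = 400, 600
ez = np.zeros(n_cells)          # E field on integer grid points
hy = np.zeros(n_cells - 1)      # H field on half-integer points
courant = 0.5                   # dt*c/dx, chosen below the 1D stability limit

for step in range(n_steps):
    hy += courant * (ez[1:] - ez[:-1])            # update H from the curl of E
    ez[1:-1] += courant * (hy[1:] - hy[:-1])      # update E from the curl of H
    ez[50] += np.exp(-((step - 80) / 20.0) ** 2)  # soft Gaussian source

print(float(np.abs(ez).max()))  # a pulse has propagated (and reflected) on the grid
```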
An introduction to real-time graphical techniques for analyzing multivariate data
NASA Astrophysics Data System (ADS)
Friedman, Jerome H.; McDonald, John Alan; Stuetzle, Werner
1987-08-01
Orion I is a graphics system used to study applications of computer graphics - especially interactive motion graphics - in statistics. Orion I is the newest of a family of "Prim" systems, whose most striking common feature is the use of real-time motion graphics to display three dimensional scatterplots. Orion I differs from earlier Prim systems through the use of modern and relatively inexpensive raster graphics and microprocessor technology. It also delivers more computing power to its user; Orion I can perform more sophisticated real-time computations than were possible on previous such systems. We demonstrate some of Orion I's capabilities in our film: "Exploring data with Orion I".
NASA Astrophysics Data System (ADS)
Heister, Timo; Dannberg, Juliane; Gassmöller, Rene; Bangerth, Wolfgang
2017-08-01
Computations have helped elucidate the dynamics of Earth's mantle for several decades already. The numerical methods that underlie these simulations have greatly evolved within this time span, and today include dynamically changing and adaptively refined meshes, sophisticated and efficient solvers, and parallelization to large clusters of computers. At the same time, many of the methods - discussed in detail in a previous paper in this series - were developed and tested primarily using model problems that lack many of the complexities that are common to the realistic models our community wants to solve today. With several years of experience solving complex and realistic models, we here revisit some of the algorithm designs of the earlier paper and discuss the incorporation of more complex physics. In particular, we re-consider time stepping and mesh refinement algorithms, evaluate approaches to incorporate compressibility, and discuss dealing with strongly varying material coefficients, latent heat, and how to track chemical compositions and heterogeneities. Taken together and implemented in a high-performance, massively parallel code, the techniques discussed in this paper then allow for high resolution, 3-D, compressible, global mantle convection simulations with phase transitions, strongly temperature dependent viscosity and realistic material properties based on mineral physics data.
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
Accelerating scientific computations with mixed precision algorithms
NASA Astrophysics Data System (ADS)
Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire
2009-12-01
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
Program summary
Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability, resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, yielding the method commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.
Running time: seconds/minutes
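The refinement loop described above is compact enough to sketch. The following is a minimal NumPy illustration of the idea (single-precision solve, double-precision residual correction), not the ITER-REF Fortran code; for brevity it re-solves each correction rather than reusing the LU factors, which a real implementation would keep.

```python
# Mixed-precision iterative refinement: solve in float32, refine in float64.
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    A32 = A.astype(np.float32)                       # single-precision copy for solves
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                # residual in double precision
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d                                       # apply the correction
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x))   # approaches double-precision accuracy
```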
Recreating History with Archimedes and Pi
ERIC Educational Resources Information Center
Santucci, Lora C.
2011-01-01
Using modern technology to examine classical mathematics problems at the high school level can reduce difficult computations and encourage generalizations. When teachers combine historical context with access to technology, they challenge advanced students to think deeply, spark interest in students whose primary interest is not mathematics, and…
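In the article's spirit of letting technology absorb the heavy computation, the sketch below runs Archimedes' polygon-doubling bounds on pi for a unit circle, starting from hexagons; a few lines of code reproduce what was once a formidable hand calculation.

```python
# Archimedes' polygon doubling: i and c are the semiperimeters of inscribed
# and circumscribed regular polygons of a unit circle, starting from hexagons.
from math import sqrt

i, c = 3.0, 2.0 * sqrt(3.0)
for sides in (12, 24, 48, 96):
    c = 2.0 * i * c / (i + c)     # harmonic mean: circumscribed 2n-gon
    i = sqrt(i * c)               # geometric mean: inscribed 2n-gon
    print(f"{sides}-gon: {i:.6f} < pi < {c:.6f}")
# The 96-gon reproduces Archimedes' classical bounds, roughly 3.1410 < pi < 3.1427.
```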
JINR cloud infrastructure evolution
NASA Astrophysics Data System (ADS)
Baranov, A. V.; Balashov, N. A.; Kutovskiy, N. A.; Semenov, R. N.
2016-09-01
To fulfil JINR's commitments in various national and international projects related to the use of modern information technologies, such as cloud and grid computing, and to provide a modern tool for JINR users for their scientific research, a cloud infrastructure was deployed at the Laboratory of Information Technologies of the Joint Institute for Nuclear Research. OpenNebula software was chosen as the cloud platform. Initially it was set up in a simple configuration with a single front-end host and a few cloud nodes. Some custom development was done to tune the JINR cloud installation to fit local needs: a web form in the cloud web-interface for resource requests, a menu item with cloud utilization statistics, user authentication via Kerberos, and a custom driver for OpenVZ containers. Because of high demand for the cloud service and over-utilization of its resources, it was re-designed to cover users' increasing needs in capacity, availability and reliability. Recently a new cloud instance has been deployed in a high-availability configuration with a distributed network file system and additional computing power.
Final Project Report: Data Locality Enhancement of Dynamic Simulations for Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen, Xipeng
The goal of this project is to develop a set of techniques and software tools to enhance the matching between memory accesses in dynamic simulations and the prominent features of modern and future manycore systems, alleviating the memory performance issues for exascale computing. In the first three years, the PI and his group have achieved significant progress towards this goal, producing a set of novel techniques for improving memory performance and data locality in manycore systems, yielding 18 conference and workshop papers and 4 journal papers, and graduating 6 Ph.D. students. This report summarizes the research results of this project through that period.
NASA Technical Reports Server (NTRS)
Manford, J. S.; Bennett, G. R.
1985-01-01
The Space Station Program will incorporate analysis of operations constraints and considerations in the early design phases to avoid the need for later modifications to the Space Station for operations. The application of modern tools and administrative techniques to minimize the cost of performing effective orbital operations planning and design analysis in the preliminary design phase of the Space Station Program is discussed. Tools and techniques discussed include: an approach for rigorous analysis of operations functions, use of the resources of a large computer network, and provisions for efficient research and access to information.
NASA Technical Reports Server (NTRS)
Goodrich, Kenneth H.
1993-01-01
A batch air combat simulation environment, the tactical maneuvering simulator (TMS), is presented. The TMS is a tool for developing and evaluating tactical maneuvering logics, but it can also be used to evaluate the tactical implications of perturbations to aircraft performance or supporting systems. The TMS can simulate air combat between any number of engagement participants, with practical limits imposed by computer memory and processing power. Aircraft are modeled using equations of motion, control laws, aerodynamics, and propulsive characteristics equivalent to those used in high-fidelity piloted simulations. Data bases representative of a modern high-performance aircraft with and without thrust-vectoring capability are included. To simplify the task of developing and implementing maneuvering logics in the TMS, an outer-loop control system, the tactical autopilot (TA), is implemented in the aircraft simulation model. The TA converts guidance commands issued by computerized maneuvering logics, given as desired angle of attack and wind-axis bank angle, into inputs for the inner-loop control augmentation system of the aircraft. The capabilities and operation of the TMS and the TA are described.
High-Performance Computing User Facility | Computational Science | NREL
The High-Performance Computing (HPC) User Facility at NREL provides advanced computing systems, including the Peregrine supercomputer and the Gyrfalcon Mass Storage System. (Page illustration: photo of the Peregrine supercomputer.)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Welcome, Michael L.; Bell, Christian S.
GASNet (Global-Address Space Networking) is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages such as UPC and Titanium. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet is designed specifically to support high-performance, portable implementations of global address space languages on modern high-end communication networks. The interface provides the flexibility and extensibility required to express a wide variety of communication patterns without sacrificing performance by imposing large computational overheads in the interface. The design of the GASNet interface is partitioned into two layers to maximize porting ease without sacrificing performance: the lower level is a narrow but very general interface called the GASNet core API; its design is based heavily on Active Messages, and it is implemented directly on top of each individual network architecture. The upper level is a wider and more expressive interface called the GASNet extended API, which provides high-level operations such as remote memory access and various collective operations. This release implements GASNet over MPI, the Quadrics "elan" API, the Myrinet "GM" API and the "LAPI" interface to the IBM SP switch. A template is provided for adding support for additional network interfaces.
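The Active Message pattern at the heart of the core API can be sketched in a few lines. The following is a toy, single-process Python rendering of the idea (handler table plus fire-and-forget request queue), not GASNet's actual C interface:

    # Toy sketch of the Active Message pattern underlying the GASNet core
    # API -- not GASNet's real interface. A message carries a handler index
    # plus arguments; the receiver dispatches without the sender blocking.
    handlers = {}

    def register(idx):
        def wrap(fn):
            handlers[idx] = fn
            return fn
        return wrap

    @register(0)
    def remote_put(store, key, value):      # a tiny "extended API" operation
        store[key] = value

    def am_request(network_queue, handler_idx, *args):
        network_queue.append((handler_idx, args))   # send side: fire and forget

    def poll(network_queue, store):
        while network_queue:                         # receive side: dispatch
            idx, args = network_queue.pop(0)
            handlers[idx](store, *args)

    queue, memory = [], {}          # one "node's" incoming queue and memory
    am_request(queue, 0, "x", 42)
    poll(queue, memory)
    print(memory)                   # {'x': 42}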
High Performance Computing Meets Energy Efficiency - Continuum Magazine | NREL
The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data systems. (Page illustration: wind turbine simulation by Patrick J. Moriarty and Matthew J. Churchfield, NREL.)
Mantle Convection on Modern Supercomputers
NASA Astrophysics Data System (ADS)
Weismüller, J.; Gmeiner, B.; Huber, M.; John, L.; Mohr, M.; Rüde, U.; Wohlmuth, B.; Bunge, H. P.
2015-12-01
Mantle convection is the cause for plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic to mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next generation large-scale architectures is handled successfully only in an interdisciplinary context. A new priority program - named SPPEXA - by the German Research Foundation (DFG) addresses this issue, and brings together computer scientists, mathematicians and application scientists around grand challenges in HPC. Here we report from the TERRA-NEO project, which is part of the high visibility SPPEXA program, and a joint effort of four research groups. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next generation mantle convection models. We present software that can resolve the Earth's mantle with up to 10^12 grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection and assess the impact of small scale processes on global mantle flow.
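For reference, the governing system mentioned above is commonly written in non-dimensional Boussinesq form (a standard formulation for infinite-Prandtl-number mantle flow, not necessarily TERRA-NEO's exact model):

    0 = -\nabla p + \nabla \cdot \left[ \eta \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}} \right) \right] + \mathrm{Ra}\, T\, \hat{\mathbf{e}}_r,
    \qquad \nabla \cdot \mathbf{u} = 0,
    \qquad \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \nabla^{2} T + H,

with velocity u, dynamic pressure p, viscosity eta, temperature T, Rayleigh number Ra and internal heating rate H. Inertia is negligible at mantle viscosities, which is why the momentum balance carries no time derivative.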
Numerical analysis of the Magnus moment on a spin-stabilized projectile
NASA Astrophysics Data System (ADS)
Cremins, Michael; Rodebaugh, Gregory; Verhulst, Claire; Benson, Michael; van Poppel, Bret
2016-11-01
The Magnus moment is a result of an uneven pressure distribution that occurs when an object rotates in a crossflow. Unlike the Magnus force, which is often small for spin-stabilized projectiles, the Magnus moment can have a strong detrimental effect on flight stability. According to one source, most transonic and subsonic flight instabilities are caused by the Magnus moment [Modern Exterior Ballistics, McCoy], and yet simulations often fail to accurately predict the Magnus moment in the subsonic regime. In this study, we present hybrid Reynolds Averaged Navier Stokes (RANS) and Large Eddy Simulation (LES) predictions of the Magnus moment for a spin-stabilized projectile. Velocity, pressure, and Magnus moment predictions are presented for multiple Reynolds numbers and spin rates. We also consider the effect of a sting mount, which is commonly used when conducting flow measurements in a wind tunnel or water channel. Finally, we present the initial designs for a novel Magnetic Resonance Velocimetry (MRV) experiment to measure three-dimensional flow around a spinning projectile. This work was supported by the Department of Defense High Performance Computing Modernization Program (DoD HPCMP).
Assessment of CFD capability for prediction of hypersonic shock interactions
NASA Astrophysics Data System (ADS)
Knight, Doyle; Longo, José; Drikakis, Dimitris; Gaitonde, Datta; Lani, Andrea; Nompelis, Ioannis; Reimann, Bodo; Walpot, Louis
2012-01-01
The aerothermodynamic loadings associated with shock wave boundary layer interactions (shock interactions) must be carefully considered in the design of hypersonic air vehicles. The capability of Computational Fluid Dynamics (CFD) software to accurately predict hypersonic shock wave laminar boundary layer interactions is examined. A series of independent computations performed by researchers in the US and Europe are presented for two generic configurations (double cone and cylinder) and compared with experimental data. The results illustrate the current capabilities and limitations of modern CFD methods for these flows.
New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases
NASA Astrophysics Data System (ADS)
Brescia, Massimo
2012-11-01
Data mining, or Knowledge Discovery in Databases (KDD), as the main methodology for extracting the scientific information contained in Massive Data Sets (MDS), must tackle crucial problems, since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, and reusability of resources. To achieve a leap forward for the progress of e-science in the data-avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies has produced a huge volume of data, and the associated data-warehouse management, together with the need to optimize analysis and mining procedures, is changing the way modern science is done. Classical data exploration, based on local storage of a user's own data and limited computing infrastructure, is no longer efficient for MDS that are spread worldwide over inhomogeneous data centres and require teraflop processing power. In this context, modern experimental and observational science requires a good understanding of computer science, network infrastructures, data mining, etc., i.e. of all those techniques which fall into the domain of so-called e-science (recently also framed as the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists, and this is reflected in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object-oriented programming, distributed computing and parallel programming need to become an essential part of the scientific background. A possible practical solution is to provide the research community with easy-to-understand, easy-to-use tools based on Web 2.0 technologies and machine learning methodology: tools where almost all the complexity is hidden from the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations are described in detail in the chapter. Moreover, examples of modern applications offering a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets with powerful machine learning and statistical algorithms are also introduced.
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machines are able to produce, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biologists. Aho-Corasick is an exact, multiple-pattern matching algorithm often at the base of this application. High-performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high-performance systems also include heterogeneous processing elements, such as Graphics Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search and the number of matches, and poses significant challenges for current high-performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role given the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high-performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI-based implementation for a heterogeneous high-performance cluster. We compare this implementation to MPI and MPI-with-pthreads implementations for a homogeneous cluster of x86 processors, discussing stability versus performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
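For concreteness, the serial algorithm being accelerated is compact enough to state in full. The following sketch (ours, not the paper's GPU/MPI code) builds the goto/failure automaton and streams a text through it:

    # Compact reference Aho-Corasick (the serial algorithm the paper
    # parallelizes; this sketch is ours, not the paper's GPU/MPI code).
    from collections import deque

    def build_automaton(patterns):
        goto, fail, out = [{}], [0], [set()]
        for p in patterns:                      # 1) build the pattern trie
            s = 0
            for c in p:
                if c not in goto[s]:
                    goto.append({}); fail.append(0); out.append(set())
                    goto[s][c] = len(goto) - 1
                s = goto[s][c]
            out[s].add(p)
        q = deque(goto[0].values())             # 2) BFS to set failure links
        while q:
            s = q.popleft()
            for c, t in goto[s].items():
                q.append(t)
                f = fail[s]
                while f and c not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f].get(c, 0)
                out[t] |= out[fail[t]]          # inherit matches via failure
        return goto, fail, out

    def search(text, goto, fail, out):
        s, hits = 0, []
        for i, c in enumerate(text):
            while s and c not in goto[s]:       # follow failure links
                s = fail[s]
            s = goto[s].get(c, 0)
            for p in out[s]:                    # all patterns ending at i
                hits.append((i - len(p) + 1, p))
        return hits

    g, f, o = build_automaton(["GATTACA", "TACA", "CAT"])
    print(sorted(search("CGATTACAT", g, f, o)))
    # -> [(1, 'GATTACA'), (4, 'TACA'), (6, 'CAT')]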
Computational reacting gas dynamics
NASA Technical Reports Server (NTRS)
Lam, S. H.
1993-01-01
In the study of high speed flows at high altitudes, such as those encountered by re-entry spacecraft, the interaction of chemical reactions and other non-equilibrium processes in the flow field with the gas dynamics is crucial. Generally speaking, problems of this level of complexity must resort to numerical methods for solutions, using sophisticated computational fluid dynamics (CFD) codes. The difficulties introduced by reacting gas dynamics can be classified under three distinct headings: (1) the usually inadequate knowledge of the reaction rate coefficients in the non-equilibrium reaction system; (2) the vastly larger number of unknowns involved in the computation and the expected stiffness of the equations; and (3) the interpretation of the detailed reacting CFD numerical results. The research performed accepts the premise that reacting flows of practical interest in the future will in general be too complex or 'intractable' for traditional analytical developments. The power of modern computers must be exploited. However, instead of focusing solely on the construction of numerical solutions of full-model equations, attention is also directed to the 'derivation' of the simplified model from the given full-model. In other words, the present research aims to utilize computations to do tasks which have traditionally been done by skilled theoreticians: to reduce an originally complex full-model system into an approximate but otherwise equivalent simplified model system. The tacit assumption is that once the appropriate simplified model is derived, the interpretation of the detailed reacting CFD numerical results will become much easier. The approach of the research is called computational singular perturbation (CSP).
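The flavor of the CSP idea can be shown with a toy computation (ours, not Lam's actual formulation): eigen-analysis of the chemical Jacobian exposes the disparate time scales, and the fast, quickly exhausted mode is the one an automatically derived simplified model can eliminate.

    # Toy illustration in the spirit of CSP (not Lam's formulation):
    # the Jacobian's eigenstructure separates fast and slow modes.
    import numpy as np

    # hypothetical linearized source term dy/dt = J y near some flow state
    J = np.array([[-1000.0, 999.0],        # stiff: eigenvalues -1001 and -1
                  [    1.0,  -2.0]])

    eigvals, eigvecs = np.linalg.eig(J)
    order = np.argsort(eigvals.real)       # most negative = fastest mode
    tau = 1.0 / np.abs(eigvals.real[order])
    print("mode time scales:", tau)        # disparity flags a removable mode

    fast = eigvecs[:, order[0]]
    print("fast-mode direction:", fast / np.linalg.norm(fast))
    # CSP projects the dynamics off this direction, leaving the slow model.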
On the Efficacy of Source Code Optimizations for Cache-Based Systems
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Saphir, William C.
1998-01-01
Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
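The stride rule of thumb discussed above is easy to probe (our sketch, not one of the paper's CFD kernels). Note that the time does not fall in proportion to the stride, because whole cache lines are fetched regardless of how much of each line is used:

    # Quick probe (ours) of the "minimize memory access stride" rule.
    import time
    import numpy as np

    a = np.random.rand(1 << 24)            # ~128 MB of doubles
    for stride in (1, 2, 4, 8, 16):
        t0 = time.perf_counter()
        s = a[::stride].sum()              # touches 1/stride of the elements...
        dt = time.perf_counter() - t0      # ...but rarely runs stride-x faster
        print(f"stride {stride:2d}: {dt:.4f} s")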
Can Accelerators Accelerate Learning?
NASA Astrophysics Data System (ADS)
Santos, A. C. F.; Fonseca, P.; Coelho, L. F. S.
2009-03-01
The 'Young Talented' education program developed by the Brazilian State Funding Agency (FAPERJ) [1] makes it possible for students from public high schools to perform activities in scientific laboratories. In the Atomic and Molecular Physics Laboratory at the Federal University of Rio de Janeiro (UFRJ), the students are confronted with modern research tools such as the 1.7 MV ion accelerator. Being a user-friendly machine, the accelerator is easily manageable by the students, who can perform simple hands-on activities, stimulating their interest in physics and bringing them close to modern laboratory techniques.
Surface hardening of the cutting elements of agricultural machinery by vibro-arc plasma
NASA Astrophysics Data System (ADS)
Sharifullin, S. N.; Adigamov, N. R.; Adigamov, N. N.; Solovev, R. Y.; Arakcheeva, K. S.
2016-01-01
At present, state technical policy is aimed at the modernization of worn equipment, including agricultural equipment, on the basis of high-performance technologies such as nanotechnology. Upgrading worn-out equipment means restoring it so that it meets or exceeds its original rated parameters. The existing traditional technologies are not suitable for such repair-based modernization of worn-out equipment; this is especially true of imported equipment. The only way out is the use of high-performance technologies. In this paper, we consider the use of vibro-arc plasma for the surface hardening of the cutting elements of agricultural machinery.
Distributed GPU Computing in GIScience
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.
2013-12-01
Geoscientists strive to discover principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by the increasing amount of datasets from different domains, such as social media, earth observation and environmental sensing (Li et al., 2013). Meanwhile, CPU-based computing resources structured as clusters or supercomputers are costly. In the past several years, as GPU-based technology has matured in both capability and performance, GPU-based computing has emerged as a new computing paradigm. Compared to the traditional microprocessor, the modern GPU, as a compelling alternative, offers outstanding parallel processing capability with cost-effectiveness and efficiency (Owens et al., 2008), although it was initially designed for graphical rendering in the visualization pipeline. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within a distributed environment. Within this framework, 1) for each single computer, both GPU-based and CPU-based computing resources can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build a virtual supercomputer to support CPU-based and GPU-based computing in a distributed computing environment; 3) GPUs, as graphics-targeted devices, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Keywords: Geovisualization, GIScience, Spatiotemporal Studies. References: 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. IEEE Transactions on Visualization and Computer Graphics, 9(3), 378-394. 2. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs). Computers & Geosciences, 59(9), 78-89. 3. Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879-899.
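As a minimal sketch of the GPU-as-coprocessor pattern described in point 1) above (ours, not the framework's code; it assumes a CUDA-capable GPU and an installed CuPy package):

    # Minimal CPU/GPU split sketch with NumPy and CuPy -- assumes CUDA
    # hardware and the cupy package; illustrative only.
    import numpy as np
    import cupy as cp

    n = 4096
    a_cpu = np.random.rand(n, n).astype(np.float32)

    a_gpu = cp.asarray(a_cpu)          # stage the data tile on the GPU
    b_gpu = a_gpu @ a_gpu              # massively parallel on the device
    result = cp.asnumpy(b_gpu)         # bring the processed tile back

    print(result.shape, float(result.mean()))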
NASA Astrophysics Data System (ADS)
Gordova, Yulia; Gorbatenko, Valentina; Martynova, Yulia; Shulgina, Tamara
2014-05-01
Making education relevant to workplace tasks is a key problem of higher education, because old-school training programs are not keeping pace with the rapidly changing situation in the professional field of environmental sciences. A joint group of specialists from Tomsk State University and the Siberian Center for Environmental Research and Training/IMCES SB RAS developed several new courses for students of the "Climatology" and "Meteorology" specialties, combining theoretical knowledge from up-to-date environmental sciences with practical tasks. To organize the educational process we use the open-source course management system Moodle (www.moodle.org), which allows us to combine text and multimedia in the theoretical part of the courses. The hands-on approach is realized through innovative trainings performed within the information-computational platform "Climate" (http://climate.scert.ru/) using web GIS tools. These trainings contain practical tasks on climate modeling and climate change assessment and analysis, performed with the same tools scientists typically use in such research. Students are thus engaged in the use of modern tools of geophysical data analysis, which supports the dynamics of their professional learning. The hands-on approach helps to close the gap between training and practice because it offers experience, increases student involvement, and advances the use of modern information and communication tools. The courses are implemented at Tomsk State University and help to form a modern curriculum in the Earth system science area. This work is partially supported by SB RAS project VIII.80.2.1 and RFBR grants 13-05-12034 and 14-05-00502.
Virtual reality computer simulation.
Grantcharov, T P; Rosenberg, J; Pahle, E; Funch-Jensen, P
2001-03-01
Objective assessment of psychomotor skills should be an essential component of a modern surgical training program. There are computer systems that can be used for this purpose, but their wide application is not yet generally accepted. The aim of this study was to validate the role of virtual reality computer simulation as a method for evaluating surgical laparoscopic skills. The study included 14 surgical residents. On day 1, they performed two runs of all six tasks on the Minimally Invasive Surgical Trainer, Virtual Reality (MIST VR). On day 2, they performed a laparoscopic cholecystectomy on living pigs; afterward, they were tested again on the MIST VR. A group of experienced surgeons evaluated the trainees' performance on the animal operation, giving scores for total performance error and economy of motion. During the tasks on the MIST VR, errors and noneconomy of movements for the left and right hand were also recorded. There were significant correlations between error scores in vivo and three of the six in vitro tasks (p < 0.05). In vivo economy scores correlated significantly with non-economy right-hand scores for five of the six tasks and with non-economy left-hand scores for one of the six tasks (p < 0.05). In this study, laparoscopic performance in the animal model correlated significantly with performance on the computer simulator. Thus, the computer model seems to be a promising objective method for the assessment of laparoscopic psychomotor skills.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.
2011-07-01
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
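The key property exploited above is that each debris object propagates independently of the others. That independence is easy to sketch; here a toy analytical two-body mean-motion propagation is vectorized over all objects at once, exactly the per-object independence a GPU thread-per-object mapping relies on (our sketch, not the Institute's tool):

    # Toy vectorized propagation sketch (ours): every object's state
    # advances independently, so the loop maps to one GPU thread each.
    import numpy as np

    MU = 398600.4418                     # km^3/s^2, Earth's GM
    n_obj = 1_000_000
    a  = np.random.uniform(6800.0, 42000.0, n_obj)   # semi-major axes, km
    M0 = np.random.uniform(0.0, 2 * np.pi, n_obj)    # mean anomalies at t0

    def propagate(M0, a, dt):
        n = np.sqrt(MU / a**3)                # mean motion, rad/s
        return (M0 + n * dt) % (2 * np.pi)    # independent per object

    M = propagate(M0, a, dt=3600.0)      # one-hour step, a million objects
    print(M[:3])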
High performance computing and communications program
NASA Technical Reports Server (NTRS)
Holcomb, Lee
1992-01-01
A review of the High Performance Computing and Communications (HPCC) program is provided in vugraph format. The goals and objectives of this federal program are as follows: extend U.S. leadership in high performance computing and computer communications; disseminate the technologies to speed innovation and to serve national goals; and spur gains in industrial competitiveness by making high performance computing integral to design and production.
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems
Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.
2012-01-01
As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.
Kim, Daehyun; Trzasko, Joshua; Smelyanskiy, Mikhail; Haider, Clifton; Dubey, Pradeep; Manduca, Armando
2011-01-01
Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.
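For reference, CS reconstructions of this kind are typically posed as a convex program of the following standard form (the paper's exact functional may differ):

    \hat{x} = \arg\min_{x} \|\Psi x\|_{1} \quad \text{subject to} \quad \|\Phi x - y\|_{2} \leq \epsilon,

where Phi is the undersampled Fourier (k-space) encoding operator, y the acquired samples, Psi a sparsifying transform, and epsilon a noise tolerance. The iterative solvers for this problem are the computational burden that the GPU and Knights Ferry implementations accelerate.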
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
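The operator-splitting structure lends itself to a compact sketch (ours, not the STEPS implementation; the reaction and diffusion operators below are toy stand-ins) using mpi4py:

    # Schematic operator-splitting sketch (ours): each rank reacts locally,
    # then exchanges boundary molecule counts so diffusion crosses the
    # partition. Rates and geometry are toy assumptions.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    counts = np.random.randint(0, 50, 64)     # molecules per local voxel
    D, dt = 0.1, 1e-3

    for step in range(10):
        # (1) reaction operator: purely local, here a toy first-order decay
        counts -= np.random.binomial(counts, 1 - np.exp(-0.5 * dt))
        # (2) diffusion operator: halo exchange with neighboring ranks
        left, right = (rank - 1) % size, (rank + 1) % size
        halo = comm.sendrecv(counts[-1], dest=right, source=left)
        counts[0] += np.random.binomial(int(halo), D * dt)  # boundary hop

    print(rank, counts.sum())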
A preliminary study of molecular dynamics on reconfigurable computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolinski, C.; Trouw, F. R.; Gokhale, M.
2003-01-01
In this paper we investigate the performance of platform FPGAs on a compute-intensive, floating-point-intensive supercomputing application, Molecular Dynamics (MD). MD is a popular simulation technique to track interacting particles through time by integrating their equations of motion. One part of the MD algorithm was implemented using the Fabric Generator (FG) [11] and mapped onto several reconfigurable logic arrays. FG is a Java-based toolset that greatly accelerates construction of the fabrics from an abstract, technology-independent representation. Our experiments used technology-independent IEEE 32-bit floating-point operators so that the design could be easily re-targeted. Experiments were performed using both non-pipelined and pipelined floating-point modules. We present results for the Altera Excalibur ARM System on a Programmable Chip (SoPC), the Altera Stratix EP1S80, and the Xilinx Virtex-II Pro 2VP50. The best results obtained were 5.69 GFlops at 80 MHz (Altera Stratix EP1S80) and 4.47 GFlops at 82 MHz (Xilinx Virtex-II Pro 2VP50). Assuming a 10 W power budget, these results compare very favorably to a 4 GFlops/40 W processing/power rate for a modern Pentium, suggesting that reconfigurable logic can achieve high performance at low power on floating-point-intensive applications.
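The inner kernel being mapped onto the FPGA fabric is the classic force-evaluation and integration loop. A minimal velocity-Verlet sketch in single precision (ours, merely mirroring the IEEE 32-bit operators used in the paper) looks like:

    # Minimal velocity-Verlet MD step (our sketch, not the FPGA design);
    # float32 mirrors the paper's IEEE single-precision operators.
    import numpy as np

    def lj_forces(x, eps=1.0, sigma=1.0):
        """Pairwise Lennard-Jones forces for a handful of particles."""
        d = x[:, None, :] - x[None, :, :]                  # displacements
        r2 = (d * d).sum(-1) + np.eye(len(x), dtype=x.dtype)  # no r=0 self term
        s6 = (sigma**2 / r2) ** 3
        mag = 24 * eps * (2 * s6 * s6 - s6) / r2
        np.fill_diagonal(mag, 0.0)
        return (mag[:, :, None] * d).sum(axis=1).astype(x.dtype)

    def verlet_step(x, v, f, dt=1e-3, m=1.0):
        v_half = v + 0.5 * dt * f / m        # half-kick
        x_new = x + dt * v_half              # drift
        f_new = lj_forces(x_new)             # the expensive, parallel part
        v_new = v_half + 0.5 * dt * f_new / m
        return x_new, v_new, f_new

    x = (np.random.rand(8, 3) * 3.0).astype(np.float32)
    v = np.zeros_like(x)
    x, v, f = verlet_step(x, v, lj_forces(x))
    print(x[0], v[0])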
NASA Technical Reports Server (NTRS)
Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spong, D.A.
The design techniques and physics analysis of modern stellarator configurations for magnetic fusion research rely heavily on high performance computing and simulation. Stellarators, which are fundamentally 3-dimensional in nature, offer significantly more design flexibility than more symmetric devices such as the tokamak. By varying the outer boundary shape of the plasma, a variety of physics features, such as transport, stability, and heating efficiency can be optimized. Scientific visualization techniques are an important adjunct to this effort as they provide a necessary ergonomic link between the numerical results and the intuition of the human researcher. The authors have developed a variety of visualization techniques for stellarators which both facilitate the design optimization process and allow the physics simulations to be more readily understood.
NASA Astrophysics Data System (ADS)
Han, Ling; Miller, Brian W.; Barrett, Harrison H.; Barber, H. Bradford; Furenlid, Lars R.
2017-09-01
iQID is an intensified quantum imaging detector developed in the Center for Gamma-Ray Imaging (CGRI). Originally called BazookaSPECT, iQID was designed for high-resolution gamma-ray imaging and preclinical gamma-ray single-photon emission computed tomography (SPECT). With the use of a columnar scintillator, an image intensifier and modern CCD/CMOS sensors, iQID cameras feature outstanding intrinsic spatial resolution. In recent years, many advances have been achieved that greatly boost the performance of iQID, broadening its applications to cover nuclear and particle imaging for preclinical, clinical and homeland security settings. This paper presents an overview of the recent advances of iQID technology and its applications in preclinical and clinical scintigraphy, preclinical SPECT, particle imaging (alpha, neutron, beta, and fission fragment), and digital autoradiography.
Bonsai: an event-based framework for processing and controlling data streams
Lopes, Gonçalo; Bonacchi, Niccolò; Frazão, João; Neto, Joana P.; Atallah, Bassam V.; Soares, Sofia; Moreira, Luís; Matias, Sara; Itskov, Pavel M.; Correia, Patrícia A.; Medina, Roberto E.; Calcaterra, Lorenza; Dreosti, Elena; Paton, Joseph J.; Kampff, Adam R.
2015-01-01
The design of modern scientific experiments requires the control and monitoring of many different data streams. However, the serial execution of programming instructions in a computer makes it a challenge to develop software that can deal with the asynchronous, parallel nature of scientific data. Here we present Bonsai, a modular, high-performance, open-source visual programming framework for the acquisition and online processing of data streams. We describe Bonsai's core principles and architecture and demonstrate how it allows for the rapid and flexible prototyping of integrated experimental designs in neuroscience. We specifically highlight some applications that require the combination of many different hardware and software components, including video tracking of behavior, electrophysiology and closed-loop control of stimulation. PMID:25904861
WE-D-303-00: Computational Phantoms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewis, John; Brigham and Women’s Hospital and Dana-Farber Cancer Institute, Boston, MA
2015-06-15
Modern medical physics deals with complex problems such as 4D radiation therapy and imaging quality optimization. Such problems involve a large number of radiological parameters, and anatomical and physiological breathing patterns. A major challenge is how to develop, test, evaluate and compare various new imaging and treatment techniques, which often involves testing over a large range of radiological parameters as well as varying patient anatomies and motions. It would be extremely challenging, if not impossible, both ethically and practically, to test every combination of parameters and every task on every type of patient under clinical conditions. Computer-based simulation using computational phantoms offers a practical technique with which to evaluate, optimize, and compare imaging technologies and methods. Within simulation, the computerized phantom provides a virtual model of the patient's anatomy and physiology. Imaging data can be generated from it as if it were a live patient, using accurate models of the physics of the imaging and treatment process. With sophisticated simulation algorithms, it is possible to perform virtual experiments entirely on the computer. By serving as virtual patients, computational phantoms hold great promise in solving some of the most complex problems in modern medical physics. In this proposed symposium, we will present the history and recent developments of computational phantom models, share experiences in their application to advanced imaging and radiation applications, and discuss their promises and limitations. Learning Objectives: (1) understand the need for and requirements of computational phantoms in medical physics research; (2) discuss the developments and applications of computational phantoms; (3) know the promises and limitations of computational phantoms in solving complex problems.
COMBINE*: An integrated opto-mechanical tool for laser performance modeling
NASA Astrophysics Data System (ADS)
Rehak, M.; Di Nicola, J. M.
2015-02-01
Accurate modeling of thermal, mechanical and optical processes is important for achieving reliable, high-performance high energy lasers such as those at the National Ignition Facility (NIF) [1]. The need for this capability is even more critical for high average power, high repetition rate applications. Modeling the effects of stresses and temperature fields on optical properties allows for optimal design of optical components and more generally of the architecture of the laser system itself. Stresses change the indices of refraction and induce inhomogeneities and anisotropy. We present a modern, integrated analysis tool that efficiently produces reliable results that are used in our laser propagation tools such as VBL [5]. COMBINE is built on and supplants the existing legacy tools developed for the previous generations of lasers at LLNL but also uses commercially available mechanical finite element codes ANSYS or COMSOL (including computational fluid dynamics). The COMBINE code computes birefringence and wave front distortions due to mechanical stresses on lenses and slabs of arbitrary geometry. The stresses calculated typically originate from mounting support, vacuum load, gravity, heat absorption and/or attending cooling. Of particular importance are the depolarization and detuning effects of nonlinear crystals due to thermal loading. Results are given in the form of Jones matrices, depolarization maps and wave front distributions. An incremental evaluation of Jones matrices and ray propagation in a 3D mesh with a stress and temperature field is performed. Wavefront and depolarization maps are available at the optical aperture and at slices within the optical element. The suite is validated, user friendly, supported, documented and amenable to collaborative development. * COMBINE stands for Code for Opto-Mechanical Birefringence Integrated Numerical Evaluations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, Lizhen; Yang, Ying; Tyburska-Puschel, Beata
The mission of the Nuclear Energy Enabling Technologies (NEET) program is to develop crosscutting technologies for nuclear energy applications. Advanced structural materials with superior performance at elevated temperatures are always desired for nuclear reactors, which can improve reactor economics, safety margins, and design flexibility. They benefit not only new reactors, including advanced light water reactors (LWRs) and fast reactors such as the sodium-cooled fast reactor (SFR) that is primarily designed for management of high-level wastes, but also life extension of the existing fleet when component exchange is needed. Developing and utilizing the modern materials science tools (experimental, theoretical, and computational tools) is an important path to more efficient alloy development and process optimization. Ferritic-martensitic (FM) steels are important structural materials for nuclear reactors due to their advantages over other applicable materials like austenitic stainless steels, notably their resistance to void swelling, low thermal expansion coefficients, and higher thermal conductivity. However, traditional FM steels exhibit a noticeable yield strength reduction at elevated temperatures above ~500°C, which limits their applications in advanced nuclear reactors which target operating temperatures at 650°C or higher. Although oxide-dispersion-strengthened (ODS) ferritic steels have shown excellent high-temperature performance, their extremely high cost, limited size and fabricability of products, as well as the great difficulty with welding and joining, have limited or precluded their commercial applications. Zirconium has shown many benefits to Fe-base alloys such as grain refinement, improved phase stability, and reduced radiation-induced segregation. The ultimate goal of this project is, with the aid of computational modeling tools, to accelerate the development of a new generation of Zr-bearing ferritic alloys to be fabricated using conventional steelmaking practices, which have excellent radiation resistance and enhanced high-temperature creep performance greater than Grade 91.
1981-03-12
agriculture, raw materials, energy sources, computers, lasers, space and aeronautics, high energy physics, and genetics. The four modernizations will be... accomplished and the strong socialist country that is born at the end of the century will be a keyhole for the promotion of science and technology... Process (FNP). Its purpose is to connect with the Kiautsu University computer (model 108) and then to connect a data terminal. This will make a
Computational analysis of aircraft pressure relief doors
NASA Astrophysics Data System (ADS)
Schott, Tyler
Modern trends in commercial aircraft design have sought to improve fuel efficiency while reducing emissions by operating at higher pressures and temperatures than ever before. Consequently, greater demands are placed on the auxiliary bleed air systems used for a multitude of aircraft operations. The increased role of bleed air systems poses significant challenges for the pressure relief system to ensure the safe and reliable operation of the aircraft. The core compartment pressure relief door (PRD) is an essential component of the pressure relief system which functions to relieve internal pressure in the core casing of a high-bypass turbofan engine during a burst duct over-pressurization event. The successful modeling and analysis of a burst duct event are imperative to the design and development of PRDs to ensure that they will meet the increased demands placed on the pressure relief system. Leveraging high-performance computing coupled with advances in computational analysis, this thesis focuses on a comprehensive computational fluid dynamics (CFD) study to characterize turbulent flow dynamics and quantify the performance of a core compartment PRD across a range of operating conditions and geometric configurations. The CFD analysis was based on a compressible, steady-state, three-dimensional, Reynolds-averaged Navier-Stokes approach. Simulations were analyzed, and results show that variations in freestream conditions, plenum environment, and geometric configurations have a non-linear impact on the discharge, moment, thrust, and surface temperature characteristics. The CFD study revealed that the underlying physics for this behavior is explained by the interaction of vortices, jets, and shockwaves. This thesis research is innovative and provides a comprehensive and detailed analysis of existing and novel PRD geometries over a range of realistic operating conditions representative of a burst duct over-pressurization event. Further, the study provides aircraft manufacturers with valuable insight into the impact that operating conditions and geometric configurations have on PRD performance and how the information can be used to assist future research and development of PRD design.
Computational Science News | Computational Science | NREL
February 28, 2018: NREL Launches New Website for High-Performance Computing System Users. The National Renewable Energy Laboratory (NREL) Computational Science Center has launched a revamped website for users of the lab's high-performance computing (HPC) systems.
NASA Astrophysics Data System (ADS)
Greiner, Nathan J.
Modern turbine engines require high turbine inlet temperatures and pressures to maximize thermal efficiency. Increasing the turbine inlet temperature drives higher heat loads on the turbine surfaces. In addition, increasing pressure ratio increases the turbine coolant temperature such that the ability to remove heat decreases. As a result, highly effective external film cooling is required to reduce the heat transfer to turbine surfaces. Testing of film cooling on engine hardware at engine temperatures and pressures can be exceedingly difficult and expensive. Thus, modern studies of film cooling are often performed at near ambient conditions. However, these studies are missing an important aspect in their characterization of film cooling effectiveness. Namely, they do not model the effect of thermal property variations that occur within the boundary and film cooling layers at engine conditions. Also, turbine surfaces can experience significant radiative heat transfer that is not trivial to estimate analytically. The present research first computationally examines the effect of large temperature variations on a turbulent boundary layer. Subsequently, a method to model the effect of large temperature variations within a turbulent boundary layer in an environment coupled with significant radiative heat transfer is proposed and experimentally validated. Next, a method to scale turbine cooling from ambient to engine conditions via non-dimensional matching is developed computationally and then experimentally validated at combustion temperatures. Increasing engine efficiency and thrust-to-weight ratio demands have driven increased combustor fuel-air ratios. Increased fuel-air ratios increase the possibility of unburned fuel species entering the turbine. Alternatively, advanced ultra-compact combustor designs have been proposed to decrease combustor length, increase thrust, or generate power for directed energy weapons. However, the ultra-compact combustor design requires a film cooled vane within the combustor. In both these environments, the unburned fuel in the core flow encounters the oxidizer-rich film cooling stream, combusts, and can locally heat the turbine surface rather than cooling it as intended. Accordingly, a method to quantify film cooling performance in a fuel rich environment is prescribed. Finally, a method to film cool in a fuel rich environment is experimentally demonstrated.
Intraoperative computed tomography.
Tonn, J C; Schichor, C; Schnell, O; Zausinger, S; Uhl, E; Morhard, D; Reiser, M
2011-01-01
Intraoperative computed tomography (iCT) has gained increasing impact among modern neurosurgical techniques. Multislice CT with a sliding gantry in the OR provides excellent diagnostic image quality in the visualization of vascular lesions as well as bony structures including skull base and spine. Due to short acquisition times and a high spatial and temporal resolution, various modalities such as iCT-angiography, iCT-cerebral perfusion and the integration of intraoperative navigation with automatic re-registration after scanning can be performed. This allows a variety of applications, e.g. intraoperative angiography, intraoperative cerebral perfusion studies, update of cerebral and spinal navigation, stereotactic procedures as well as resection control in tumour surgery. Its versatility promotes its use in a multidisciplinary setting. Radiation exposure is comparable to standard CT systems outside the OR. For neurosurgical purposes, however, new hardware components (e.g. a radiolucent headholder system) had to be developed. Having a different range of applications compared to intraoperative MRI, it is an attractive modality for intraoperative imaging being comparatively easy to install and cost efficient.
[Case histories and other stories in the multimedia era... but was there something else?...].
Kohler, Hans-Peter
2010-12-01
The multimedia era is fascinating and limitless! The advances in the field of consumer and entertainment electronics are impressive. Our private as well as business life is very much accompanied and influenced by numerous technical gadgets. These advancements should have many positive implications for our clinical work. Do they? Using modern personal computers and clinic information systems (CIS), access to patients' data, the prescription of drugs, and x-ray diagnostics on high-resolution laptop screens became as easy as pie. This affects the doctor-patient relationship but also the communication between physicians and nurses. Computers also affect our daily clinical work and therefore the treatment of our patients. Do our patients benefit from these technical achievements? Does the use of clinic information systems positively influence diagnosis, treatment and, finally, patient outcome, but also doctors' performance? What has really changed in the last couple of years? The present article tries to critically review these major changes in patient care.
Climate Analytics as a Service
NASA Technical Reports Server (NTRS)
Schnase, John L.; Duffy, Daniel Q.; McInerney, Mark A.; Webster, W. Phillip; Lee, Tsengdar J.
2014-01-01
Climate science is a big data domain that is experiencing unprecedented growth. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). CAaaS combines high-performance computing and data-proximal analytics with scalable data management, cloud computing virtualization, the notion of adaptive analytics, and a domain-harmonized API to improve the accessibility and usability of large collections of climate data. MERRA Analytic Services (MERRA/AS) provides an example of CAaaS. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of key climate variables. The effectiveness of MERRA/AS has been demonstrated in several applications. In our experience, CAaaS is providing the agility required to meet our customers' increasing and changing data management and data analysis needs.
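The MapReduce style of analytic that such a service exposes can be sketched in a few lines (our toy, not the MERRA/AS API): mappers emit per-chunk partial sums for a variable and a reducer combines them into a global statistic.

    # Toy MapReduce-style mean (ours, illustrating the pattern, not the
    # MERRA/AS API): chunk values and counts are assumptions.
    from functools import reduce
    import numpy as np

    chunks = [np.random.rand(100) + 280 for _ in range(12)]  # monthly temps, K

    def mapper(chunk):
        return chunk.sum(), chunk.size          # (partial sum, count)

    def reducer(acc, part):
        return acc[0] + part[0], acc[1] + part[1]

    total, count = reduce(reducer, map(mapper, chunks))
    print("global mean temperature:", total / count, "K")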
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spentzouris, P. (Fermilab); Cary, J.
The design and performance optimization of particle accelerators are essential for the success of the DOE scientific program in the next decade. Particle accelerators are very complex systems whose accurate description involves a large number of degrees of freedom and requires the inclusion of many physics processes. Building on the success of the SciDAC-1 Accelerator Science and Technology project, the SciDAC-2 Community Petascale Project for Accelerator Science and Simulation (ComPASS) is developing a comprehensive set of interoperable components for beam dynamics, electromagnetics, electron cooling, and laser/plasma acceleration modelling. ComPASS is providing accelerator scientists the tools required to enable the necessary accelerator simulation paradigm shift from high-fidelity single physics process modeling (covered under SciDAC-1) to high-fidelity multiphysics modeling. Our computational frameworks have been used to model the behavior of a large number of accelerators and accelerator R&D experiments, assisting both their design and performance optimization. As parallel computational applications, the ComPASS codes have been shown to make effective use of thousands of processors. ComPASS is in the first year of executing its plan to develop the next-generation HPC accelerator modeling tools. ComPASS aims to develop an integrated simulation environment that will utilize existing and new accelerator physics modules with petascale capabilities, by employing modern computing and solver technologies. The ComPASS vision is to deliver to accelerator scientists a virtual accelerator and virtual prototyping modeling environment, with the necessary multiphysics, multiscale capabilities. The plan for this development includes delivering accelerator modeling applications appropriate for each stage of the ComPASS software evolution. Such applications are already being used to address challenging problems in accelerator design and optimization. The ComPASS organization for software development and applications accounts for the natural domain areas (beam dynamics, electromagnetics, and advanced acceleration), and all areas depend on the enabling technologies activities, such as solvers and component technology, to deliver the desired performance and integrated simulation environment. The ComPASS applications focus on computationally challenging problems important for design or performance optimization to all major HEP, NP, and BES accelerator facilities. With the cost and complexity of particle accelerators rising, the use of computation to optimize their designs and find improved operating regimes becomes essential, potentially leading to significant cost savings with modest investment.
NASA Astrophysics Data System (ADS)
Leonard, T.; Spence, S.; Early, J.; Filsinger, D.
2013-12-01
Mixed flow turbines represent a potential solution to the increasing requirement for high pressure, low velocity ratio operation in turbocharger applications. While literature exists for the use of these turbines at such operating conditions, there is a lack of detailed design guidance for defining the basic geometry of the turbine, in particular the cone angle - the angle at which the inlet of the mixed flow turbine is inclined to the axis. This study investigates the effect and interaction of such mixed flow turbine design parameters. Computational Fluid Dynamics (CFD) was initially used to investigate the performance of a modern radial turbine to create a baseline for subsequent mixed flow designs. Existing experimental data were used to validate this model. Using the CFD model, a number of mixed flow turbine designs were investigated, including studies varying the cone angle and the associated inlet blade angle. The results of this analysis provide insight into the performance of a mixed flow turbine with respect to cone and inlet blade angle.
Update on Modern Management of Pheochromocytoma and Paraganglioma.
Lenders, Jacques W M; Eisenhofer, Graeme
2017-06-01
Despite all technical progress in modern diagnostic methods and treatment modalities of pheochromocytoma/paraganglioma, early consideration of the presence of these tumors remains the pivotal link towards the best possible outcome for patients. A timely diagnosis and proper treatment can prevent the wide variety of potentially catastrophic cardiovascular complications. Modern biochemical testing should include tests that offer the best available diagnostic performance: measurements of metanephrines and 3-methoxytyramine in plasma or urine. To minimize false-positive test results, particular attention should be paid to pre-analytical sampling conditions. In addition to anatomical imaging by computed tomography (CT) or magnetic resonance imaging, promising new functional imaging modalities of positron emission tomography/CT using somatostatin analogues such as ⁶⁸Ga-DOTATATE (⁶⁸Ga-labeled DOTA(0)-Tyr(3)-octreotide) will probably replace ¹²³I-MIBG (iodine-123-metaiodobenzylguanidine) in the near future. As nearly half of all pheochromocytoma patients harbor a mutation in one of the 14 tumor susceptibility genes, genetic testing and counseling should at least be considered in all patients with a proven tumor. Post-surgical annual follow-up of patients by measurements of plasma or urinary metanephrines should last for at least 10 years for timely detection of recurrent or metastatic disease. Patients with a high risk for recurrence or metastatic disease (paraganglioma, young age, multiple or large tumors, genetic background) should be followed up lifelong. Copyright © 2017 Korean Endocrine Society.
High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions
2016-08-30
A dedicated high-performance computer cluster was... [report body truncated in extraction]. Final report; sponsoring agency: U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211.
High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.
2017-12-01
The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
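The core measurement can be sketched in a few lines of NumPy: cross-correlate two pre-processed station records and take the logarithmic ratio of causal to acausal energy. The lag-window handling below is a simplification (no windowing around expected arrivals, no stacking), so treat it as a conceptual illustration rather than the project's processing chain.

```python
import numpy as np

def log_amplitude_ratio(u1, u2):
    """Cross-correlate two pre-processed noise records and return the
    logarithmic causal/acausal energy ratio used for source mapping."""
    cc = np.correlate(u1, u2, mode="full")  # lags -(n2-1) .. +(n1-1)
    mid = len(u2) - 1                       # index of zero lag
    causal, acausal = cc[mid + 1:], cc[:mid][::-1]
    return np.log(np.sum(causal ** 2) / np.sum(acausal ** 2))

# Toy usage with synthetic records
rng = np.random.default_rng(0)
u1, u2 = rng.standard_normal(1000), rng.standard_normal(1000)
print(log_amplitude_ratio(u1, u2))
```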
Double-negative metamaterial for mobile phone application
NASA Astrophysics Data System (ADS)
Hossain, M. I.; Faruque, M. R. I.; Islam, M. T.
2017-01-01
In this paper, a new design and analysis of a metamaterial and its applications to modern handsets are presented. The proposed metamaterial unit-cell design consists of two connected square spiral structures, which increases the effective media ratio. The finite integration technique based on Computer Simulation Technology Microwave Studio is utilized in this investigation, and the measurement is taken in an anechoic chamber. Good agreement is observed between simulated and measured results. The results indicate that the proposed metamaterial can successfully cover cellular phone frequency bands. Moreover, the use of the proposed metamaterial in modern handset antennas is also analyzed. The results reveal that the proposed metamaterial attachment significantly reduces specific absorption rate values without reducing the antenna performance.
NASA Astrophysics Data System (ADS)
Povarov, V. P.; Tereshchenko, A. B.; Kravchenko, Yu. N.; Pozychanyuk, I. V.; Gorobtsov, L. I.; Golubev, E. I.; Bykov, V. I.; Likhanskii, V. V.; Evdokimov, I. A.; Zborovskii, V. G.; Sorokin, A. A.; Kanyukova, V. D.; Aliev, T. N.
2014-02-01
The results of developing and implementing the modernized fuel leakage monitoring methods at the shut-down and running reactor of the Novovoronezh nuclear power plant (NPP) are presented. An automated computerized expert system integrated with an in-core monitoring system (ICMS) and installed at the Novovoronezh NPP unit no. 5 is described. If leaky fuel elements appear in the core, the system allows one to perform on-line assessment of the parameters of leaky fuel assemblies (FAs). The computer expert system units designed for optimizing the operating regimes and enhancing the fuel usage efficiency at the Novovoronezh NPP unit no. 5 are now being developed.
Turbulent heat transfer performance of single stage turbine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amano, R.S.; Song, B.
1999-07-01
To increase the efficiency and the power of modern power plant gas turbines, designers are continually trying to raise the maximum turbine inlet temperature. Here, a numerical study based on the Navier-Stokes equations on a three-dimensional turbulent flow in a single stage turbine stator/rotor passage has been conducted and reported in this paper. The full Reynolds-stress closure model (RSM) was used for the computations and the results were also compared with the computations made by using the Launder-Sharma low-Reynolds-number {kappa}-{epsilon} model. The computational results obtained using these models were compared in order to investigate the turbulence effect in the near-wall region. The set of the governing equations in a generalized curvilinear coordinate system was discretized by using the finite volume method with non-staggered grids. The numerical modeling was performed to interact between the stator and rotor blades.
Fox, W Christopher; Park, Min S; Belverud, Shawn; Klugh, Arnett; Rivet, Dennis; Tomlin, Jeffrey M
2013-04-01
To follow the progression of neuroimaging as a means of non-invasive evaluation of mild traumatic brain injury (mTBI) in order to provide recommendations based on reproducible, defined imaging findings. A comprehensive literature review and analysis of contemporary published articles was performed to study the progression of neuroimaging findings as a non-invasive 'biomarker' for mTBI. Multiple imaging modalities exist to support the evaluation of patients with mTBI, including ultrasound (US), computed tomography (CT), single photon emission computed tomography (SPECT), positron emission tomography (PET), and magnetic resonance imaging (MRI). These techniques continue to evolve with the development of fractional anisotropy (FA), fiber tractography (FT), and diffusion tensor imaging (DTI). Modern imaging techniques, when applied in the appropriate clinical setting, may serve as a valuable tool for diagnosis and management of patients with mTBI. An understanding of modern neuroanatomical imaging will enhance our ability to analyse injury and recognize the manifestations of mTBI.
NASA Astrophysics Data System (ADS)
Lebedev, A. A.; Ivanova, E. G.; Komleva, V. A.; Klokov, N. M.; Komlev, A. A.
2017-01-01
The considered method of learning the basics of microelectronic amplifier circuits and systems enables one to understand electrical processes more deeply, to grasp the relationship between static and dynamic characteristics and, finally, to bring the learning process closer to the cognitive process. The scheme of problem-based learning can be represented by the following sequence of procedures: the contradiction is perceived and revealed; cognitive motivation is provided by creating a problematic situation (the mental state of the student) that moves the student to solve the problem and to ask "why?"; a hypothesis is made; searches for solutions are implemented; the answer is sought. Due to the complexity of the circuit architectures, modern methods of computer analysis and synthesis are also considered. Examples are given of analog circuits with improved performance engineered by students in the framework of their scientific and research work, based on standard software and on software developed at the Department of Microelectronics, MEPhI.
Evolution of the SOFIA tracking control system
NASA Astrophysics Data System (ADS)
Fiebig, Norbert; Jakob, Holger; Pfüller, Enrico; Röser, Hans-Peter; Wiedemann, Manuel; Wolf, Jürgen
2014-07-01
The airborne observatory SOFIA (Stratospheric Observatory for Infrared Astronomy) is undergoing a modernization of its tracking system. This includes new, highly sensitive tracking cameras, control computers, filter wheels and other equipment, as well as a major redesign of the control software. The experiences along the migration path from an aged 19" VMEbus-based control system to modern industrial PCs, from the VxWorks real-time operating system to embedded Linux, and to a state-of-the-art software architecture are presented. Furthermore, a concept is presented for operating the new camera as a scientific instrument as well, in parallel to tracking.
Re-Computation of Numerical Results Contained in NACA Report No. 496
NASA Technical Reports Server (NTRS)
Perry, Boyd, III
2015-01-01
An extensive examination of NACA Report No. 496 (NACA 496), "General Theory of Aerodynamic Instability and the Mechanism of Flutter," by Theodore Theodorsen, is described. The examination included checking equations and solution methods and re-computing interim quantities and all numerical examples in NACA 496. The checks revealed that NACA 496 contains computational shortcuts (time- and effort-saving devices for engineers of the time) and clever artifices (employed in its solution methods), but, unfortunately, also contains numerous tripping points (aspects of NACA 496 that have the potential to cause confusion) and some errors. The re-computations were performed employing the methods and procedures described in NACA 496, but using modern computational tools. With some exceptions, the magnitudes and trends of the original results were in fair-to-very-good agreement with the re-computed results. The exceptions included what are speculated to be computational errors in the original in some instances and transcription errors in the original in others. Independent flutter calculations were performed and, in all cases, including those where the original and re-computed results differed significantly, were in excellent agreement with the re-computed results. Appendix A contains NACA 496; Appendix B contains a Matlab(Registered) program that performs the re-computation of results; Appendix C presents three alternate solution methods, with examples, for the two-degree-of-freedom solution method of NACA 496; Appendix D contains the three-degree-of-freedom solution method (outlined in NACA 496 but never implemented), with examples.
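For readers re-tracing such computations, the central special function of NACA 496, the Theodorsen function C(k), is a one-liner with modern tools. The sketch below uses SciPy's Hankel functions of the second kind; the sample reduced frequencies are arbitrary, and this is an independent illustration, not the report's appendix code.

```python
import numpy as np
from scipy.special import hankel2

def theodorsen(k):
    """Theodorsen function C(k) = H1(2)(k) / (H1(2)(k) + i*H0(2)(k))."""
    h0, h1 = hankel2(0, k), hankel2(1, k)
    return h1 / (h1 + 1j * h0)

k = np.array([0.05, 0.1, 0.5, 1.0])
print(theodorsen(k))  # tends to 1 as k -> 0 and to 0.5 as k -> infinity
```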
Living Color Frame System: PC graphics tool for data visualization
NASA Technical Reports Server (NTRS)
Truong, Long V.
1993-01-01
Living Color Frame System (LCFS) is a personal computer software tool for generating real-time graphics applications. It is highly applicable for a wide range of data visualization in virtual environment applications. Engineers often use computer graphics to enhance the interpretation of data under observation. These graphics become more complicated when 'run time' animations are required, such as found in many typical modern artificial intelligence and expert systems. Living Color Frame System solves many of these real-time graphics problems.
GPU applications for data processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vladymyrov, Mykhailo, E-mail: mykhailo.vladymyrov@cern.ch; Aleksandrov, Andrey; INFN sezione di Napoli, I-80125 Napoli
2015-12-31
Modern experiments that use nuclear photoemulsion require fast and efficient data acquisition from the emulsion. The new approaches in developing scanning systems require real-time processing of large amounts of data. Methods that use Graphical Processing Unit (GPU) computing power for emulsion data processing are presented here. It is shown how GPU-accelerated emulsion processing helped us to raise the scanning speed by a factor of nine.
Fast Computation on the Modern Battlefield
2015-04-01
…the performance of offloading systems in current and future scenarios. The modularity of this model allows system designers to replace model… goals were simplicity and modularity. We wanted the model to not necessarily answer every question for every scenario, but rather expose easy-to-… acquisitions for future systems. Again, because of the modularity of the model, it is possible for designers to substitute the most accurate value for…
Analysis of Satellite Communications Antenna Patterns
NASA Technical Reports Server (NTRS)
Rahmat-Samii, Y.
1985-01-01
Computer program accurately and efficiently predicts far-field patterns of offset, or symmetric, parabolic reflector antennas. Antenna designer uses program to study effects of varying geometrical and electrical (RF) parameters of parabolic reflector and its feed system. Accurate predictions of far-field patterns help designer predict overall performance of antenna. These reflectors are used extensively in modern communications satellites and in multiple-beam and low side-lobe antenna systems.
Airborne Network Optimization with Dynamic Network Update
2015-03-26
AFIT thesis AFIT-ENG-MS-15-M-030 (Department of Electrical and Computer Engineering, Air Force Institute of Technology, Air University). Abstract excerpt: Modern networks employ congestion and routing management algorithms that can perform… airborne networks. Intelligent agents can make use of Kalman filter predictions to make informed decisions to manage communication in airborne networks.
BrainFrame: a node-level heterogeneous accelerator platform for neuron simulations
NASA Astrophysics Data System (ADS)
Smaragdos, Georgios; Chatzikonstantis, Georgios; Kukreja, Rahul; Sidiropoulos, Harry; Rodopoulos, Dimitrios; Sourdis, Ioannis; Al-Ars, Zaid; Kachris, Christoforos; Soudris, Dimitrios; De Zeeuw, Chris I.; Strydis, Christos
2017-12-01
Objective. The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit for a homogeneous acceleration platform to effectively address the complete array of modeling requirements. Approach. In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies, an Intel Xeon-Phi CPU, a NVidia GP-GPU and a Maxeler Dataflow Engine. The PyNN software framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different experiment instances of a state-of-the-art neuron model, representing the inferior-olivary nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity densities, which can drastically affect the workload’s performance characteristics. Main results. The combined use of different HPC technologies demonstrates that BrainFrame is better able to cope with the modeling diversity encountered in realistic experiments while at the same time running on significantly lower energy budgets. Our performance analysis clearly shows that the model directly affects performance and all three technologies are required to cope with all the model use cases. Significance. The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for use per simulation run. The PyNN integration provides a familiar bridge to the vast number of models already available. Additionally, it gives a clear roadmap for extending the platform support beyond the proof of concept, with improved usability and directly useful features to the computational-neuroscience community, paving the way for wider adoption.
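To illustrate the "familiar bridge" that the PyNN integration provides, here is a minimal PyNN script of the kind a platform like this could accept. The backend module, cell model, and all parameters are illustrative assumptions, not BrainFrame's actual configuration, and running it locally requires the chosen simulator (here NEURON) to be installed.

```python
# Minimal PyNN model sketch (backend and parameters are assumptions).
import pyNN.neuron as sim  # any PyNN backend; the platform would pick its own

sim.setup(timestep=0.025)
cells = sim.Population(96, sim.HH_cond_exp())             # HH-type neurons
drive = sim.Population(96, sim.SpikeSourcePoisson(rate=20.0))
sim.Projection(drive, cells, sim.OneToOneConnector(),
               sim.StaticSynapse(weight=0.01, delay=1.0))
cells.record("v")
sim.run(500.0)                                            # milliseconds
voltages = cells.get_data().segments[0].analogsignals[0]
sim.end()
```

Because the model description is backend-agnostic, a heterogeneous platform can dispatch the same script to whichever accelerator suits the network size and connectivity density.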
NASA Astrophysics Data System (ADS)
Supurwoko; Cari; Sarwanto; Sukarmin; Fauzi, Ahmad; Faradilla, Lisa; Summa Dewi, Tiarasita
2017-11-01
The process of learning and teaching in Physics is often confronted with abstract concepts, which makes it difficult for students to understand and for teachers to teach. One of the materials with an abstract concept is the Compton Effect. The purpose of this research is to evaluate a computer simulation model on the Compton Effect which is used to improve the higher-order thinking ability of physics teacher candidate students. This research is a case study. The subjects are students in physics education who have attended Modern Physics lectures. Data were obtained through an essay test measuring students' higher-order thinking skills and through questionnaires measuring students' responses. The results indicate that the computer simulation model can be used to improve students' higher-order thinking skills and elicits positive student responses. Based on this result, it is suggested that audiences use such simulation media in learning.
Automatic Blocking Of QR and LU Factorizations for Locality
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yi, Q; Kennedy, K; You, H
2004-03-26
QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, more benefit can be gained, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.
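As a concrete picture of what such a blocking looks like, the sketch below is a right-looking blocked LU in NumPy/SciPy. It omits the partial pivoting that the paper's generated code (and LAPACK's getrf) includes, so it is a cache-locality illustration under that simplifying assumption, not production code.

```python
import numpy as np
from scipy.linalg import solve_triangular

def blocked_lu(A, nb=64):
    """Right-looking blocked LU without pivoting (teaching sketch only;
    unpivoted LU can be numerically unstable)."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        for j in range(k, e):                      # unblocked diagonal block
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        if e < n:
            # U12 := L11^{-1} A12   and   L21 := A21 U11^{-1}
            A[k:e, e:] = solve_triangular(A[k:e, k:e], A[k:e, e:],
                                          lower=True, unit_diagonal=True)
            A[e:, k:e] = solve_triangular(A[k:e, k:e], A[e:, k:e].T,
                                          trans="T", lower=False).T
            # Schur-complement update: one large matrix multiply, the
            # cache-friendly operation that blocking exists to expose
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # L (unit lower) and U packed in one array

M = np.random.default_rng(0).standard_normal((256, 256)) + 256 * np.eye(256)
F = blocked_lu(M)
L, U = np.tril(F, -1) + np.eye(256), np.triu(F)
print(np.allclose(L @ U, M))  # True
```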
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Summers, R.M.; Wong, M.K.; Boucheron, E.A.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.
ERIC Educational Resources Information Center
Federal Coordinating Council for Science, Engineering and Technology, Washington, DC.
This report presents a review of the High Performance Computing and Communications (HPCC) Program, which has as its goal the acceleration of the commercial availability and utilization of the next generation of high performance computers and networks in order to: (1) extend U.S. technological leadership in high performance computing and computer…
Sokiranski, R; Pirsig, W; Nerlich, A
2005-03-01
A still-born male fetus from the 19th century, fixed in formalin and presenting as diprosopia triophthalmica, was analysed by helical computed tomography and virtually reconstructed without damage. This rare, incomplete, symmetrical duplication of the face on a single head with three eyes, two noses and two mouths develops in the first 3 weeks of gestation and is a subset of the category of conjoined twins with unknown underlying etiology. Spiral computed tomography of fixed tissue demonstrated in the more than 100-year-old specimen that virtual reconstruction can be performed in nearly the same way as in patients (contrast medium application not being possible). The radiological reconstruction of the Munich fetus, here confined to head and neck data, is the basis for comparison with a number of imaging procedures of the last 3000 years. Starting with some Neolithic Mesoamerican ceramics, the "Pretty Ladies of Tlatilco", diprosopia triophthalmica was also depicted on engravings of the 16th and 17th century A.D. by artists as well as by the anatomist Soemmering and his engraver Berndt in the 18th century. Our modern spiral computed tomography confirms the ability of our ancestors to depict diprosopia triophthalmica in paintings and sculptures with a high level of natural precision.
NASA Technical Reports Server (NTRS)
Chen, Shu-Cheng S.
2017-01-01
A Computational Fluid Dynamic (CFD) investigation is conducted over a two-dimensional axial-flow turbine rotor blade row to study the phenomenon of turbine rotor discharge flow overexpansion at subcritical, critical, and supercritical conditions. Quantitative data of the mean-flow Mach numbers, mean-flow angles, the tangential blade pressure forces, the mean-flow mass flux, and the flow-path total pressure loss coefficients, averaged or integrated across the two-dimensional computational domain encompassing two blade-passages, are obtained over a series of 14 inlet-total to exit-static pressure ratios, from 1.5 (un-choked; subcritical condition) to 10.0 (supercritical with excessively high pressure ratio). Detailed flow features over the full domain-of-computation, such as the streamline patterns, Mach contours, pressure contours, blade surface pressure distributions, etc., are collected and displayed in this paper. A formal, quantitative definition of the limit loading condition based on the channel flow theory is proposed and explained. Contrary to the comments made in the historical works on this subject about the deficiency of the theoretical methods applied in analyzing this phenomenon, using a modern CFD method for the study of this subject appears to be quite adequate and successful. This paper describes the CFD work and its findings.
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system-level parameters before running the application. We studied GATK-HaplotypeCaller, a part of common NGS workflows that consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system-level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) replacing the default 'ondemand' CPU frequency governor with 'performance' mode to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
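The kinds of knobs involved can be sketched in a short driver script. The GC flags and the cpufreq sysfs path below are standard HotSpot/Linux interfaces, while the heap size, thread count, jar location, and input files are placeholders rather than the paper's actual configuration; writing the governor requires root.

```python
import glob
import subprocess

# (iv) switch every core from the default 'ondemand' governor to 'performance'
for path in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"):
    with open(path, "w") as f:
        f.write("performance")

# (i) launch GATK 3.x HaplotypeCaller under the parallel garbage collector
cmd = [
    "java", "-Xmx32g",
    "-XX:+UseParallelGC", "-XX:ParallelGCThreads=16",
    "-jar", "GenomeAnalysisTK.jar",
    "-T", "HaplotypeCaller",
    "-R", "ref.fasta", "-I", "sample.bam", "-o", "sample.vcf",
]
subprocess.run(cmd, check=True)
```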
An Integrated Data-Driven Strategy for Safe-by-Design Nanoparticles: The FP7 MODERN Project.
Brehm, Martin; Kafka, Alexander; Bamler, Markus; Kühne, Ralph; Schüürmann, Gerrit; Sikk, Lauri; Burk, Jaanus; Burk, Peeter; Tamm, Tarmo; Tämm, Kaido; Pokhrel, Suman; Mädler, Lutz; Kahru, Anne; Aruoja, Villem; Sihtmäe, Mariliis; Scott-Fordsmand, Janeck; Sorensen, Peter B; Escorihuela, Laura; Roca, Carlos P; Fernández, Alberto; Giralt, Francesc; Rallo, Robert
2017-01-01
The development and implementation of safe-by-design strategies is key for the safe development of future generations of nanotechnology enabled products. The safety testing of the huge variety of nanomaterials that can be synthetized is unfeasible due to time and cost constraints. Computational modeling facilitates the implementation of alternative testing strategies in a time and cost effective way. The development of predictive nanotoxicology models requires the use of high quality experimental data on the structure, physicochemical properties and bioactivity of nanomaterials. The FP7 Project MODERN has developed and evaluated the main components of a computational framework for the evaluation of the environmental and health impacts of nanoparticles. This chapter describes each of the elements of the framework including aspects related to data generation, management and integration; development of nanodescriptors; establishment of nanostructure-activity relationships; identification of nanoparticle categories; hazard ranking and risk assessment.
Bioinformatics in high school biology curricula: a study of state science standards.
Wefer, Stephen H; Sheppard, Keith
2008-01-01
The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.
Development of Fuel Shuffling Module for PHISICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allan Mabe; Andrea Alfonsi; Cristian Rabiti
2013-06-01
The PHISICS (Parallel and Highly Innovative Simulation for the INL Code System) [4] code toolkit has been under development at the Idaho National Laboratory. This package is intended to provide a modern analysis tool for reactor physics investigation. It is designed with the mindset to maximize accuracy for a given availability of computational resources and to give state-of-the-art tools to the modern nuclear engineer. This is obtained by implementing several different algorithms and meshing approaches among which the user will be able to choose, in order to optimize his computational resources and accuracy needs. The software is completely modular in order to simplify the independent development of modules by different teams and future maintenance. The package is coupled with the thermo-hydraulic code RELAP5-3D [3]. In the following, the structure of the different PHISICS modules is briefly recalled, focusing on the new shuffling module (SHUFFLE), the object of this paper.
Operational forecasting with the subgrid technique on the Elbe Estuary
NASA Astrophysics Data System (ADS)
Sehili, Aissa
2017-04-01
Modern remote sensing technologies can deliver very detailed land surface height data that should be considered for more accurate simulations. In that case, and even if some compromise is made with regard to the grid resolution of an unstructured grid, simulations will still require large grids which can be computationally very demanding. The subgrid technique, first published by Casulli (2009), is based on the idea of making use of the available detailed subgrid bathymetric information while performing computations on relatively coarse grids permitting large time steps. Consequently, accuracy and efficiency are drastically enhanced compared to the classical linear method, where the underlying bathymetry is solely discretized by the computational grid. The algorithm guarantees rigorous mass conservation and nonnegative water depths for any time step size. Computational grid cells are permitted to be wet, partially wet or dry, and no drying threshold is needed. The subgrid technique is used in an operational forecast model for water level, current velocity, salinity and temperature of the Elbe estuary in Germany. Comparison is performed with the comparatively highly resolved classical unstructured grid model UnTRIM. The daily meteorological forcing data are delivered by the German Weather Service (DWD) using the ICON-EU model. Open boundary data are delivered by the coastal model BSHcmod of the German Federal Maritime and Hydrographic Agency (BSH). Comparison of predicted water levels between the classical and the subgrid model shows very good agreement. The speedup in computational performance due to the use of the subgrid technique is about a factor of 20. A typical daily forecast can be carried out within less than 10 minutes on standard PC-like hardware. The model is capable of permanently delivering highly resolved temporal and spatial information on water level, current velocity, salinity and temperature for the whole estuary. The model also offers the possibility to recalculate any previous situation, which can help to reconstruct, for instance, the context in which a certain event, such as an accident, occurred. In addition to measurements, the model can be used to improve navigability by adjusting the tidal transit schedule for container vessels that depend on the tide to approach or leave the port of Hamburg.
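The essence of the subgrid idea fits in a few lines: a coarse cell's wet volume is evaluated from the detailed bathymetry inside it, so the cell can be wet, partially wet, or dry without any threshold. The sketch below assumes a flat free surface within the cell and uniform subgrid pixel area; it is a conceptual illustration, not the Casulli (2009) scheme itself.

```python
import numpy as np

def cell_wet_volume(eta, z_bed, dA):
    """Wet volume of one coarse cell from its subgrid bathymetry.

    eta   -- free-surface elevation in the cell (assumed flat)
    z_bed -- array of subgrid pixel bed elevations
    dA    -- area of one subgrid pixel
    """
    depth = np.maximum(eta - z_bed, 0.0)   # nonnegative depths by design
    return depth.sum() * dA

# A cell that is only partially wet: two pixels lie above the water level
z_bed = np.array([[-2.0, -1.0], [0.5, 1.5]])
print(cell_wet_volume(eta=0.0, z_bed=z_bed, dA=25.0))  # 75.0
```

Because the volume-elevation relation is built from the full-resolution bed, the coarse solver conserves mass exactly while still taking large time steps.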
CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.
Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan
2017-06-24
The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit the generality of their usage. First, the information of users' sequences, including the sizes of datasets and the lengths of sequences, can be of arbitrary values and is generally unknown before submission, which is unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn²) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPUs is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can maximize the entire system utilization significantly. The source code is available at https://github.com/wangvsa/CMSA.
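For orientation, the classic center-selection step that the improved strategy replaces looks like the plain-Python sketch below: a full dynamic-programming edit distance over all sequence pairs, then the sequence closest to all others is chosen as the center. CMSA's bitmap algorithm exists precisely to avoid this cost; the sketch is the baseline, not the paper's method.

```python
def edit_distance(a, b):
    """Classic O(len(a)*len(b)) Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[-1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def pick_center(seqs):
    """Center star stage 1: the sequence with minimum total distance."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return seqs[totals.index(min(totals))]

print(pick_center(["GATTACA", "GACTACA", "GATTAGA", "TATTACA"]))
```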
NASA Astrophysics Data System (ADS)
Arts, T.; Lambertderouvroit, M.; Rutherford, A. W.
1990-09-01
An experimental aerothermal investigation of a highly loaded transonic turbine nozzle guide vane mounted in a linear cascade arrangement is presented. The measurements were performed in a short duration isentropic light piston compression tube facility, allowing a correct simulation of Mach and Reynolds numbers as well as of the gas to wall temperature ratio compared to the values currently observed in modern aeroengines. The experimental program consisted of the following: (1) flow periodicity checks by means of wall static pressure measurements and Schlieren flow visualizations; (2) blade velocity distribution measurements by means of static pressure tappings; (3) blade convective heat transfer measurements by means of platinum thin films; (4) downstream loss coefficient and exit flow angle determinations by using a new fast traversing mechanism; and (5) free stream turbulence intensity and spectrum measurements. These different measurements were performed for several combinations of the free stream flow parameters looking at the relative effects on the aerodynamic blade performance and blade convective heat transfer of Mach number, Reynolds number, and freestream turbulence intensity.
Facilities | Integrated Energy Solutions | NREL
Facilities page excerpt: high-performance computing facilities at NREL (High-Performance Computing Data Center) support the strategies needed to optimize the entire energy system.
Ghoshhajra, Brian B; Sidhu, Manavjot S; El-Sherief, Ahmed; Rojas, Carlos; Yeh, Doreen Defaria; Engel, Leif-Christopher; Liberthson, Richard; Abbara, Suhny; Bhatt, Ami
2012-01-01
Adult congenital heart disease patients present a unique challenge to the cardiac imager. Patients may present with both acute and chronic manifestations of their complex congenital heart disease and also require surveillance for sequelae of their medical and surgical interventions. Multimodality imaging is often required to clarify their anatomy and physiology. Radiation dose is of particular concern in these patients with lifelong imaging needs for their chronic disease. The second-generation dual-source scanner is a recently available advanced clinical cardiac computed tomography (CT) scanner. It offers a combination of the high-spatial resolution of modern CT, the high-temporal resolution of dual-source technology, and the wide z-axis coverage of modern cone-beam geometry CT scanners. These advances in technology allow novel protocols that markedly reduce scan time, significantly reduce radiation exposure, and expand the physiologic imaging capabilities of cardiac CT. We present a case series of complicated adult congenital heart disease patients imaged by the second-generation dual-source CT scanner with extremely low-radiation doses and excellent image quality. © 2012 Wiley Periodicals, Inc.
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
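A minimal sketch of the least-squares flavor of multicategory classification is given below: one-hot indicator regression with a small ridge term. This is an illustrative baseline in the same family, not the paper's composite squared loss, and all data and parameters are made up.

```python
import numpy as np

def fit_indicator_ls(X, y, n_classes, ridge=1e-6):
    """Regress one-hot class indicators on features (with intercept)."""
    Y = np.eye(n_classes)[y]
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ Y)

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    scores = Xb @ W            # rough conditional class-score estimates
    return scores.argmax(axis=1)

# Toy usage: three well-separated classes in 2-D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in ((0, 0), (3, 0), (0, 3))])
y = np.repeat([0, 1, 2], 50)
W = fit_indicator_ls(X, y, 3)
print((predict(W, X) == y).mean())   # training accuracy near 1.0
```

The appeal mirrored in the abstract: fitting reduces to one linear solve regardless of the number of classes, and the fitted scores double as crude class-probability estimates.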
Wind Tunnel Model Design for Sonic Boom Studies of Nozzle Jet Flows with Shock Interactions
NASA Technical Reports Server (NTRS)
Cliff, Susan E.; Denison, Marie; Moini-Yekta, Shayan; Morr, Donald E.; Durston, Donald A.
2016-01-01
NASA and the U.S. aerospace industry are performing studies of supersonic aircraft concepts with low sonic boom pressure signatures. The computational analyses of modern aircraft designs have matured to the point where there is confidence in the prediction of the pressure signature from the front of the vehicle, but uncertainty remains in the aft signatures due to boundary layer and nozzle exhaust jet effects. Wind tunnel testing without inlet and nozzle exhaust jet effects at lower Reynolds numbers than in-flight make it difficult to accurately assess the computational solutions of flight vehicles. A wind tunnel test in the NASA Ames 9- by 7-Foot Supersonic Wind Tunnel is planned for February 2016 to address the nozzle jet effects on sonic boom. The experiment will provide pressure signatures of test articles that replicate waveforms from aircraft wings, tails, and aft fuselage (deck) components after passing through cold nozzle jet plumes. The data will provide a variety of nozzle plume and shock interactions for comparison with computational results. A large number of high-fidelity numerical simulations of a variety of shock generators were evaluated to define a reduced collection of suitable test models. The computational results of the candidate wind tunnel test models as they evolved are summarized, and pre-test computations of the final designs are provided.
NASA Astrophysics Data System (ADS)
Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew
2007-04-01
One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
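One concrete way to picture the hybrid framework is to let fuzzy memberships of an observed feature act as soft likelihoods inside a Bayesian update. The sketch below does that for a single feature; the threat categories, membership shapes, and all numbers are hypothetical, chosen only to expose the mechanics rather than to reproduce the models evaluated in the paper.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

prior = np.array([0.6, 0.3, 0.1])        # P(low / medium / high threat)
speed = 42.0                             # observed track speed (made up)
likelihood = np.array([
    tri(speed, 0.0, 20.0, 45.0),         # how consistent with low threat
    tri(speed, 25.0, 45.0, 70.0),        # ... with medium threat
    tri(speed, 50.0, 80.0, 120.0),       # ... with high threat
])
posterior = prior * likelihood
posterior /= posterior.sum()             # fused degrees of belief
print(posterior)
```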
Speeding Up Geophysical Research Using Docker Containers Within Multi-Cloud Environment.
NASA Astrophysics Data System (ADS)
Synytsky, R.; Henadiy, S.; Lobzakov, V.; Kolesnikov, L.; Starovoit, Y. O.
2016-12-01
How useful are geophysical observations for minimizing losses from natural disasters today? Do they help to decrease the number of human victims during tsunamis and earthquakes? Unfortunately, such use is still at an early stage. It is a major goal, and would be a major achievement, to make such observations more useful by improving early warning and prediction systems with the help of cloud computing. Cloud computing technologies have proven their ability to speed up application development in many areas over the past 10 years. The cloud unlocks new opportunities for geoscientists by providing access to modern data processing tools and algorithms including real-time high-performance computing, big data processing, artificial intelligence and others. Emerging lightweight cloud technologies, such as Docker containers, are gaining wide traction in IT thanks to faster and more efficient deployment of different applications in a cloud environment. They allow geophysical applications and systems to be deployed and managed in minutes across multiple clouds and data centers, which becomes of utmost importance for the next generation of applications. In this session we'll demonstrate how Docker container technology within a multi-cloud can accelerate the development of applications specifically designed for geophysical research.
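A minimal sketch of launching a containerized processing step from Python with the Docker SDK (`pip install docker`) is shown below. The image name, command, and mount paths are hypothetical placeholders, not the presenters' actual deployment; the same call could target different clouds by pointing the client at different Docker hosts.

```python
import docker  # Docker SDK for Python

client = docker.from_env()
# Run one containerized waveform-processing step; names/paths are assumptions.
output = client.containers.run(
    image="example/seismic-toolchain:latest",
    command="python process_waveforms.py --day 2016-06-01",
    volumes={"/data/raw": {"bind": "/data", "mode": "ro"}},
    remove=True,           # clean up the container when it exits
)
print(output.decode())
```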
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers
NASA Technical Reports Server (NTRS)
Guruswamy, Guru; VanDalsem, William (Technical Monitor)
1994-01-01
Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low-fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low-fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as the High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex-induced aeroelastic oscillations, whereas AST can experience transonic-buffet-associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high-fidelity equations such as the Navier-Stokes for fluids and the finite elements for structures are needed. Computations using these high-fidelity equations require large computational resources both in memory and speed. Conventional supercomputers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on the iPSC/860 and IBM SP2 computers using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high-resolution finite-element structural equations.
High-performance 3D compressive sensing MRI reconstruction.
Kim, Daehyun; Trzasko, Joshua D; Smelyanskiy, Mikhail; Haider, Clifton R; Manduca, Armando; Dubey, Pradeep
2010-01-01
Compressive Sensing (CS) is a nascent sampling and reconstruction paradigm that describes how sparse or compressible signals can be accurately approximated using many fewer samples than traditionally believed. In magnetic resonance imaging (MRI), where scan duration is directly proportional to the number of acquired samples, CS has the potential to dramatically decrease scan time. However, the computationally expensive nature of CS reconstructions has so far precluded their use in routine clinical practice - instead, more-easily generated but lower-quality images continue to be used. We investigate the development and optimization of a proven inexact quasi-Newton CS reconstruction algorithm on several modern parallel architectures, including CPUs, GPUs, and Intel's Many Integrated Core (MIC) architecture. Our (optimized) baseline implementation on a quad-core Core i7 is able to reconstruct a 256×160×80 volume of the neurovasculature from an 8-channel, 10× undersampled data set within 56 seconds, which is already a significant improvement over existing implementations. The latest six-core Core i7 reduces the reconstruction time further to 32 seconds. Moreover, we show that the CS algorithm benefits from modern throughput-oriented architectures. Specifically, our CUDA-based implementation on an NVIDIA GTX480 reconstructs the same dataset in 16 seconds, while Intel's Knights Ferry (KNF) of the MIC architecture even reduces the time to 12 seconds. Such a level of performance allows the neurovascular dataset to be reconstructed within a clinically viable time.
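The flavor of such sparsity-promoting reconstructions can be conveyed in a few lines of NumPy. The sketch below is plain ISTA (soft-thresholded gradient steps on ½‖Ax−b‖² + λ‖x‖₁), a simpler first-order stand-in for the paper's inexact quasi-Newton solver, and A is a generic random sensing matrix rather than a multi-channel MRI operator.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam=0.1, iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = sigma_max(A)^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x

# Toy usage: recover a 10-sparse vector from 4x-undersampled measurements
rng = np.random.default_rng(2)
x_true = np.zeros(200)
x_true[rng.choice(200, 10, replace=False)] = 1.0
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x_hat = ista(A, A @ x_true, lam=0.05)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Each iteration is dominated by two matrix-vector products, which is why such reconstructions map so well onto the throughput-oriented architectures discussed above.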
Securing Secrets and Managing Trust in Modern Computing Applications
ERIC Educational Resources Information Center
Sayler, Andy
2016-01-01
The amount of digital data generated and stored by users increases every day. In order to protect this data, modern computing systems employ numerous cryptographic and access control solutions. Almost all of such solutions, however, require the keeping of certain secrets as the basis of their security models. How best to securely store and control…
ERIC Educational Resources Information Center
Zhamanov, Azamat; Yoo, Seong-Moo; Sakhiyeva, Zhulduz; Zhaparov, Meirambek
2018-01-01
Students nowadays are hard to motivate with traditional teaching methods. Computers, smartphones, tablets, and other smart devices distract students' attention. Nevertheless, those same smart devices can be used as auxiliary tools for modern teaching methods. In this article, the authors review two popular modern teaching methods:…
Small Microprocessor for ASIC or FPGA Implementation
NASA Technical Reports Server (NTRS)
Kleyner, Igor; Katz, Richard; Blair-Smith, Hugh
2011-01-01
A small microprocessor, suitable for use in applications in which high reliability is required, was designed to be implemented in either an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). The design is based on a commercial microprocessor architecture, making it possible to use available software development tools and thereby to implement the microprocessor at relatively low cost. The design features enhancements, including trapping during execution of illegal instructions. The internal structure of the design yields relatively high performance, with a significant decrease, relative to other microprocessors that perform the same functions, in the number of microcycles needed to execute macroinstructions. The problem to be solved in designing this microprocessor was to provide a modest level of computational capability in a general-purpose processor while adding as little as possible to the power demand, size, and weight of the system into which the microprocessor would be incorporated. As designed, this microprocessor consumes very little power and occupies only a small portion of a typical modern ASIC or FPGA. It operates at a rate of about 4 million instructions per second at a clock frequency of 20 MHz, an average of roughly five clock cycles per macroinstruction.
In Touch With Industry: ICAF Industry Studies, 1997
1997-01-01
Society of Civil Engineers, Washington, DC. 1994. "Materials for Tomorrow's Infrastructure: A Ten-Year Plan for Deploying High-Performance ...identified high-performance electronics as a key to modern warfare and conflict prevention. Clearly, the nation's defense strategy relies heavily on...priced, high-performance systems. As a consequence, hardware makers have undergone multiple restructures, consolidations, mergers, and global
Jiang, Hanyu; Ganesan, Narayan
2016-02-27
The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x utilizes a heuristic pipeline consisting of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage, and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., the MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. The Multi-Tiered Parallel Framework (CUDAMPF) presented here, implemented on CUDA-enabled GPUs, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions and warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme for scarce resources, such as on-chip memory and caches, in order to boost the performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the framework; it not only helps unroll the innermost loop, yielding up to 2- to 3-fold speedup over static compilation, but also enables dynamic loading and switching of kernels depending on the query model size in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their performance features. In addition to exceeding the performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7, and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV, and 14.3 GCUPS for P7Viterbi, all with 100% accuracy, which translates to maximum speedups of 37.5-, 23.1-, and 11.6-fold for MSV, SSV, and P7Viterbi, respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.
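The kernel being accelerated is easy to state. Below is a minimal NumPy sketch of the ungapped-segment (SSV-style) recurrence, S[i,j] = max(S[i-1,j-1], 0) + match[i,j], that dominates the pipeline's run time. CUDAMPF maps this recurrence onto warps and SIMD video instructions, whereas this sketch only vectorizes across model positions; the emission score table 'emit' is a hypothetical input, and the real filters use scaled 8/16-bit arithmetic with a base offset rather than floats.

```python
import numpy as np

def ssv_score(seq, emit):
    """seq: 1-D array of residue indices; emit: (n_residues, model_len) scores."""
    model_len = emit.shape[1]
    prev = np.zeros(model_len)         # S[i-1, :]
    best = -np.inf
    for r in seq:
        shifted = np.empty(model_len)  # S[i-1, j-1], with S[i-1, -1] taken as 0
        shifted[0] = 0.0
        shifted[1:] = prev[:-1]
        # Restart a segment (score 0) wherever extension would go negative.
        prev = np.maximum(shifted, 0.0) + emit[r]
        best = max(best, prev.max())
    return best
```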
Multiple Scattering Principal Component-based Radiative Transfer Model (PCRTM) from Far IR to UV-Vis
NASA Astrophysics Data System (ADS)
Liu, X.; Wu, W.; Yang, Q.
2017-12-01
Modern hyperspectral satellite remote sensors such as AIRS, CrIS, IASI, and CLARREO all require accurate and fast radiative transfer models that can deal with multiple scattering by clouds and aerosols in order to explore their information content. However, performing full radiative transfer calculations using multiple-stream methods such as discrete ordinate (DISORT), doubling and adding (AD), or successive order of scattering (SOS) is very time consuming. We have developed a principal component-based radiative transfer model (PCRTM) that reduces the computational burden by orders of magnitude while maintaining high accuracy. By exploiting spectral correlations, PCRTM reduces the number of radiative transfer calculations in the frequency domain. It further uses a hybrid-stream method to decrease the number of calls to the computationally expensive multiple-scattering calculations with high stream numbers. Other fast parameterizations in the infrared spectral region reduce the computational time for an AIRS forward simulation (2378 spectral channels) to milliseconds. PCRTM has been developed to cover the spectral range from the far IR to the UV-Vis, and has been used for satellite data inversions, proxy data generation, inter-satellite calibrations, spectral fingerprinting, and climate OSSEs. We will show examples of applying PCRTM to single-field-of-view cloudy retrievals of atmospheric temperature, moisture, trace gases, clouds, and surface parameters, and of how PCRTM is used for the NASA CLARREO project.
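The principal-component idea is straightforward to sketch. The following Python fragment shows the generic pattern of PC-based radiative transfer emulation: learn an empirical orthogonal basis from full line-by-line calculations offline, then reconstruct full channel spectra from a small number of predicted PC scores. The training setup is illustrative and not the actual PCRTM parameterization.

```python
import numpy as np

def train_pc_basis(training_spectra, n_pc=50):
    """training_spectra: (n_profiles, n_channels) radiances from a full RTM."""
    mean = training_spectra.mean(axis=0)
    # SVD of the centered ensemble gives the leading principal components.
    _, _, vt = np.linalg.svd(training_spectra - mean, full_matrices=False)
    return mean, vt[:n_pc]

def reconstruct(mean, pcs, scores):
    """Recover a full spectrum from a handful of predicted PC scores."""
    return mean + scores @ pcs
```

In PCRTM, the scores themselves are predicted from a small set of monochromatic radiative transfer calculations, which is where the orders-of-magnitude savings come from.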
Large eddy simulations and direct numerical simulations of high speed turbulent reacting flows
NASA Technical Reports Server (NTRS)
Givi, P.; Madnia, C. K.; Steinberger, C. J.; Frankel, S. H.
1992-01-01
The basic objective of this research is to extend the capabilities of Large Eddy Simulations (LES) and Direct Numerical Simulations (DNS) for the computational analysis of high-speed reacting flows. In the efforts related to LES, we were primarily involved with assessing the performance of modern closures based on Probability Density Function (PDF) methods for treating the subgrid fluctuation correlations of scalar quantities in reacting turbulent flows. In the work on DNS, we concentrated on understanding some of the relevant physics of compressible reacting flows by means of statistical analysis of data generated by DNS of such flows. In the research conducted in the second year of this program, our efforts focused on the modeling of homogeneous compressible turbulent flows by PDF methods and on DNS of non-equilibrium reacting high-speed mixing layers. Preliminary work is also in progress on PDF modeling of shear flows and on LES of such flows.
Software designs of image processing tasks with incremental refinement of computation.
Anastasia, Davide; Andreopoulos, Yiannis
2010-08-01
Software realizations of computationally demanding image processing tasks (e.g., image transforms and convolution) do not currently provide graceful degradation when their clock-cycle budgets are reduced, e.g., when delay deadlines are imposed in a multitasking environment to meet throughput requirements. This is an important obstacle in the quest for full utilization of modern programmable platforms' capabilities, since worst-case considerations must be in place to guarantee reasonable quality of results. In this paper, we propose (and make available online) platform-independent software designs that perform bitplane-based computation combined with an incremental packing framework in order to realize block transforms, 2-D convolution, and frame-by-frame block matching. The proposed framework realizes incremental computation: progressive processing of input-source increments improves the output quality monotonically. Comparisons with the equivalent non-incremental software realization of each algorithm reveal that, for the same precision of the result, the proposed approach can lead to comparable or faster execution, while it can be arbitrarily terminated and provide the result up to the computed precision. Application examples with region-of-interest based incremental computation, task scheduling per frame, and energy-distortion scalability verify that our proposal provides significant performance scalability with graceful degradation.
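A minimal sketch of the bitplane idea, assuming an 8-bit input image and using scipy.signal.convolve2d for brevity: each bitplane increment refines a 2-D convolution monotonically, and the computation can be stopped at any plane with the result valid up to the computed precision. This is the general pattern, not the authors' released designs.

```python
import numpy as np
from scipy.signal import convolve2d

def incremental_conv2d(image, kernel, max_bitplanes=8):
    """Yield progressively refined 'full' convolutions, MSB bitplane first."""
    image = image.astype(np.uint8)
    result = np.zeros(np.add(image.shape, kernel.shape) - 1)
    for b in range(7, 7 - max_bitplanes, -1):
        plane = (image >> b) & 1                   # one input-source increment
        # Linearity of convolution lets each bitplane be folded in separately.
        result += convolve2d(plane.astype(float), kernel) * (1 << b)
        yield result                               # usable partial result
```

Each yielded array is a valid approximation that improves monotonically as more bitplanes are folded in, which is the graceful-degradation property described above.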
Kullgren, A; Lie, A; Tingvall, C
1994-02-01
Vehicle deformations are important sources of information about the performance of safety systems. Photogrammetry has developed rapidly in recent years. In this study, modern photogrammetric methods have been used for vehicle deformation analysis. The study describes the equipment for documentation and recording in the field (a semi-metric camera) and a system for photogrammetric measurement of the images in a laboratory environment (a personal computer and digitizing tablet). The material comprises approximately 500 collected and measured cases. The study shows that reliability is high and that accuracies of around 15 mm can be achieved even though the equipment and routines used are relatively simple. The effects of further development, using video cameras for data capture and digital images for measurements, are discussed.
Web-based encyclopedia on physical effects
NASA Astrophysics Data System (ADS)
Papliatseyeu, Andrey; Repich, Maryna; Ilyushonak, Boris; Hurbo, Aliaksandr; Makarava, Katerina; Lutkovski, Vladimir M.
2004-07-01
Web-based learning applications open new horizons for educators. In this work we present a computer encyclopedia designed to overcome the drawbacks of traditional paper information sources, such as awkward search, low update rates, limited copy counts, and high cost. Moreover, we intended to improve the access and search functions relative to some Internet sources in order to make the system more convenient. The system is developed using modern Java technologies (Java Servlets, Java Server Pages) and contains systematized information about the most important and well-studied physical effects; it may also be used in other fields of science. The system is accessible via Intranet/Internet networks by means of any up-to-date Internet browser, and may be used for general learning purposes and as a study guide or tutorial for laboratory work.
Freiberger, Manuel; Egger, Herbert; Liebmann, Manfred; Scharfetter, Hermann
2011-11-01
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state-of-the-art numerical algorithms and a careful hardware-optimized implementation makes it possible to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the nonlinear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by a factor of about 15 through the use of graphics hardware without compromising the accuracy of the reconstructed images.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Nai-Yuan; Zavala, Victor M.
We present a filter line-search algorithm that does not require inertia information about the linear system. This feature enables the use of a wide range of linear algebra strategies and libraries, which is essential for tackling large-scale problems on modern computing architectures. The proposed approach performs curvature tests along the search step to detect negative curvature and to trigger convexification. We prove that the approach is globally convergent, and we implement it within a parallel interior-point framework to solve large-scale and highly nonlinear problems. Our numerical tests demonstrate that the inertia-free approach is as efficient as inertia detection via symmetric indefinite factorizations. We also demonstrate that the inertia-free approach can reduce solution time because it reduces the amount of convexification needed.
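A minimal sketch of the curvature-test idea under stated assumptions: rather than computing the inertia of the KKT matrix, test the curvature d^T W d along the computed primal step d and add a growing primal regularization when negative curvature is detected. The function solve_kkt is a hypothetical stand-in for the linear solver, and the exact test and update rules in the paper may differ.

```python
import numpy as np

def curvature_safeguarded_step(solve_kkt, W, delta0=1e-4, growth=10.0):
    """solve_kkt(delta) returns the primal step d of the system regularized
    by delta * I; W is the Hessian (or its approximation) of the Lagrangian."""
    delta = 0.0
    d = solve_kkt(delta)
    while d @ (W @ d) < 0.0:          # negative curvature along the step
        # Convexify: increase the primal regularization and re-solve.
        delta = delta0 if delta == 0.0 else delta * growth
        d = solve_kkt(delta)
    return d, delta
```

The appeal is that any black-box linear solver can be used, since the test needs only matrix-vector products rather than the factorization's inertia.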
Reliability based design optimization: Formulations and methodologies
NASA Astrophysics Data System (ADS)
Agarwal, Harish
Modern products ranging from simple components to complex systems should be designed to be optimal and reliable. The challenge of modern engineering is to ensure that manufacturing costs are reduced and design cycle times are minimized while achieving requirements for performance and reliability. If the market for the product is competitive, improved quality and reliability can generate very strong competitive advantages. Simulation-based design plays an important role in designing almost any kind of automotive, aerospace, and consumer product under these competitive conditions. Single-discipline simulations used for analysis are being coupled together to create complex coupled simulation tools. This investigation focuses on the development of efficient and robust methodologies for reliability-based design optimization in a simulation-based design environment. The original contributions of this research are a novel, efficient, and robust unilevel methodology for reliability-based design optimization, an innovative decoupled reliability-based design optimization methodology, the application of homotopy techniques to the unilevel methodology, and a new framework for reliability-based design optimization under epistemic uncertainty. The unilevel methodology is shown to be mathematically equivalent to the traditional nested formulation, and numerical test problems show that it can reduce computational cost by at least 50% compared to the nested approach. The decoupled methodology is an approximate technique for obtaining consistent reliable designs at lower computational expense; test problems show that it is computationally efficient compared to the nested approach. A framework for performing reliability-based design optimization under epistemic uncertainty is also developed, employing a trust-region-managed sequential approximate optimization methodology. Results from numerical test studies indicate that the methodology can be used for design optimization under severe uncertainty.
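For contrast with the unilevel and decoupled methods, here is a minimal sketch of the traditional nested (double-loop) RBDO formulation they are measured against: an outer design optimization whose probabilistic constraint is evaluated by an inner reliability loop, shown here as a crude Monte Carlo estimator. The cost function, limit state g, distributions, and target probability are illustrative placeholders, not the test problems of this work.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def failure_prob(d, g, n=20000):
    """Inner loop: estimate P[g(d, X) < 0]; g must be vectorized over the
    (n, dim) sample array x of random parameters."""
    x = rng.normal(size=(n, d.size))
    return np.mean(g(d, x) < 0.0)

def rbdo_nested(cost, g, d0, p_target=1e-3):
    """Outer loop: minimize cost subject to failure probability <= p_target."""
    cons = {"type": "ineq", "fun": lambda d: p_target - failure_prob(d, g)}
    return minimize(cost, d0, constraints=cons, method="SLSQP")
```

In practice a smooth reliability approximation (e.g., FORM) replaces the noisy Monte Carlo estimate inside gradient-based outer loops; the nesting itself is what makes the approach expensive and what the unilevel reformulation removes.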
NASA Technical Reports Server (NTRS)
Goodrich, Kenneth H.; McManus, John W.; Chappell, Alan R.
1992-01-01
A batch air combat simulation environment known as the Tactical Maneuvering Simulator (TMS) is presented. The TMS serves as a tool for developing and evaluating tactical maneuvering logics. The environment can also be used to evaluate the tactical implications of perturbations to aircraft performance or supporting systems. The TMS is capable of simulating air combat between any number of engagement participants, with practical limits imposed by computer memory and processing power. Aircraft are modeled using equations of motion, control laws, aerodynamics, and propulsive characteristics equivalent to those used in high-fidelity piloted simulation. Databases representative of a modern high-performance aircraft with and without thrust-vectoring capability are included. To simplify the task of developing and implementing maneuvering logics in the TMS, an outer-loop control system known as the Tactical Autopilot (TA) is implemented in the aircraft simulation model. The TA converts guidance commands issued by computerized maneuvering logics, in the form of desired angle of attack and wind-axis bank angle, into inputs to the inner-loop control augmentation system of the aircraft. This report describes the capabilities and operation of the TMS.
Numerical Propulsion System Simulation: An Overview
NASA Technical Reports Server (NTRS)
Lytle, John K.
2000-01-01
The cost of implementing new technology in aerospace propulsion systems is becoming prohibitively expensive and time consuming. One of the main contributors to the high cost and lengthy time is the need to perform many large-scale hardware tests and the inability to integrate all appropriate subsystems early in the design process. The NASA Glenn Research Center is developing the technologies required to enable simulations of full aerospace propulsion systems in sufficient detail to resolve critical design issues early in the design process, before hardware is built. This concept, called the Numerical Propulsion System Simulation (NPSS), is focused on the integration of multiple disciplines such as aerodynamics, structures, and heat transfer with computing and communication technologies to capture complex physical processes in a timely and cost-effective manner. The vision for NPSS is a "numerical test cell" that enables full engine simulation overnight on cost-effective computing platforms. Several key elements within NPSS are required to achieve this capability: 1) clear data interfaces through the development and/or use of data exchange standards, 2) modular and flexible program construction through the use of object-oriented programming, 3) integrated multiple-fidelity analysis (zooming) techniques that capture the appropriate physics at the appropriate fidelity for the engine systems, 4) multidisciplinary coupling techniques, and 5) high-performance parallel and distributed computing. The current state of development in these five areas focuses on air-breathing gas turbine engines and is reported in this paper. However, many of the technologies are generic and can be readily applied to rocket-based systems and combined cycles currently being considered for low-cost access-to-space applications. Recent accomplishments include: (1) the development of an industry-standard engine cycle analysis program and plug 'n play architecture, called NPSS Version 1; (2) a full engine simulation that combines a 3D low-pressure subsystem with a 0D high-pressure core simulation, demonstrating the ability to integrate analyses at different levels of detail and to aerodynamically couple components (the fan/booster and low-pressure turbine) through a 3D computational fluid dynamics simulation; (3) simulation of all of the turbomachinery in a modern turbofan engine on a parallel computing platform for rapid and cost-effective execution, a capability that can also be used to generate a full compressor map, requiring both design and off-design simulation; (4) three levels of coupling characterizing the multidisciplinary analysis under NPSS: loosely coupled, process coupled, and tightly coupled. The loosely coupled and process-coupled approaches require a common geometry definition to link CAD to analysis tools. The tightly coupled approach is currently validating the use of an arbitrary Lagrangian/Eulerian formulation for rotating turbomachinery; the validation includes both centrifugal and axial compression systems and will be reported in the paper. (5) The demonstration of significant computing cost/performance reduction for turbine engine applications using PC clusters. The NPSS project is supported under the NASA High Performance Computing and Communications Program.
Advances in electron kinetics and theory of gas discharges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolobov, Vladimir I.; The University of Alabama in Huntsville, Huntsville, Alabama 35899
2013-10-15
“Electrons, like people, are fertile and infertile: high-energy electrons are fertile and able to reproduce.”—Lev Tsendin. Modern physics of gas discharges increasingly uses physical kinetics for the analysis of non-equilibrium plasmas. The description of the underlying physics at the kinetic level appears to be important for plasma applications in modern technologies. In this paper, we attempt to grasp the legacy of Professor Lev Tsendin, who advocated the use of the kinetic approach for understanding fundamental problems of gas discharges. We outline the fundamentals of electron kinetics in low-temperature plasmas, describe elements of the modern kinetic theory of gas discharges, and show examples of the theoretical approach to gas discharge problems used by Lev Tsendin. Important connections between electron kinetics in gas discharges and semiconductors are also discussed. Using several examples, we illustrate how Tsendin's ideas and methods are currently being developed for the implementation of next-generation computational tools for adaptive kinetic-fluid simulations of gas discharges used in modern technologies.
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering applications can often be formulated as constrained optimization problems. There are several solution algorithms for solving a constrained optimization problem; one approach is to convert the constrained problem into a series of unconstrained problems, and unconstrained solution algorithms can then be used as part of the constrained solution algorithms. Structural optimization is an iterative process: one starts with an initial design, a finite element structural analysis is performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.), and, based on the sensitivity information for the objective and constraint functions, an optimizer such as ADS or IDESIGN is used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous linear equations plays a key role, since it is needed for static, eigenvalue, and dynamic analysis alike. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both the parallel and vector capabilities offered by modern high-performance computers such as the Convex, Cray-2, and Cray Y-MP. The objective of this research project is, therefore, to incorporate the latest developments in the parallel-vector equation solver PVSOLVE into a widely used finite-element production code, such as SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested in a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but can also be incorporated into a more popular constrained optimization code, such as ADS.
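The analysis-synthesis iteration described above can be summarized in a short sketch; all callables are hypothetical stand-ins for a SAP-4/PVSOLVE-style analysis and an ADS/IDESIGN-style optimizer, not the actual interfaces of those codes.

```python
import numpy as np

def structural_synthesis(design, assemble, solve, sensitivities, optimizer,
                         max_iter=50, tol=1e-6):
    """Iterate analysis -> sensitivities -> optimizer until the design settles."""
    for _ in range(max_iter):
        K, f = assemble(design)           # stiffness matrix and load vector
        u = solve(K, f)                   # the (parallel-vector) equation solve
        g, dg = sensitivities(design, u)  # objective/constraints and gradients
        new_design = optimizer(design, g, dg)
        if np.linalg.norm(new_design - design) < tol:
            return new_design
        design = new_design
    return design
```

The structure makes clear why the equation solve dominates: it sits inside every design iteration, which is the motivation for the PVSOLVE integration.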
High-Performance Computing Data Center Warm-Water Liquid Cooling
Computational Science | NREL
NREL's High-Performance Computing Data Center (HPC Data Center) is liquid cooled. Warm-water liquid cooling technologies offer a more energy-efficient solution that also allows for effective…
NASA Astrophysics Data System (ADS)
Ahn, Sul-Ah; Jung, Youngim
2016-10-01
The research activities of computational physicists utilizing high-performance computing are analyzed using bibliometric approaches. This study aims to provide computational physicists utilizing high-performance computing, as well as policy planners, with useful bibliometric results for assessing research activities. To achieve this purpose, we carried out a co-authorship network analysis of journal articles to assess the research activities of researchers in high-performance computational physics as a case study. We used journal articles from Elsevier's Scopus database covering the period 2004-2013, and ranked authors in the field by the number of papers published during those ten years. Finally, we drew the co-authorship network for the top 45 authors and their coauthors, and described some features of the network in relation to author rank. Suggestions for further studies are discussed.
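A minimal sketch of this kind of co-authorship analysis, assuming the networkx library and a hypothetical list 'papers' of author-name lists extracted from Scopus records: build a graph whose weighted edges count co-authored papers and select the top 45 authors by publication count.

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def coauthorship_network(papers):
    """papers: iterable of author-name lists, one list per journal article."""
    rank = Counter(a for authors in papers for a in set(authors))
    g = nx.Graph()
    for authors in papers:
        # Each unordered author pair on a paper adds one unit of edge weight.
        for a, b in combinations(sorted(set(authors)), 2):
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1
            else:
                g.add_edge(a, b, weight=1)
    top45 = [a for a, _ in rank.most_common(45)]
    return g, top45
```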
NASA Astrophysics Data System (ADS)
Greitzer, E. M.; Tan, C. S.; Graf, M. B.
2004-06-01
Focusing on phenomena important in determining the performance of a broad range of fluid devices, this work describes the behavior of internal flows encountered in propulsion systems, fluid machinery (compressors, turbines, and pumps), and ducts (diffusers, nozzles, and combustion chambers). The book equips students and practicing engineers with a range of new analytical tools. These tools offer enhanced interpretation and application of both experimental measurements and the computational procedures that characterize modern fluids engineering.
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, and geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance; they therefore represent an efficient platform for implementing a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to a few weeks, depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers regular access to the noise source maps, and includes the following sub-systems: (1) data acquisition, responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks; (2) a high-performance noise source mapping application responsible for generating source maps using cross-correlation of seismic records; (3) back-end infrastructure for the coordination of various tasks and computations; (4) a front-end Web interface providing the service to end users; and (5) a data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of a logarithmic energy ratio, and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular the selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of the various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland; extension to Central Europe is planned for the next project phase.
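The computational core of the pipeline, modules (2) and (3), can be sketched compactly. The following NumPy fragment cross-correlates two pre-processed station records in the frequency domain and computes a logarithmic energy ratio between the causal and acausal branches; windowing, spectral whitening, and the exact ratio definition used by the project are omitted simplifications.

```python
import numpy as np

def noise_crosscorrelation(u1, u2, max_lag):
    """Linear cross-correlation of two records, trimmed to lags -max_lag..+max_lag."""
    n = len(u1) + len(u2) - 1                  # pad to avoid circular wrap-around
    cc = np.fft.irfft(np.fft.rfft(u1, n) * np.conj(np.fft.rfft(u2, n)), n)
    return np.roll(cc, max_lag)[: 2 * max_lag + 1]

def log_energy_ratio(cc):
    """Asymmetry of causal vs. acausal energy hints at source directionality."""
    mid = len(cc) // 2                         # index of zero lag
    causal, acausal = cc[mid + 1 :], cc[mid - 1 :: -1]
    return np.log(np.sum(causal**2) / np.sum(acausal**2))
```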
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-02-01
The results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigenanalysis of multi-degree-of-freedom blade models built up from flat thin shell elements.
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth; the characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs are presented, along with a discussion of adapting algorithm design to best utilize the available memory bandwidth.
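The balance argument is captured by the classic roofline estimate: attainable throughput is the lesser of peak compute and memory bandwidth times the kernel's arithmetic intensity. A minimal sketch with illustrative numbers (not measurements from the paper) shows the characteristic saturation as cores are added to a shared bus.

```python
def attainable_gflops(ai, peak_gflops, mem_bw_gbs):
    """Roofline: min(peak compute, bandwidth * arithmetic intensity [flops/byte])."""
    return min(peak_gflops, mem_bw_gbs * ai)

# Illustrative: each core adds 1.0 GFLOP/s of peak, but all cores share a
# 1.6 GB/s memory bus; for a kernel with AI = 1.0 flops/byte, throughput
# stops scaling once the bus saturates.
for cores in (1, 2, 4, 8):
    print(cores, attainable_gflops(ai=1.0, peak_gflops=1.0 * cores,
                                   mem_bw_gbs=1.6))
```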
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almgren, Ann; DeMar, Phil; Vetter, Jeffrey
The widespread use of computing in the American economy would not be possible without a thoughtful, exploratory research and development (R&D) community pushing the performance edge of operating systems, computer languages, and software libraries. These are the tools and building blocks — the hammers, chisels, bricks, and mortar — of the smartphone, the cloud, and the computing services on which we rely. Engineers and scientists need ever-more specialized computing tools to discover new material properties for manufacturing, make energy generation safer and more efficient, and provide insight into the fundamentals of the universe, for example. The research division of the U.S. Department of Energy's (DOE's) Office of Advanced Scientific Computing Research (ASCR) ensures that these tools and building blocks are being developed and honed to meet the extreme needs of modern science. See also http://exascaleage.org/ascr/ for additional information.
The Analog Revolution and Its On-Going Role in Modern Analytical Measurements.
Enke, Christie G
2015-12-15
The electronic revolution in analytical instrumentation began when we first exceeded the two-digit resolution of panel meters and chart recorders and then took the first steps into automated control. It started with the first uses of operational amplifiers (op amps) in the analog domain, 20 years before the digital computer entered the analytical lab. Their application greatly increased both accuracy and precision in chemical measurement, and they provided an elegant means for the electronic control of experimental quantities. Later, laboratory and personal computers provided essentially unlimited readout resolution and enabled programmable control of instrument parameters as well as storage and computation of acquired data. However, digital computers did not replace the op amp's critical role of converting the analog sensor's output to a robust and accurate voltage; rather, they added a new role: converting that voltage into a number. These analog operations are generally the limiting portions of our computerized instrumentation systems. Operational amplifier performance in gain, input current and resistance, offset voltage, and rise time has improved by a remarkable 3-4 orders of magnitude since their first implementations. Each 10-fold improvement has opened the doors for the development of new techniques in all areas of chemical analysis. Along with some interesting history, the multiple roles op amps play in modern instrumentation are described, together with a number of examples of new areas of analysis that have been enabled by their improvements.
Unsteady Flow in a Supersonic Turbine with Variable Specific Heats
NASA Technical Reports Server (NTRS)
Dorney, Daniel J.; Griffin, Lisa W.; Huber, Frank; Sondak, Douglas L.; Turner, James (Technical Monitor)
2001-01-01
Modern high-work turbines can be compact, transonic, supersonic, counter-rotating, or use a dense drive gas, and the vast majority of modern rocket turbine designs fall into these categories. These turbines usually have large temperature variations across a given stage and are characterized by large amounts of flow unsteadiness, which can have a major impact on turbine performance and durability. For example, the Space Transportation Main Engine (STME) fuel turbine, a high-work, transonic design, was found to have an unsteady inter-row shock that reduced efficiency by 2 points and increased dynamic loading by 24 percent. The Revolutionary Reusable Technology Turbopump (RRTT), which uses full-flow oxygen for its drive gas, was found to shed vortices with such energy as to raise serious blade durability concerns. In both cases, the sources of the problems were uncovered (before turbopump testing) through the application of validated unsteady computational fluid dynamics (CFD) to the designs. In the case of the RRTT and the Alternate Turbopump Development (ATD) turbines, the unsteady CFD codes have been used not just to identify problems but to guide designs that mitigate problems due to unsteadiness. Using unsteady flow analyses as part of the design process has led to turbine designs with higher performance (which affects temperature and mass flow rate) and fewer dynamics problems. One of the many assumptions made during the design and analysis of supersonic turbine stages is that the values of the specific heats are constant; in some analyses the value is based on an average of the expected upstream and downstream temperatures. In stages where the temperature can vary by 300 to 500 K, however, the assumption of constant fluid properties may lead to erroneous performance and durability predictions. In this study, the suitability of assuming constant specific heats has been investigated by performing three-dimensional unsteady Navier-Stokes simulations for a supersonic turbine stage.
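The property-variation issue is easy to quantify. A minimal Python sketch, using an illustrative polynomial fit for cp(T) (placeholder coefficients, not actual combustion-gas data), shows how the ratio of specific heats shifts between stage inlet and exit temperatures, which is the effect a constant-specific-heat analysis ignores.

```python
R = 287.05  # gas constant of air, J/(kg K)

def cp(T, a=(1046.0, -0.230, 5.0e-4, -2.0e-7)):
    """Specific heat at constant pressure from a cubic fit, J/(kg K).
    The coefficients are illustrative placeholders only."""
    return a[0] + a[1] * T + a[2] * T**2 + a[3] * T**3

def gamma(T):
    """Ratio of specific heats for an ideal gas: cp / (cp - R)."""
    return cp(T) / (cp(T) - R)

# A 500 K swing across the stage visibly shifts both cp and gamma.
for T in (600.0, 1100.0):   # illustrative inlet vs. exit temperatures
    print(f"T = {T:6.1f} K: cp = {cp(T):7.1f} J/(kg K), gamma = {gamma(T):.3f}")
```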