Identifying failure in a tree network of a parallel computer
Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.
2010-08-24
Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.
Locating hardware faults in a data communications network of a parallel computer
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-01-12
Hardware faults location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute node as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
Link failure detection in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.
2010-11-09
Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
Locating hardware faults in a parallel computer
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-04-13
Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
Fault tolerant hypercube computer system architecture
NASA Technical Reports Server (NTRS)
Madan, Herb S. (Inventor); Chow, Edward (Inventor)
1989-01-01
A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node operably connected to the first multiplexer whereby the second watch dog node can selectively communicate with individual ones of the computing nodes through the second and fourth networks. The branch is completed by a first load balancing node; and a second multiplexer connected between the first load balancing node and the first and second watch dog nodes, allowing the first load balancing node to selectively communicate with the first and second watch dog nodes.
Sequoia Messaging Rate Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedley, Andrew
2008-01-22
The purpose of this benchmark is to measure the maximal message rate of a single compute node. The first num_cores ranks are expected to reside on the 'core' compute node for which message rate is being tested. After that, the next num_nbors ranks are neighbors for the first core rank, the next set of num_nbors ranks are neighbors for the second core rank, and so on. For example, testing an 8-core node (num_cores = 8) with 4 neighbors (num_nbors = 4) requires 8 + 8 * 4 - 40 ranks. The first 8 of those 40 ranks are expected tomore » be on the 'core' node being benchmarked, while the rest of the ranks are on separate nodes.« less
Integration and validation of a data grid software
NASA Astrophysics Data System (ADS)
Carenton-Madiec, Nicolas; Berger, Katharina; Cofino, Antonio
2014-05-01
The Earth System Grid Federation (ESGF) Peer-to-Peer (P2P) is a software infrastructure for the management, dissemination, and analysis of model output and observational data. The ESGF grid is composed with several types of nodes which have different roles. About 40 data nodes host model outputs and datasets using thredds catalogs. About 25 compute nodes offer remote visualization and analysis tools. About 15 index nodes crawl data nodes catalogs and implement faceted and federated search in a web interface. About 15 Identity providers nodes manage accounts, authentication and authorization. Here we will present an actual size test federation spread across different institutes in different countries and a python test suite that were started in December 2013. The first objective of the test suite is to provide a simple tool that helps to test and validate a single data node and its closest index, compute and identity provider peer. The next objective will be to run this test suite on every data node of the federation and therefore test and validate every single node of the whole federation. The suite already implements nosetests, requests, myproxy-logon, subprocess, selenium and fabric python libraries in order to test both web front ends, back ends and security services. The goal of this project is to improve the quality of deliverable in a small developers team context. Developers are widely spread around the world working collaboratively and without hierarchy. This kind of working organization context en-lighted the need of a federated integration test and validation process.
Error recovery to enable error-free message transfer between nodes of a computer network
Blumrich, Matthias A.; Coteus, Paul W.; Chen, Dong; Gara, Alan; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd; Steinmacher-Burow, Burkhard; Vranas, Pavlos M.
2016-01-26
An error-recovery method to enable error-free message transfer between nodes of a computer network. A first node of the network sends a packet to a second node of the network over a link between the nodes, and the first node keeps a copy of the packet on a sending end of the link until the first node receives acknowledgment from the second node that the packet was received without error. The second node tests the packet to determine if the packet is error free. If the packet is not error free, the second node sets a flag to mark the packet as corrupt. The second node returns acknowledgement to the first node specifying whether the packet was received with or without error. When the packet is received with error, the link is returned to a known state and the packet is sent again to the second node.
Foo Kune, Denis [Saint Paul, MN; Mahadevan, Karthikeyan [Mountain View, CA
2011-01-25
A recursive verification protocol to reduce the time variance due to delays in the network by putting the subject node at most one hop from the verifier node provides for an efficient manner to test wireless sensor nodes. Since the software signatures are time based, recursive testing will give a much cleaner signal for positive verification of the software running on any one node in the sensor network. In this protocol, the main verifier checks its neighbor, who in turn checks its neighbor, and continuing this process until all nodes have been verified. This ensures minimum time delays for the software verification. Should a node fail the test, the software verification downstream is halted until an alternative path (one not including the failed node) is found. Utilizing techniques well known in the art, having a node tested twice, or not at all, can be avoided.
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregatingmore » each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.« less
Distributing an executable job load file to compute nodes in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gooding, Thomas M.
Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gooding, Thomas M.
Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
2004-02-03
KENNEDY SPACE CENTER, FLA. - Astronaut Tim Kopra aids in Intravehicular Activity (IVA) constraints testing on the Italian-built Node 2, a future element of the International Space Station. The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E
2012-10-16
Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions.
Synchronizing compute node time bases in a parallel computer
Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip
2015-01-27
Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
Synchronizing compute node time bases in a parallel computer
Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip
2014-12-30
Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
Archer, Charles J; Faraj, Ahmad A; Inglett, Todd A; Ratterman, Joseph D
2013-04-16
Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selected link to the adjacent compute node connected to the compute node through the selected link.
Distributed solar radiation fast dynamic measurement for PV cells
NASA Astrophysics Data System (ADS)
Wan, Xuefen; Yang, Yi; Cui, Jian; Du, Xingjing; Zheng, Tao; Sardar, Muhammad Sohail
2017-10-01
To study the operating characteristics about PV cells, attention must be given to the dynamic behavior of the solar radiation. The dynamic behaviors of annual, monthly, daily and hourly averages of solar radiation have been studied in detail. But faster dynamic behaviors of solar radiation need more researches. The solar radiation random fluctuations in minute-long or second-long range, which lead to alternating radiation and cool down/warm up PV cell frequently, decrease conversion efficiency. Fast dynamic processes of solar radiation are mainly relevant to stochastic moving of clouds. Even in clear sky condition, the solar irradiations show a certain degree of fast variation. To evaluate operating characteristics of PV cells under fast dynamic irradiation, a solar radiation measuring array (SRMA) based on large active area photodiode, LoRa spread spectrum communication and nanoWatt MCU is proposed. This cross photodiodes structure tracks fast stochastic moving of clouds. To compensate response time of pyranometer and reduce system cost, the terminal nodes with low-cost fast-responded large active area photodiode are placed besides positions of tested PV cells. A central node, consists with pyranometer, large active area photodiode, wind detector and host computer, is placed in the center of the central topologies coordinate to scale temporal envelope of solar irradiation and get calibration information between pyranometer and large active area photodiodes. In our SRMA system, the terminal nodes are designed based on Microchip's nanoWatt XLP PIC16F1947. FDS-100 is adopted for large active area photodiode in terminal nodes and host computer. The output current and voltage of each PV cell are monitored by I/V measurement. AS62-T27/SX1278 LoRa communication modules are used for communicating between terminal nodes and host computer. Because the LoRa LPWAN (Low Power Wide Area Network) specification provides seamless interoperability among Smart Things without the need of complex local installations, configuring of our SRMA system is very easy. Lora also provides SRMA a means to overcome the short communication distance and weather signal propagation decline such as in ZigBee and WiFi. The host computer in SRMA system uses the low power single-board PC EMB-3870 which was produced by NORCO. Wind direction sensor SM5386B and wind-force sensor SM5387B are installed to host computer through RS-485 bus for wind reference data collection. And Davis 6450 solar radiation sensor, which is a precision instrument that detects radiation at wavelengths of 300 to 1100 nanometers, allow host computer to follow real-time solar radiation. A LoRa polling scheme is adopt for the communication between host computer and terminal nodes in SRMA. An experimental SRMA has been established. This system was tested in Ganyu, Jiangshu province from May to August, 2016. In the test, the distances between the nodes and the host computer were between 100m and 1900m. At work, SRMA system showed higher reliability. Terminal nodes could follow the instructions from host computer and collect solar radiation data of distributed PV cells effectively. And the host computer managed the SRAM and achieves reference parameters well. Communications between the host computer and terminal nodes were almost unaffected by the weather. In conclusion, the testing results show that SRMA could be a capable method for fast dynamic measuring about solar radiation and related PV cell operating characteristics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J.; Faraj, Daniel A.; Inglett, Todd A.
Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selectedmore » link to the adjacent compute node connected to the compute node through the selected link.« less
DCS-Neural-Network Program for Aircraft Control and Testing
NASA Technical Reports Server (NTRS)
Jorgensen, Charles C.
2006-01-01
A computer program implements a dynamic-cell-structure (DCS) artificial neural network that can perform such tasks as learning selected aerodynamic characteristics of an airplane from wind-tunnel test data and computing real-time stability and control derivatives of the airplane for use in feedback linearized control. A DCS neural network is one of several types of neural networks that can incorporate additional nodes in order to rapidly learn increasingly complex relationships between inputs and outputs. In the DCS neural network implemented by the present program, the insertion of nodes is based on accumulated error. A competitive Hebbian learning rule (a supervised-learning rule in which connection weights are adjusted to minimize differences between actual and desired outputs for training examples) is used. A Kohonen-style learning rule (derived from a relatively simple training algorithm, implements a Delaunay triangulation layout of neurons) is used to adjust node positions during training. Neighborhood topology determines which nodes are used to estimate new values. The network learns, starting with two nodes, and adds new nodes sequentially in locations chosen to maximize reductions in global error. At any given time during learning, the error becomes homogeneously distributed over all nodes.
Performing process migration with allreduce operations
Archer, Charles Jens; Peters, Amanda; Wallenfelt, Brian Paul
2010-12-14
Compute nodes perform allreduce operations that swap processes at nodes. A first allreduce operation generates a first result and uses a first process from a first compute node, a second process from a second compute node, and zeros from other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result and uses the first result from the first compute node, the second process from the second compute node, and zeros from others. The second compute node replaces the second process with the second result, which is the first process. A third allreduce operation generates a third result and uses the first result from first compute node, the second result from the second compute node, and zeros from others. The first compute node replaces the first result with the third result, which is the second process.
2004-02-03
KENNEDY SPACE CENTER, FLA. - Astronaut Tim Kopra (facing camera) aids in Intravehicular Activity (IVA) constraints testing on the Italian-built Node 2, a future element of the International Space Station. The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
2004-02-03
KENNEDY SPACE CENTER, FLA. - Astronaut Tim Kopra talks to a technician (off-camera) during Intravehicular Activity (IVA) constraints testing on the Italian-built Node 2, a future element of the International Space Station. The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
Controlling data transfers from an origin compute node to a target compute node
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2011-06-21
Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication.
Performance of VPIC on Trinity
NASA Astrophysics Data System (ADS)
Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Li, H.; Nam, H. A.; Pang, X.; Rust, W. N., III; Wohlbier, J.; Yin, L.; Albright, B. J.
2016-10-01
Trinity is a new major DOE computing resource which is going through final acceptance testing at Los Alamos National Laboratory. Trinity has several new and unique architectural features including two compute partitions, one with dual socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes. Additional unique features include use of on package high bandwidth memory (HBM) for the KNL nodes, the ability to configure the KNL nodes with respect to HBM model and on die network topology in a variety of operational modes at run time, and use of solid state storage via burst buffer technology to reduce time required to perform I/O. An effort is in progress to port and optimize VPIC to Trinity and evaluate its performance. Because VPIC was recently released as Open Source, it is being used as part of acceptance testing for Trinity and is participating in the Trinity Open Science Program which has resulted in excellent collaboration activities with both Cray and Intel. Results of this work will be presented on performance of VPIC on both Haswell and KNL partitions for both single node runs and runs at scale. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-01-10
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.
2012-10-23
Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, Kristan D.; Faraj, Daniel A.
In a parallel computer, a largest logical plane from a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: identifying, by each compute node of the subcommunicator, all logical planes that include the compute node; calculating, by each compute node for each identified logical plane that includes the compute node, an area of the identified logical plane; initiating, by a root node of the subcommunicator, a gather operation; receiving, by the root node from each compute node of the subcommunicator, each node's calculated areas as contribution data to the gather operation; and identifying, bymore » the root node in dependence upon the received calculated areas, a logical plane of the subcommunicator having the greatest area.« less
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E.
2010-03-02
Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root.
Collectively loading an application in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
2004-02-03
KENNEDY SPACE CENTER, FLA. - Astronaut Tim Kopra (second from right) talks with workers in the Space Station Processing Facility about the Intravehicular Activity (IVA) constraints testing on the Italian-built Node 2, a future element of the International Space Station. . The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
Paging memory from random access memory to backing storage in a parallel computer
Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E
2013-05-21
Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
Pacing a data transfer operation between compute nodes on a parallel computer
Blocksome, Michael A [Rochester, MN
2011-09-13
Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (`DMA`) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine.
Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job
DOE Office of Scientific and Technical Information (OSTI.GOV)
Budnik, Thomas A; Knudson, Brant L; Megerian, Mark G
Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of computemore » nodes running the re-launched job.« less
Administering truncated receive functions in a parallel messaging interface
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-12-09
Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
Broadcasting collective operation contributions throughout a parallel computer
Faraj, Ahmad [Rochester, MN
2012-02-21
Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
Multiple node remote messaging
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Valentina; Steinmacher-Burow, Burkhard; Vranas, Pavlos
2010-08-31
A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes includes that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes various steps including controlling a DMA engine at first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almasi, Gheorghe; Blumrich, Matthias Augustin; Chen, Dong
Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values for--example, checksums--to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieve commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored inmore » memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.« less
Identifying logical planes formed of compute nodes of a subcommunicator in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, Kristan D.; Faraj, Daniel
In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension,more » all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.« less
Constructing a logical, regular axis topology from an irregular topology
Faraj, Daniel A.
2014-07-22
Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.
Constructing a logical, regular axis topology from an irregular topology
Faraj, Daniel A.
2014-07-01
Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
Archer, Charles J [Rochester, MN; Hardwick, Camesha R [Fayetteville, NC; McCarthy, Patrick J [Rochester, MN; Wallenfelt, Brian P [Eden Prairie, MN
2009-06-23
Methods, parallel computers, and products are provided for identifying messaging completion on a parallel computer. The parallel computer includes a plurality of compute nodes, the compute nodes coupled for data communications by at least two independent data communications networks including a binary tree data communications network optimal for collective operations that organizes the nodes as a tree and a torus data communications network optimal for point to point operations that organizes the nodes as a torus. Embodiments include reading all counters at each node of the torus data communications network; calculating at each node a current node value in dependence upon the values read from the counters at each node; and determining for all nodes whether the current node value for each node is the same as a previously calculated node value for each node. If the current node is the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is complete and if the current node is not the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is currently incomplete.
2004-02-03
KENNEDY SPACE CENTER, FLA. - In the Space Station Processing Facility, workers check over the Italian-built Node 2, a future element of the International Space Station. The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
Executing a gather operation on a parallel computer
Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN
2012-03-20
Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer or the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node, if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data, if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2010-11-02
Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using memory FIFO operation; determining, by the origin DMA engine whether an acknowledgement of the RTS message has been received from the target DMA engine; if the an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
Hybrid data storage system in an HPC exascale environment
Bent, John M.; Faibish, Sorin; Gupta, Uday K.; Tzelnic, Percy; Ting, Dennis P. J.
2015-08-18
A computer-executable method, system, and computer program product for managing I/O requests from a compute node in communication with a data storage system, including a first burst buffer node and a second burst buffer node, the computer-executable method, system, and computer program product comprising striping data on the first burst buffer node and the second burst buffer node, wherein a first portion of the data is communicated to the first burst buffer node and a second portion of the data is communicated to the second burst buffer node, processing the first portion of the data at the first burst buffer node, and processing the second portion of the data at the second burst buffer node.
Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN
2009-11-06
Executing a scatter operation on a parallel computer includes: configuring a send buffer on a logical root, the send buffer having positions, each position corresponding to a ranked node in an operational group of compute nodes and for storing contents scattered to that ranked node; and repeatedly for each position in the send buffer: broadcasting, by the logical root to each of the other compute nodes on a global combining network, the contents of the current position of the send buffer using a bitwise OR operation, determining, by each compute node, whether the current position in the send buffer corresponds with the rank of that compute node, if the current position corresponds with the rank, receiving the contents and storing the contents in a reception buffer of that compute node, and if the current position does not correspond with the rank, discarding the contents.
Internode data communications in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.
2013-09-03
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Internode data communications in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Profiling an application for power consumption during execution on a compute node
Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E
2013-09-17
Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
Wide-area-distributed storage system for a multimedia database
NASA Astrophysics Data System (ADS)
Ueno, Masahiro; Kinoshita, Shigechika; Kuriki, Makato; Murata, Setsuko; Iwatsu, Shigetaro
1998-12-01
We have developed a wide-area-distribution storage system for multimedia databases, which minimizes the possibility of simultaneous failure of multiple disks in the event of a major disaster. It features a RAID system, whose member disks are spatially distributed over a wide area. Each node has a device, which includes the controller of the RAID and the controller of the member disks controlled by other nodes. The devices in the node are connected to a computer, using fiber optic cables and communicate using fiber-channel technology. Any computer at a node can utilize multiple devices connected by optical fibers as a single 'virtual disk.' The advantage of this system structure is that devices and fiber optic cables are shared by the computers. In this report, we first described our proposed system, and a prototype was used for testing. We then discussed its performance; i.e., how to read and write throughputs are affected by data-access delay, the RAID level, and queuing.
Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
NASA Astrophysics Data System (ADS)
Garzoglio, Gabriele
2012-12-01
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on “bare metal” nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
Investigation of storage options for scientific computing on Grid and Cloud facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storagemore » server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on bare metal nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.« less
Line-plane broadcasting in a data communications network of a parallel computer
Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.
2010-06-08
Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
Line-plane broadcasting in a data communications network of a parallel computer
Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.
2010-11-23
Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
Profiling an application for power consumption during execution on a plurality of compute nodes
Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.
2012-08-21
Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
Performing a global barrier operation in a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-12-09
Executing computing tasks on a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier.
NASA Astrophysics Data System (ADS)
Pham, Tuan D.; Watanabe, Yuzuru; Higuchi, Mitsunori; Suzuki, Hiroyuki
2017-02-01
Texture analysis of computed tomography (CT) imaging has been found useful to distinguish subtle differences, which are in- visible to human eyes, between malignant and benign tissues in cancer patients. This study implemented two complementary methods of texture analysis, known as the gray-level co-occurrence matrix (GLCM) and the experimental semivariogram (SV) with an aim to improve the predictive value of evaluating mediastinal lymph nodes in lung cancer. The GLCM was explored with the use of a rich set of its derived features, whereas the SV feature was extracted on real and synthesized CT samples of benign and malignant lymph nodes. A distinct advantage of the computer methodology presented herein is the alleviation of the need for an automated precise segmentation of the lymph nodes. Using the logistic regression model, a sensitivity of 75%, specificity of 90%, and area under curve of 0.89 were obtained in the test population. A tenfold cross-validation of 70% accuracy of classifying between benign and malignant lymph nodes was obtained using the support vector machines as a pattern classifier. These results are higher than those recently reported in literature with similar studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, Kristan D.; Faraj, Daniel A.
In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension,more » all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.« less
Message passing with a limited number of DMA byte counters
Blocksome, Michael [Rochester, MN; Chen, Dong [Croton on Hudson, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kumar, Sameer [White Plains, NY; Parker, Jeffrey J [Rochester, MN
2011-10-04
A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network where each compute node includes a DMA engine but includes only a limited number of byte counters for tracking a number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes using rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor using an exclusive injection counter to track both the RTS message and message data to be sent in association with the RTS message, to a destination compute node such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node in a rendezvous protocol form of a remote get to accept the RTS message and message data and processing the remote get (CTS) by the source compute node DMA engine to provide the message data to be sent.
Improved nine-node shell element MITC9i with reduced distortion sensitivity
NASA Astrophysics Data System (ADS)
Wisniewski, K.; Turska, E.
2017-11-01
The 9-node quadrilateral shell element MITC9i is developed for the Reissner-Mindlin shell kinematics, the extended potential energy and Green strain. The following features of its formulation ensure an improved behavior: 1. The MITC technique is used to avoid locking, and we propose improved transformations for bending and transverse shear strains, which render that all patch tests are passed for the regular mesh, i.e. with straight element sides and middle positions of midside nodes and a central node. 2. To reduce shape distortion effects, the so-called corrected shape functions of Celia and Gray (Int J Numer Meth Eng 20:1447-1459, 1984) are extended to shells and used instead of the standard ones. In effect, all patch tests are passed additionally for shifts of the midside nodes along straight element sides and for arbitrary shifts of the central node. 3. Several extensions of the corrected shape functions are proposed to enable computations of non-flat shells. In particular, a criterion is put forward to determine the shift parameters associated with the central node for non-flat elements. Additionally, the method is presented to construct a parabolic side for a shifted midside node, which improves accuracy for symmetric curved edges. Drilling rotations are included by using the drilling Rotation Constraint equation, in a way consistent with the additive/multiplicative rotation update scheme for large rotations. We show that the corrected shape functions reduce the sensitivity of the solution to the regularization parameter γ of the penalty method for this constraint. The MITC9i shell element is subjected to a range of linear and non-linear tests to show passing the patch tests, the absence of locking, very good accuracy and insensitivity to node shifts. It favorably compares to several other tested 9-node elements.
Blocksome, Michael A [Rochester, MN
2011-12-20
Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value.
Reducing power consumption while performing collective operations on a plurality of compute nodes
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2011-10-18
Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation.
Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks
NASA Astrophysics Data System (ADS)
Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won
Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
Fast katz and commuters : efficient estimation of social relatedness in large networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen
Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and amore » quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.« less
I/O routing in a multidimensional torus network
Chen, Dong; Eisley, Noel A.; Heidelberger, Philip
2017-02-07
A method, system and computer program product are disclosed for routing data packet in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
I/O routing in a multidimensional torus network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Dong; Eisley, Noel A.; Heidelberger, Philip
A method, system and computer program product are disclosed for routing data packet in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destinationmore » address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.« less
Nedorezov, L V
2015-01-01
A stochastic model of migrations on a lattice and with discrete time is considered. It is assumed that space is homogenous with respect to its properties and during one time step every individual (independently of local population numbers) can migrate to nearest nodes of lattice with equal probabilities. It is also assumed that population size remains constant during certain time interval of computer experiments. The following variants of estimation of encounter rate between individuals are considered: when for the fixed time moments every individual in every node of lattice interacts with all other individuals in the node; when individuals can stay in nodes independently, or can be involved in groups in two, three or four individuals. For each variant of interactions between individuals, average value (with respect to space and time) is computed for various values of population size. The samples obtained were compared with respective functions of classic models of isolated population dynamics: Verhulst model, Gompertz model, Svirezhev model, and theta-logistic model. Parameters of functions were calculated with least square method. Analyses of deviations were performed using Kolmogorov-Smirnov test, Lilliefors test, Shapiro-Wilk test, and other statistical tests. It is shown that from traditional point of view there are no correspondence between the encounter rate and functions describing effects of self-regulatory mechanisms on population dynamics. Best fitting of samples was obtained with Verhulst and theta-logistic models when using the dataset resulted from the situation when every individual in the node interacts with all other individuals.
Reducing power consumption during execution of an application on a plurality of compute nodes
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-06-05
Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: executing, by each compute node, an application, the application including power consumption directives corresponding to one or more portions of the application; identifying, by each compute node, the power consumption directives included within the application during execution of the portions of the application corresponding to those identified power consumption directives; and reducing power, by each compute node, to one or more components of that compute node according to the identified power consumption directives during execution of the portions of the application corresponding to those identified power consumption directives.
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
NASA Astrophysics Data System (ADS)
Cervone, G.; Clemente-Harding, L.; Alessandrini, S.; Delle Monache, L.
2016-12-01
A methodology based on Artificial Neural Networks (ANN) and an Analog Ensemble (AnEn) is presented to generate 72-hour deterministic and probabilistic forecasts of power generated by photovoltaic (PV) power plants using input from a numerical weather prediction model and computed astronomical variables. ANN and AnEn are used individually and in combination to generate forecasts for three solar power plant located in Italy. The computational scalability of the proposed solution is tested using synthetic data simulating 4,450 PV power stations. The NCAR Yellowstone supercomputer is employed to test the parallel implementation of the proposed solution, ranging from 1 node (32 cores) to 4,450 nodes (141,140 cores). Results show that a combined AnEn + ANN solution yields best results, and that the proposed solution is well suited for massive scale computation.
Reducing power consumption during execution of an application on a plurality of compute nodes
Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.
2013-09-10
Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory.
Palladino, S; Keyerleber, M A; King, R G; Burgess, K E
2016-11-01
Apocrine gland adenocarcinoma of the anal sac (AGAAS) is associated with high rates of iliosacral lymph node metastasis, which may influence treatment and prognosis. Magnetic resonance imaging (MRI) recently has been shown to be more sensitive than abdominal ultrasound examination (AUS) in affected patients. To compare the rate of detection of iliosacral lymphadenomegaly between AUS and computed tomography (CT) in dogs with AGAAS. Cohort A: A total of 30 presumed normal dogs. Cohort B: A total of 20 dogs with AGAAS that underwent AUS and CT. Using cohort A, mean normalized lymph node : aorta (LN : AO) ratios were established for medial iliac, internal iliac, and sacral lymph nodes. The CT images in cohort B then were reviewed retrospectively and considered enlarged if their LN : AO ratio measured 2 standard deviations above the mean normalized ratio for that particular node in cohort A. Classification and visibility of lymph nodes identified on AUS were compared to corresponding measurements obtained on CT. Computed tomography identified lymphadenomegaly in 13 of 20 AGAAS dogs. Of these 13 dogs, AUS correctly identified and detected all enlarged nodes in only 30.8%, and either misidentified or failed to detect additional enlarged nodes in the remaining dogs. Despite limitations in identifying enlargement in all affected lymph nodes, AUS identified at least 1 enlarged node in 100% of affected dogs. Abdominal ultrasound examination is an effective screening test for lymphadenomegaly in dogs with AGAAS, but CT should be considered in any patient in which an additional metastatic site would impact therapeutic planning. Copyright © 2016 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of the American College of Veterinary Internal Medicine.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad
2013-07-09
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad
2013-02-12
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
Determining collective barrier operation skew in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
2015-11-24
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by:more » identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.« less
Determining collective barrier operation skew in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by:more » identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joinedmore » the single local barrier.« less
Solving a Hamiltonian Path Problem with a bacterial computer
Baumgardner, Jordan; Acker, Karen; Adefuye, Oyinade; Crowley, Samuel Thomas; DeLoache, Will; Dickson, James O; Heard, Lane; Martens, Andrew T; Morton, Nickolaus; Ritter, Michelle; Shoecraft, Amber; Treece, Jessica; Unzicker, Matthew; Valencia, Amanda; Waters, Mike; Campbell, A Malcolm; Heyer, Laurie J; Poet, Jeffrey L; Eckdahl, Todd T
2009-01-01
Background The Hamiltonian Path Problem asks whether there is a route in a directed graph from a beginning node to an ending node, visiting each node exactly once. The Hamiltonian Path Problem is NP complete, achieving surprising computational complexity with modest increases in size. This challenge has inspired researchers to broaden the definition of a computer. DNA computers have been developed that solve NP complete problems. Bacterial computers can be programmed by constructing genetic circuits to execute an algorithm that is responsive to the environment and whose result can be observed. Each bacterium can examine a solution to a mathematical problem and billions of them can explore billions of possible solutions. Bacterial computers can be automated, made responsive to selection, and reproduce themselves so that more processing capacity is applied to problems over time. Results We programmed bacteria with a genetic circuit that enables them to evaluate all possible paths in a directed graph in order to find a Hamiltonian path. We encoded a three node directed graph as DNA segments that were autonomously shuffled randomly inside bacteria by a Hin/hixC recombination system we previously adapted from Salmonella typhimurium for use in Escherichia coli. We represented nodes in the graph as linked halves of two different genes encoding red or green fluorescent proteins. Bacterial populations displayed phenotypes that reflected random ordering of edges in the graph. Individual bacterial clones that found a Hamiltonian path reported their success by fluorescing both red and green, resulting in yellow colonies. We used DNA sequencing to verify that the yellow phenotype resulted from genotypes that represented Hamiltonian path solutions, demonstrating that our bacterial computer functioned as expected. Conclusion We successfully designed, constructed, and tested a bacterial computer capable of finding a Hamiltonian path in a three node directed graph. This proof-of-concept experiment demonstrates that bacterial computing is a new way to address NP-complete problems using the inherent advantages of genetic systems. The results of our experiments also validate synthetic biology as a valuable approach to biological engineering. We designed and constructed basic parts, devices, and systems using synthetic biology principles of standardization and abstraction. PMID:19630940
Thread selection according to power characteristics during context switching on compute nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J.; Blocksome, Michael A.; Randles, Amanda E.
Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switchmore » if the criteria for a thread context switch are satisfied, including executing the next thread of execution.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
None, None
Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switchmore » if the criteria for a thread context switch are satisfied, including executing the next thread of execution.« less
Community detection using preference networks
NASA Astrophysics Data System (ADS)
Tasgin, Mursel; Bingol, Haluk O.
2018-04-01
Community detection is the task of identifying clusters or groups of nodes in a network where nodes within the same group are more connected with each other than with nodes in different groups. It has practical uses in identifying similar functions or roles of nodes in many biological, social and computer networks. With the availability of very large networks in recent years, performance and scalability of community detection algorithms become crucial, i.e. if time complexity of an algorithm is high, it cannot run on large networks. In this paper, we propose a new community detection algorithm, which has a local approach and is able to run on large networks. It has a simple and effective method; given a network, algorithm constructs a preference network of nodes where each node has a single outgoing edge showing its preferred node to be in the same community with. In such a preference network, each connected component is a community. Selection of the preferred node is performed using similarity based metrics of nodes. We use two alternatives for this purpose which can be calculated in 1-neighborhood of nodes, i.e. number of common neighbors of selector node and its neighbors and, the spread capability of neighbors around the selector node which is calculated by the gossip algorithm of Lind et.al. Our algorithm is tested on both computer generated LFR networks and real-life networks with ground-truth community structure. It can identify communities accurately in a fast way. It is local, scalable and suitable for distributed execution on large networks.
Gooding, Thomas Michael; McCarthy, Patrick Joseph
2010-03-02
A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
GATE Monte Carlo simulation of dose distribution using MapReduce in a cloud computing environment.
Liu, Yangchuan; Tang, Yuguo; Gao, Xin
2017-12-01
The GATE Monte Carlo simulation platform has good application prospects of treatment planning and quality assurance. However, accurate dose calculation using GATE is time consuming. The purpose of this study is to implement a novel cloud computing method for accurate GATE Monte Carlo simulation of dose distribution using MapReduce. An Amazon Machine Image installed with Hadoop and GATE is created to set up Hadoop clusters on Amazon Elastic Compute Cloud (EC2). Macros, the input files for GATE, are split into a number of self-contained sub-macros. Through Hadoop Streaming, the sub-macros are executed by GATE in Map tasks and the sub-results are aggregated into final outputs in Reduce tasks. As an evaluation, GATE simulations were performed in a cubical water phantom for X-ray photons of 6 and 18 MeV. The parallel simulation on the cloud computing platform is as accurate as the single-threaded simulation on a local server and the simulation correctness is not affected by the failure of some worker nodes. The cloud-based simulation time is approximately inversely proportional to the number of worker nodes. For the simulation of 10 million photons on a cluster with 64 worker nodes, time decreases of 41× and 32× were achieved compared to the single worker node case and the single-threaded case, respectively. The test of Hadoop's fault tolerance showed that the simulation correctness was not affected by the failure of some worker nodes. The results verify that the proposed method provides a feasible cloud computing solution for GATE.
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 Risc for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada- implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Understanding the I/O Performance Gap Between Cori KNL and Haswell
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jialin; Koziol, Quincey; Tang, Houjun
2017-05-01
The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004 node Haswell partition and a 9,688 node KNL partition, which ranked as the 5th most powerful and fastest supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the IO performance gap between them is important, impacting not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we have analyzed performance of single core and single node IO comprehensively on the Haswell and KNL partitions,more » and have discovered the major bottlenecks, which include CPU frequencies and memory copy performance. We have also extended our performance tests to multi-node IO and revealed the IO cost difference caused by network latency, buffer size, and communication cost. Overall, we have developed a strong understanding of the IO gap between Haswell and KNL nodes and the lessons learned from this exploration will guide us in designing optimal IO solutions in many-core era.« less
Simulation of Detecting Damage in Composite Stiffened Panel Using Lamb Waves
NASA Technical Reports Server (NTRS)
Wang, John T.; Ross, Richard W.; Huang, Guo L.; Yuan, Fuh G.
2013-01-01
Lamb wave damage detection in a composite stiffened panel is simulated by performing explicit transient dynamic finite element analyses and using signal imaging techniques. This virtual test process does not need to use real structures, actuators/sensors, or laboratory equipment. Quasi-isotropic laminates are used for the stiffened panels. Two types of damage are studied. One type is a damage in the skin bay and the other type is a debond between the stiffener flange and the skin. Innovative approaches for identifying the damage location and imaging the damage were developed. The damage location is identified by finding the intersection of the damage locus and the path of the time reversal wave packet re-emitted from the sensor nodes. The damage locus is a circle that envelops the potential damage locations. Its center is at the actuator location and its radius is computed by multiplying the group velocity by the time of flight to damage. To create a damage image for estimating the size of damage, a group of nodes in the neighborhood of the damage location is identified for applying an image condition. The image condition, computed at a finite element node, is the zero-lag cross-correlation (ZLCC) of the time-reversed incident wave signal and the time reversal wave signal from the sensor nodes. This damage imaging process is computationally efficient since only the ZLCC values of a small amount of nodes in the neighborhood of the identified damage location are computed instead of those of the full model.
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Jiang, Joe-Air; Chuang, Cheng-Long; Lin, Tzu-Shiang; Chen, Chia-Pang; Hung, Chih-Hung; Wang, Jiing-Yi; Liu, Chang-Wang; Lai, Tzu-Yun
2010-01-01
In recent years, various received signal strength (RSS)-based localization estimation approaches for wireless sensor networks (WSNs) have been proposed. RSS-based localization is regarded as a low-cost solution for many location-aware applications in WSNs. In previous studies, the radiation patterns of all sensor nodes are assumed to be spherical, which is an oversimplification of the radio propagation model in practical applications. In this study, we present an RSS-based cooperative localization method that estimates unknown coordinates of sensor nodes in a network. Arrangement of two external low-cost omnidirectional dipole antennas is developed by using the distance-power gradient model. A modified robust regression is also proposed to determine the relative azimuth and distance between a sensor node and a fixed reference node. In addition, a cooperative localization scheme that incorporates estimations from multiple fixed reference nodes is presented to improve the accuracy of the localization. The proposed method is tested via computer-based analysis and field test. Experimental results demonstrate that the proposed low-cost method is a useful solution for localizing sensor nodes in unknown or changing environments.
Performance of an anisotropic Allman/DKT 3-node thin triangular flat shell element
NASA Astrophysics Data System (ADS)
Ertas, A.; Krafcik, J. T.; Ekwaro-Osire, S.
1992-05-01
A simple, explicit formulation of the stiffness matrix for an anisotropic, 3-node, thin triangular flat shell element in global coordinates is presented. An Allman triangle (AT) is used for membrane stiffness. The membrane stiffness matrix is explicitly derived by applying an Allman transformation to a Felippa 6-node linear strain triangle (LST). Bending stiffness is incorporated by the use of a discrete Kirchhoff triangle (DKT) bending element. Stiffness terms resulting from anisotropic membrane-bending coupling are included by integrating, in area coordinates, the membrane and bending strain-displacement matrices. Using the aforementioned approach, the objective of this study is to develop and test the performance of a practical 3-node flat shell element that could be used in plate problems with unsymmetrically stacked composite laminates. The performance of the latter element is tested on plates of varying aspect ratios. The developed 3-node shell element should simplify the programming task and have the potential of reducing the computational time.
Parallel reduced-instruction-set-computer architecture for real-time symbolic pattern matching
NASA Astrophysics Data System (ADS)
Parson, Dale E.
1991-03-01
This report discusses ongoing work on a parallel reduced-instruction- set-computer (RISC) architecture for automatic production matching. The PRIOPS compiler takes advantage of the memoryless character of automatic processing by translating a program's collection of automatic production tests into an equivalent combinational circuit-a digital circuit without memory, whose outputs are immediate functions of its inputs. The circuit provides a highly parallel, fine-grain model of automatic matching. The compiler then maps the combinational circuit onto RISC hardware. The heart of the processor is an array of comparators capable of testing production conditions in parallel, Each comparator attaches to private memory that contains virtual circuit nodes-records of the current state of nodes and busses in the combinational circuit. All comparator memories hold identical information, allowing simultaneous update for a single changing circuit node and simultaneous retrieval of different circuit nodes by different comparators. Along with the comparator-based logic unit is a sequencer that determines the current combination of production-derived comparisons to try, based on the combined success and failure of previous combinations of comparisons. The memoryless nature of automatic matching allows the compiler to designate invariant memory addresses for virtual circuit nodes, and to generate the most effective sequences of comparison test combinations. The result is maximal utilization of parallel hardware, indicating speed increases and scalability beyond that found for course-grain, multiprocessor approaches to concurrent Rete matching. Future work will consider application of this RISC architecture to the standard (controlled) Rete algorithm, where search through memory dominates portions of matching.
Budget-based power consumption for application execution on a plurality of compute nodes
Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E
2013-02-05
Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications.
Budget-based power consumption for application execution on a plurality of compute nodes
Archer, Charles J; Inglett, Todd A; Ratterman, Joseph D
2012-10-23
Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications.
Overlapping Community Detection based on Network Decomposition
NASA Astrophysics Data System (ADS)
Ding, Zhuanlian; Zhang, Xingyi; Sun, Dengdi; Luo, Bin
2016-04-01
Community detection in complex network has become a vital step to understand the structure and dynamics of networks in various fields. However, traditional node clustering and relatively new proposed link clustering methods have inherent drawbacks to discover overlapping communities. Node clustering is inadequate to capture the pervasive overlaps, while link clustering is often criticized due to the high computational cost and ambiguous definition of communities. So, overlapping community detection is still a formidable challenge. In this work, we propose a new overlapping community detection algorithm based on network decomposition, called NDOCD. Specifically, NDOCD iteratively splits the network by removing all links in derived link communities, which are identified by utilizing node clustering technique. The network decomposition contributes to reducing the computation time and noise link elimination conduces to improving the quality of obtained communities. Besides, we employ node clustering technique rather than link similarity measure to discover link communities, thus NDOCD avoids an ambiguous definition of community and becomes less time-consuming. We test our approach on both synthetic and real-world networks. Results demonstrate the superior performance of our approach both in computation time and accuracy compared to state-of-the-art algorithms.
DMA engine for repeating communication patterns
Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard; Vranas, Pavlos
2010-09-21
A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs where each Injection FIFO may containing an arbitrary number of message descriptors in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.
NASA Technical Reports Server (NTRS)
2004-01-01
KENNEDY SPACE CENTER, FLA. Astronaut Tim Kopra (second from right) talks with workers in the Space Station Processing Facility about the Intravehicular Activity (IVA) constraints testing on the Italian-built Node 2, a future element of the International Space Station. . The second of three Station connecting modules, the Node 2 attaches to the end of the U.S. Lab and provides attach locations for several other elements. Kopra is currently assigned technical duties in the Space Station Branch of the Astronaut Office, where his primary focus involves the testing of crew interfaces for two future ISS modules as well as the implementation of support computers and operational Local Area Network on ISS. Node 2 is scheduled to launch on mission STS-120, Station assembly flight 10A.
Chaining direct memory access data transfer operations for compute nodes in a parallel computer
Archer, Charles J.; Blocksome, Michael A.
2010-09-28
Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, a RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.
Parallel compression of data chunks of a shared data object using a log-structured file system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin; Grider, Gary
2016-10-25
Techniques are provided for parallel compression of data chunks being written to a shared object. A client executing on a compute node or a burst buffer node in a parallel computing system stores a data chunk generated by the parallel computing system to a shared data object on a storage node by compressing the data chunk; and providing the data compressed data chunk to the storage node that stores the shared object. The client and storage node may employ Log-Structured File techniques. The compressed data chunk can be de-compressed by the client when the data chunk is read. A storagemore » node stores a data chunk as part of a shared object by receiving a compressed version of the data chunk from a compute node; and storing the compressed version of the data chunk to the shared data object on the storage node.« less
GPU-accelerated Modeling and Element-free Reverse-time Migration with Gauss Points Partition
NASA Astrophysics Data System (ADS)
Zhen, Z.; Jia, X.
2014-12-01
Element-free method (EFM) has been applied to seismic modeling and migration. Compared with finite element method (FEM) and finite difference method (FDM), it is much cheaper and more flexible because only the information of the nodes and the boundary of the study area are required in computation. In the EFM, the number of Gauss points should be consistent with the number of model nodes; otherwise the accuracy of the intermediate coefficient matrices would be harmed. Thus when we increase the nodes of velocity model in order to obtain higher resolution, we find that the size of the computer's memory will be a bottleneck. The original EFM can deal with at most 81×81 nodes in the case of 2G memory, as tested by Jia and Hu (2006). In order to solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition (GPP), and utilize the GPUs to improve the computation efficiency. Considering the characteristics of the Gaussian points, the GPP method doesn't influence the propagation of seismic wave in the velocity model. To overcome the time-consuming computation of the stiffness matrix (K) and the mass matrix (M), we also use the GPUs in our computation program. We employ the compressed sparse row (CSR) format to compress the intermediate sparse matrices and try to simplify the operations by solving the linear equations with the CULA Sparse's Conjugate Gradient (CG) solver instead of the linear sparse solver 'PARDISO'. It is observed that our strategy can significantly reduce the computational time of K and Mcompared with the algorithm based on CPU. The model tested is Marmousi model. The length of the model is 7425m and the depth is 2990m. We discretize the model with 595x298 nodes, 300x300 Gauss cells and 3x3 Gauss points in each cell. In contrast to the computational time of the conventional EFM, the GPUs-GPP approach can substantially improve the efficiency. The speedup ratio of time consumption of computing K, M is 120 and the speedup ratio time consumption of RTM is 11.5. At the same time, the accuracy of imaging is not harmed. Another advantage of the GPUs-GPP method is its easy applications in other numerical methods such as the FEM. Finally, in the GPUs-GPP method, the arrays require quite limited memory storage, which makes the method promising in dealing with large-scale 3D problems.
Numerical Modeling of Flow Distribution in Micro-Fluidics Systems
NASA Technical Reports Server (NTRS)
Majumdar, Alok; Cole, Helen; Chen, C. P.
2005-01-01
This paper describes an application of a general purpose computer program, GFSSP (Generalized Fluid System Simulation Program) for calculating flow distribution in a network of micro-channels. GFSSP employs a finite volume formulation of mass and momentum conservation equations in a network consisting of nodes and branches. Mass conservation equation is solved for pressures at the nodes while the momentum conservation equation is solved at the branches to calculate flowrate. The system of equations describing the fluid network is solved by a numerical method that is a combination of the Newton-Raphson and successive substitution methods. The numerical results have been compared with test data and detailed CFD (computational Fluid Dynamics) calculations. The agreement between test data and predictions is satisfactory. The discrepancies between the predictions and test data can be attributed to the frictional correlation which does not include the effect of surface tension or electro-kinetic effect.
Methods for operating parallel computing systems employing sequenced communications
Benner, R.E.; Gustafson, J.L.; Montry, G.R.
1999-08-10
A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.
Methods for operating parallel computing systems employing sequenced communications
Benner, Robert E.; Gustafson, John L.; Montry, Gary R.
1999-01-01
A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Design & implementation of distributed spatial computing node based on WPS
NASA Astrophysics Data System (ADS)
Liu, Liping; Li, Guoqing; Xie, Jibo
2014-03-01
Currently, the research work of SIG (Spatial Information Grid) technology mostly emphasizes on the spatial data sharing in grid environment, while the importance of spatial computing resources is ignored. In order to implement the sharing and cooperation of spatial computing resources in grid environment, this paper does a systematical research of the key technologies to construct Spatial Computing Node based on the WPS (Web Processing Service) specification by OGC (Open Geospatial Consortium). And a framework of Spatial Computing Node is designed according to the features of spatial computing resources. Finally, a prototype of Spatial Computing Node is implemented and the relevant verification work under the environment is completed.
Method and apparatus for offloading compute resources to a flash co-processing appliance
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung
2015-10-13
Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage with asynchronous migration between the burst buffer nodes and slower more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and very fast uploaded to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path.
Xie, Zhiqiang; Shao, Xia; Xin, Yu
2016-01-01
To solve the problem of task scheduling in the cloud computing system, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). This algorithm applies a predecessor-task layer priority strategy to solve the problem of constraint relations among task nodes. The strategy assigns different priority values to every task node based on the scheduling order of task node as affected by the constraint relations among task nodes, and the task node list is generated by the different priority value. To address the scheduling order problem in which task nodes have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first as the completion time of task graph is indirectly influenced by the finishing time of task nodes in the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task Makespan in most cases and meet a high quality performance objective.
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path
Xie, Zhiqiang; Shao, Xia; Xin, Yu
2016-01-01
To solve the problem of task scheduling in the cloud computing system, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). This algorithm applies a predecessor-task layer priority strategy to solve the problem of constraint relations among task nodes. The strategy assigns different priority values to every task node based on the scheduling order of task node as affected by the constraint relations among task nodes, and the task node list is generated by the different priority value. To address the scheduling order problem in which task nodes have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first as the completion time of task graph is indirectly influenced by the finishing time of task nodes in the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task Makespan in most cases and meet a high quality performance objective. PMID:27490901
Marchand, Clare
2017-01-01
Background Endobronchial ultrasound‐guided transbronchial needle aspiration (EBUS‐TBNA) diagnoses and stages mediastinal lymph node pathology. This retrospective study determined the relationship between EBUS‐TBNA utility and non‐small cell lung cancer (NSCLC) stage, lymph node size, and positron emission tomography (PET) standard uptake values (SUV), and the utility of neck ultrasound in bulky mediastinal disease. Methods Data of 284 consecutive patients who had undergone EBUS‐TBNA was collected. Two hundred patients had suspected NSCLC, with 148 confirmed NSCLC cases. The diagnostic utility of EBUS‐TBNA was determined according to NSCLC stage, EBUS lymph node size, PET SUV, use in distal metastases, and mutation testing. The utility of neck ultrasound for N3 disease was calculated in patients with bulky mediastinal disease. Results EBUS‐TBNA was well tolerated with 97% sensitivity in distant metastatic disease, avoiding the need for distal metastases biopsy in 81% of cases. It had equivalent diagnostic accuracy in all NSCLC stages and in lymph nodes <10 mm, <20 mm or >20 mm (sensitivity >92% in all cases), with no mutation testing failures. EBUS‐TBNA had 33% sensitivity in PET indolent (SUV < 4) nodes and 79% sensitivity in PET active nodes (SUV > 4). EBUS‐TBNA diagnosed 12 cases of lymphoma without flow cytometry. Conclusions The use of EBUS‐TBNA meant that distant metastatic biopsy was avoided in 81% of cases, performing well irrespective of cancer stage, node size, and facilitating mutation testing. Neck ultrasound failed to detect N3 disease in patients with bulky mediastinal disease. EBUS‐TBNA had a sensitivity of 33% for metastases in PET negative nodes, highlighting PET limitations. PMID:28436173
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, Sameer
Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and to distinguish data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leafs of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the rootmore » processor to each processor in a communicator but instead is replicated on several intermediate nodes which then replicated the message to nodes in lower leafs. Hence, the intermediate compute nodes become "virtual root compute nodes" for the purpose of replicating the broadcast message to lower levels of a tree.« less
Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer
Blocksome, Michael A
2015-02-17
Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, a RTS message to a target node, the RTS message specifying an message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.
Improved Extreme Learning Machine based on the Sensitivity Analysis
NASA Astrophysics Data System (ADS)
Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang
2018-03-01
Extreme learning machine and its improved ones is weak in some points, such as computing complex, learning error and so on. After deeply analyzing, referencing the importance of hidden nodes in SVM, an novel analyzing method of the sensitivity is proposed which meets people’s cognitive habits. Based on these, an improved ELM is proposed, it could remove hidden nodes before meeting the learning error, and it can efficiently manage the number of hidden nodes, so as to improve the its performance. After comparing tests, it is better in learning time, accuracy and so on.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
Development of a Computing Cluster At the University of Richmond
NASA Astrophysics Data System (ADS)
Carbonneau, J.; Gilfoyle, G. P.; Bunn, E. F.
2010-11-01
The University of Richmond has developed a computing cluster to support the massive simulation and data analysis requirements for programs in intermediate-energy nuclear physics, and cosmology. It is a 20-node, 240-core system running Red Hat Enterprise Linux 5. We have built and installed the physics software packages (Geant4, gemc, MADmap...) and developed shell and Perl scripts for running those programs on the remote nodes. The system has a theoretical processing peak of about 2500 GFLOPS. Testing with the High Performance Linpack (HPL) benchmarking program (one of the standard benchmarks used by the TOP500 list of fastest supercomputers) resulted in speeds of over 900 GFLOPS. The difference between the maximum and measured speeds is due to limitations in the communication speed among the nodes; creating a bottleneck for large memory problems. As HPL sends data between nodes, the gigabit Ethernet connection cannot keep up with the processing power. We will show how both the theoretical and actual performance of the cluster compares with other current and past clusters, as well as the cost per GFLOP. We will also examine the scaling of the performance when distributed to increasing numbers of nodes.
Computer hardware fault administration
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-09-14
Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.
Computational lymphatic node models in pediatric and adult hybrid phantoms for radiation dosimetry
NASA Astrophysics Data System (ADS)
Lee, Choonsik; Lamart, Stephanie; Moroz, Brian E.
2013-03-01
We developed models of lymphatic nodes for six pediatric and two adult hybrid computational phantoms to calculate the lymphatic node dose estimates from external and internal radiation exposures. We derived the number of lymphatic nodes from the recommendations in International Commission on Radiological Protection (ICRP) Publications 23 and 89 at 16 cluster locations for the lymphatic nodes: extrathoracic, cervical, thoracic (upper and lower), breast (left and right), mesentery (left and right), axillary (left and right), cubital (left and right), inguinal (left and right) and popliteal (left and right), for different ages (newborn, 1-, 5-, 10-, 15-year-old and adult). We modeled each lymphatic node within the voxel format of the hybrid phantoms by assuming that all nodes have identical size derived from published data except narrow cluster sites. The lymph nodes were generated by the following algorithm: (1) selection of the lymph node site among the 16 cluster sites; (2) random sampling of the location of the lymph node within a spherical space centered at the chosen cluster site; (3) creation of the sphere or ovoid of tissue representing the node based on lymphatic node characteristics defined in ICRP Publications 23 and 89. We created lymph nodes until the pre-defined number of lymphatic nodes at the selected cluster site was reached. This algorithm was applied to pediatric (newborn, 1-, 5-and 10-year-old male, and 15-year-old males) and adult male and female ICRP-compliant hybrid phantoms after voxelization. To assess the performance of our models for internal dosimetry, we calculated dose conversion coefficients, called S values, for selected organs and tissues with Iodine-131 distributed in six lymphatic node cluster sites using MCNPX2.6, a well validated Monte Carlo radiation transport code. Our analysis of the calculations indicates that the S values were significantly affected by the location of the lymph node clusters and that the values increased for smaller phantoms due to the shorter inter-organ distances compared to the bigger phantoms. By testing sensitivity of S values to random sampling and voxel resolution, we confirmed that the lymph node model is reasonably stable and consistent for different random samplings and voxel resolutions.
A unified design space of synthetic stripe-forming networks
Schaerli, Yolanda; Munteanu, Andreea; Gili, Magüi; Cotterell, James; Sharpe, James; Isalan, Mark
2014-01-01
Synthetic biology is a promising tool to study the function and properties of gene regulatory networks. Gene circuits with predefined behaviours have been successfully built and modelled, but largely on a case-by-case basis. Here we go beyond individual networks and explore both computationally and synthetically the design space of possible dynamical mechanisms for 3-node stripe-forming networks. First, we computationally test every possible 3-node network for stripe formation in a morphogen gradient. We discover four different dynamical mechanisms to form a stripe and identify the minimal network of each group. Next, with the help of newly established engineering criteria we build these four networks synthetically and show that they indeed operate with four fundamentally distinct mechanisms. Finally, this close match between theory and experiment allows us to infer and subsequently build a 2-node network that represents the archetype of the explored design space. PMID:25247316
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, J
Purpose: This study evaluated the efficiency of 4D lung radiation treatment planning using Monte Carlo simulation on the cloud. The EGSnrc Monte Carlo code was used in dose calculation on the 4D-CT image set. Methods: 4D lung radiation treatment plan was created by the DOSCTP linked to the cloud, based on the Amazon elastic compute cloud platform. Dose calculation was carried out by Monte Carlo simulation on the 4D-CT image set on the cloud, and results were sent to the FFD4D image deformation program for dose reconstruction. The dependence of computing time for treatment plan on the number of computemore » node was optimized with variations of the number of CT image set in the breathing cycle and dose reconstruction time of the FFD4D. Results: It is found that the dependence of computing time on the number of compute node was affected by the diminishing return of the number of node used in Monte Carlo simulation. Moreover, the performance of the 4D treatment planning could be optimized by using smaller than 10 compute nodes on the cloud. The effects of the number of image set and dose reconstruction time on the dependence of computing time on the number of node were not significant, as more than 15 compute nodes were used in Monte Carlo simulations. Conclusion: The issue of long computing time in 4D treatment plan, requiring Monte Carlo dose calculations in all CT image sets in the breathing cycle, can be solved using the cloud computing technology. It is concluded that the optimized number of compute node selected in simulation should be between 5 and 15, as the dependence of computing time on the number of node is significant.« less
ALICE HLT Cluster operation during ALICE Run 2
NASA Astrophysics Data System (ADS)
Lehrbach, J.; Krzewicki, M.; Rohr, D.; Engel, H.; Gomez Ramirez, A.; Lindenstruth, V.; Berzano, D.;
2017-10-01
ALICE (A Large Ion Collider Experiment) is one of the four major detectors located at the LHC at CERN, focusing on the study of heavy-ion collisions. The ALICE High Level Trigger (HLT) is a compute cluster which reconstructs the events and compresses the data in real-time. The data compression by the HLT is a vital part of data taking especially during the heavy-ion runs in order to be able to store the data which implies that reliability of the whole cluster is an important matter. To guarantee a consistent state among all compute nodes of the HLT cluster we have automatized the operation as much as possible. For automatic deployment of the nodes we use Foreman with locally mirrored repositories and for configuration management of the nodes we use Puppet. Important parameters like temperatures, network traffic, CPU load etc. of the nodes are monitored with Zabbix. During periods without beam the HLT cluster is used for tests and as one of the WLCG Grid sites to compute offline jobs in order to maximize the usage of our cluster. To prevent interference with normal HLT operations we separate the virtual machines running the Grid jobs from the normal HLT operation via virtual networks (VLANs). In this paper we give an overview of the ALICE HLT operation in 2016.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.
Broadcasting a message in a parallel computer
Archer, Charles J; Faraj, Ahmad A
2013-04-16
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the child nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.
Broadcasting a message in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the childmore » nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.« less
Cloud computing method for dynamically scaling a process across physical machine boundaries
Gillen, Robert E.; Patton, Robert M.; Potok, Thomas E.; Rojas, Carlos C.
2014-09-02
A cloud computing platform includes first device having a graph or tree structure with a node which receives data. The data is processed by the node or communicated to a child node for processing. A first node in the graph or tree structure determines the reconfiguration of a portion of the graph or tree structure on a second device. The reconfiguration may include moving a second node and some or all of its descendant nodes. The second and descendant nodes may be copied to the second device.
Method for simultaneous overlapped communications between neighboring processors in a multiple
Benner, Robert E.; Gustafson, John L.; Montry, Gary R.
1991-01-01
A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Fault tolerant features and experiments of ANTS distributed real-time system
NASA Astrophysics Data System (ADS)
Dominic-Savio, Patrick; Lo, Jien-Chung; Tufts, Donald W.
1995-01-01
The ANTS project at the University of Rhode Island introduces the concept of Active Nodal Task Seeking (ANTS) as a way to efficiently design and implement dependable, high-performance, distributed computing. This paper presents the fault tolerant design features that have been incorporated in the ANTS experimental system implementation. The results of performance evaluations and fault injection experiments are reported. The fault-tolerant version of ANTS categorizes all computing nodes into three groups. They are: the up-and-running green group, the self-diagnosing yellow group and the failed red group. Each available computing node will be placed in the yellow group periodically for a routine diagnosis. In addition, for long-life missions, ANTS uses a monitoring scheme to identify faulty computing nodes. In this monitoring scheme, the communication pattern of each computing node is monitored by two other nodes.
2006-02-01
wireless sensor device network, and a about 200 Stargate nodes higher-tier multi-hop peer- to-peer 802.11b wireless network. Leading up to the full ExScal...deployment, we conducted spatial scaling tests on our higher-tier protocols on a 7 × 7 grid of Stargates nodes 45m and with 90m separations respectively...onW and its scaled version W̃ . III. EXPERIMENTAL SETUP Description of Kansei testbed. A stargate is a single board linux-based computer [7]. It uses a
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2014-04-01
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2014-04-22
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2013-07-02
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
Architecture and method for a burst buffer using flash technology
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung
2016-03-15
A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Checkpointing for a hybrid computing node
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cher, Chen-Yong
2016-03-08
According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.
A systematic review of tests for lymph node status in primary endometrial cancer.
Selman, Tara J; Mann, Christopher H; Zamora, Javier; Khan, Khalid S
2008-05-05
The lymph node status of a patient is a key determinate in staging, prognosis and adjuvant treatment of endometrial cancer. Despite this, the potential additional morbidity associated with lymphadenectomy makes its role controversial. This study systematically reviews the accuracy literature on sentinel node biopsy; ultra sound scanning, magnetic resonance imaging (MRI) and computer tomography (CT) for determining lymph node status in endometrial cancer. Relevant articles were identified form MEDLINE (1966-2006), EMBASE (1980-2006), MEDION, the Cochrane library, hand searching of reference lists from primary articles and reviews, conference abstracts and contact with experts in the field. The review included 18 relevant primary studies (693 women). Data was extracted for study characteristics and quality. Bivariate random-effect model meta-analysis was used to estimate diagnostic accuracy of the various index tests. MRI (pooled positive LR 26.7, 95% CI 10.6 - 67.6 and negative LR 0.29 95% CI 0.17 - 0.49) and successful sentinel node biopsy (pooled positive LR 18.9 95% CI 6.7 - 53.2 and negative LR 0.22, 95% CI 0.1 - 0.48) were the most accurate tests. CT was not as accurate a test (pooled positive LR 3.8, 95% CI 2.0 - 7.3 and negative LR of 0.62, 95% CI 0.45 - 0.86. There was only one study that reported the use of ultrasound scanning. MRI and sentinel node biopsy have shown similar diagnostic accuracy in confirming lymph node status among women with primary endometrial cancer than CT scanning, although the comparisons made are indirect and hence subject to bias. MRI should be used in preference, in light of the ASTEC trial, because of its non invasive nature.
A systematic review of tests for lymph node status in primary endometrial cancer
Selman, Tara J; Mann, Christopher H; Zamora, Javier; Khan, Khalid S
2008-01-01
Background The lymph node status of a patient is a key determinate in staging, prognosis and adjuvant treatment of endometrial cancer. Despite this, the potential additional morbidity associated with lymphadenectomy makes its role controversial. This study systematically reviews the accuracy literature on sentinel node biopsy; ultra sound scanning, magnetic resonance imaging (MRI) and computer tomography (CT) for determining lymph node status in endometrial cancer. Methods Relevant articles were identified form MEDLINE (1966–2006), EMBASE (1980–2006), MEDION, the Cochrane library, hand searching of reference lists from primary articles and reviews, conference abstracts and contact with experts in the field. The review included 18 relevant primary studies (693 women). Data was extracted for study characteristics and quality. Bivariate random-effect model meta-analysis was used to estimate diagnostic accuracy of the various index tests. Results MRI (pooled positive LR 26.7, 95% CI 10.6 – 67.6 and negative LR 0.29 95% CI 0.17 – 0.49) and successful sentinel node biopsy (pooled positive LR 18.9 95% CI 6.7 – 53.2 and negative LR 0.22, 95% CI 0.1 – 0.48) were the most accurate tests. CT was not as accurate a test (pooled positive LR 3.8, 95% CI 2.0 – 7.3 and negative LR of 0.62, 95% CI 0.45 – 0.86. There was only one study that reported the use of ultrasound scanning. Conclusion MRI and sentinel node biopsy have shown similar diagnostic accuracy in confirming lymph node status among women with primary endometrial cancer than CT scanning, although the comparisons made are indirect and hence subject to bias. MRI should be used in preference, in light of the ASTEC trial, because of its non invasive nature. PMID:18457596
Managing internode data communications for an uninitialized process in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R
2014-05-20
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior tomore » initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.« less
Managing internode data communications for an uninitialized process in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-05-20
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
GAS eleven node thermal model (GEM)
NASA Technical Reports Server (NTRS)
Butler, Dan
1988-01-01
The Eleven Node Thermal Model (GEM) of the Get Away Special (GAS) container was originally developed based on the results of thermal tests of the GAS container. The model was then used in the thermal analysis and design of several NASA/GSFC GAS experiments, including the Flight Verification Payload, the Ultraviolet Experiment, and the Capillary Pumped Loop. The model description details the five cu ft container both with and without an insulated end cap. Mass specific heat values are also given so that transient analyses can be performed. A sample problem for each configuration is included as well so that GEM users can verify their computations. The model can be run on most personal computers with a thermal analyzer solution routine.
A biologically inspired immunization strategy for network epidemiology.
Liu, Yang; Deng, Yong; Jusup, Marko; Wang, Zhen
2016-07-07
Well-known immunization strategies, based on degree centrality, betweenness centrality, or closeness centrality, either neglect the structural significance of a node or require global information about the network. We propose a biologically inspired immunization strategy that circumvents both of these problems by considering the number of links of a focal node and the way the neighbors are connected among themselves. The strategy thus measures the dependence of the neighbors on the focal node, identifying the ability of this node to spread the disease. Nodes with the highest ability in the network are the first to be immunized. To test the performance of our method, we conduct numerical simulations on several computer-generated and empirical networks, using the susceptible-infected-recovered (SIR) model. The results show that the proposed strategy largely outperforms the existing well-known strategies. Copyright © 2016 Elsevier Ltd. All rights reserved.
Hyperswitch Network For Hypercube Computer
NASA Technical Reports Server (NTRS)
Chow, Edward; Madan, Herbert; Peterson, John
1989-01-01
Data-driven dynamic switching enables high speed data transfer. Proposed hyperswitch network based on mixed static and dynamic topologies. Routing header modified in response to congestion or faults encountered as path established. Static topology meets requirement if nodes have switching elements that perform necessary routing header revisions dynamically. Hypercube topology now being implemented with switching element in each computer node aimed at designing very-richly-interconnected multicomputer system. Interconnection network connects great number of small computer nodes, using fixed hypercube topology, characterized by point-to-point links between nodes.
Building a Terabyte Memory Bandwidth Compute Node with Four Consumer Electronics GPUs
NASA Astrophysics Data System (ADS)
Omlin, Samuel; Räss, Ludovic; Podladchikov, Yuri
2014-05-01
GPUs released for consumer electronics are generally built with the same chip architectures as the GPUs released for professional usage. With regards to scientific computing, there are no obvious important differences in functionality or performance between the two types of releases, yet the price can differ up to one order of magnitude. For example, the consumer electronics release of the most recent NVIDIA Kepler architecture (GK110), named GeForce GTX TITAN, performed equally well in conducted memory bandwidth tests as the professional release, named Tesla K20; the consumer electronics release costs about one third of the professional release. We explain how to design and assemble a well adjusted computer with four high-end consumer electronics GPUs (GeForce GTX TITAN) combining more than 1 terabyte/s memory bandwidth. We compare the system's performance and precision with the one of hardware released for professional usage. The system can be used as a powerful workstation for scientific computing or as a compute node in a home-built GPU cluster.
Running Batch Jobs on Peregrine | High-Performance Computing | NREL
Using Resource Feature to Request Different Node Types Peregrine has several types of compute nodes incompatibility and get the job running. More information about requesting different node types in Peregrine is available. Queues In order to meet the needs of different types of jobs, nodes on Peregrine are available
Balancing computation and communication power in power constrained clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piga, Leonardo; Paul, Indrani; Huang, Wei
Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agentmore » may be configured to reassign the unused power to the active nodes to expedite workload processing.« less
The Use of Signal Dimensionality for Automatic QC of Seismic Array Data
NASA Astrophysics Data System (ADS)
Rowe, C. A.; Stead, R. J.; Begnaud, M. L.; Draganov, D.; Maceira, M.; Gomez, M.
2014-12-01
A significant problem in seismic array analysis is the inclusion of bad sensor channels in the beam-forming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, when monitoring traffic at enormous numbers of nodes is impractical on a node-by-node basis, so the dimensionality of the node traffic is instead monitored for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. We examine the signal dimension in similar way to the method addressing node traffic anomalies in large computer systems. We explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements. We show preliminary results applied to arrays in Kazakhstan (Makanchi) and Argentina (Malargue).
Message communications of particular message types between compute nodes using DMA shadow buffers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blocksome, Michael A.; Parker, Jeffrey J.
Message communications of particular message types between compute nodes using DMA shadow buffers includes: receiving a buffer identifier specifying an application buffer having a message of a particular type for transmission to a target compute node through a network; selecting one of a plurality of shadow buffers for a DMA engine on the compute node for storing the message, each shadow buffer corresponding to a slot of an injection FIFO buffer maintained by the DMA engine; storing the message in the selected shadow buffer; creating a data descriptor for the message stored in the selected shadow buffer; injecting the datamore » descriptor into the slot of the injection FIFO buffer corresponding to the selected shadow buffer; selecting the data descriptor from the injection FIFO buffer; and transmitting the message specified by the selected data descriptor through the data communications network to the target compute node.« less
Parallel file system with metadata distributed across partitioned key-value store c
Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron
2017-09-19
Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
Chen, G; Wu, F Y; Liu, Z C; Yang, K; Cui, F
2015-08-01
Subject-specific finite element (FE) models can be generated from computed tomography (CT) datasets of a bone. A key step is assigning material properties automatically onto finite element models, which remains a great challenge. This paper proposes a node-based assignment approach and also compares it with the element-based approach in the literature. Both approaches were implemented using ABAQUS. The assignment procedure is divided into two steps: generating the data file of the image intensity of a bone in a MATLAB program and reading the data file into ABAQUS via user subroutines. The node-based approach assigns the material properties to each node of the finite element mesh, while the element-based approach assigns the material properties directly to each integration point of an element. Both approaches are independent from the type of elements. A number of FE meshes are tested and both give accurate solutions; comparatively the node-based approach involves less programming effort. The node-based approach is also independent from the type of analyses; it has been tested on the nonlinear analysis of a Sawbone femur. The node-based approach substantially improves the level of automation of the assignment procedure of bone material properties. It is the simplest and most powerful approach that is applicable to many types of analyses and elements. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.
Archer, Charles J.; Blocksome, Michael A.
2012-12-11
Methods, parallel computers, and computer program products are disclosed for remote direct memory access. Embodiments include transmitting, from an origin DMA engine on an origin compute node to a plurality target DMA engines on target compute nodes, a request to send message, the request to send message specifying a data to be transferred from the origin DMA engine to data storage on each target compute node; receiving, by each target DMA engine on each target compute node, the request to send message; preparing, by each target DMA engine, to store data according to the data storage reference and the data length, including assigning a base storage address for the data storage reference; sending, by one or more of the target DMA engines, an acknowledgment message acknowledging that all the target DMA engines are prepared to receive a data transmission from the origin DMA engine; receiving, by the origin DMA engine, the acknowledgement message from the one or more of the target DMA engines; and transferring, by the origin DMA engine, data to data storage on each of the target compute nodes according to the data storage reference using a single direct put operation.
Peregrine System Configuration | High-Performance Computing | NREL
nodes and storage are connected by a high speed InfiniBand network. Compute nodes are diskless with an directories are mounted on all nodes, along with a file system dedicated to shared projects. A brief processors with 64 GB of memory. All nodes are connected to the high speed Infiniband network and and a
Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2012-01-10
The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2008-01-01
The present in invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
Send-side matching of data communications messages
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-06-17
Send-side matching of data communications messages in a distributed computing system comprising a plurality of compute nodes, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message.
Simulation of Stagnation Region Heating in Hypersonic Flow on Tetrahedral Grids
NASA Technical Reports Server (NTRS)
Gnoffo, Peter A.
2007-01-01
Hypersonic flow simulations using the node based, unstructured grid code FUN3D are presented. Applications include simple (cylinder) and complex (towed ballute) configurations. Emphasis throughout is on computation of stagnation region heating in hypersonic flow on tetrahedral grids. Hypersonic flow over a cylinder provides a simple test problem for exposing any flaws in a simulation algorithm with regard to its ability to compute accurate heating on such grids. Such flaws predominantly derive from the quality of the captured shock. The importance of pure tetrahedral formulations are discussed. Algorithm adjustments for the baseline Roe / Symmetric, Total-Variation-Diminishing (STVD) formulation to deal with simulation accuracy are presented. Formulations of surface normal gradients to compute heating and diffusion to the surface as needed for a radiative equilibrium wall boundary condition and finite catalytic wall boundary in the node-based unstructured environment are developed. A satisfactory resolution of the heating problem on tetrahedral grids is not realized here; however, a definition of a test problem, and discussion of observed algorithm behaviors to date are presented in order to promote further research on this important problem.
Dynamic resource allocation scheme for distributed heterogeneous computer systems
NASA Technical Reports Server (NTRS)
Liu, Howard T. (Inventor); Silvester, John A. (Inventor)
1991-01-01
This invention relates to a resource allocation in computer systems, and more particularly, to a method and associated apparatus for shortening response time and improving efficiency of a heterogeneous distributed networked computer system by reallocating the jobs queued up for busy nodes to idle, or less-busy nodes. In accordance with the algorithm (SIDA for short), the load-sharing is initiated by the server device in a manner such that extra overhead in not imposed on the system during heavily-loaded conditions. The algorithm employed in the present invention uses a dual-mode, server-initiated approach. Jobs are transferred from heavily burdened nodes (i.e., over a high threshold limit) to low burdened nodes at the initiation of the receiving node when: (1) a job finishes at a node which is burdened below a pre-established threshold level, or (2) a node is idle for a period of time as established by a wakeup timer at the node. The invention uses a combination of the local queue length and the local service rate ratio at each node as the workload indicator.
Early results from the Array of Things
NASA Astrophysics Data System (ADS)
Jacob, R. L.; Catlett, C.; Beckman, P. H.; Sankaran, R.
2017-12-01
The Array of Things (AoT) is an experimental sensor and edge-computing network being deployed in the City of Chicago. An AoT node contains sensors for temperature, pressure, humidty and several trace gases as well as 4-core CPU and full Linux operating system. Custom software called "Waggle" controls the hardware and provides the data collection and transmission services. Each node is attached to a traffic signal light and has power 24/7. Data is sent over the cellular network in near realtime. With Chicago's Department of Transportation, we have been making test deployments of AoT nodes, evaluating their capabilities and comparing collected data with that from other observing systems in the Chicago area.
Intranode data communications in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E
2014-01-07
Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a computer node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Intranode data communications in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E
2013-07-23
Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Shang, Fengjun; Jiang, Yi; Xiong, Anping; Su, Wen; He, Li
2016-11-18
With the integrated development of the Internet, wireless sensor technology, cloud computing, and mobile Internet, there has been a lot of attention given to research about and applications of the Internet of Things. A Wireless Sensor Network (WSN) is one of the important information technologies in the Internet of Things; it integrates multi-technology to detect and gather information in a network environment by mutual cooperation, using a variety of methods to process and analyze data, implement awareness, and perform tests. This paper mainly researches the localization algorithm of sensor nodes in a wireless sensor network. Firstly, a multi-granularity region partition is proposed to divide the location region. In the range-based method, the RSSI (Received Signal Strength indicator, RSSI) is used to estimate distance. The optimal RSSI value is computed by the Gaussian fitting method. Furthermore, a Voronoi diagram is characterized by the use of dividing region. Rach anchor node is regarded as the center of each region; the whole position region is divided into several regions and the sub-region of neighboring nodes is combined into triangles while the unknown node is locked in the ultimate area. Secondly, the multi-granularity regional division and Lagrange multiplier method are used to calculate the final coordinates. Because nodes are influenced by many factors in the practical application, two kinds of positioning methods are designed. When the unknown node is inside positioning unit, we use the method of vector similarity. Moreover, we use the centroid algorithm to calculate the ultimate coordinates of unknown node. When the unknown node is outside positioning unit, we establish a Lagrange equation containing the constraint condition to calculate the first coordinates. Furthermore, we use the Taylor expansion formula to correct the coordinates of the unknown node. In addition, this localization method has been validated by establishing the real environment.
Numerical Modeling of Saturated Boiling in a Heated Tube
NASA Technical Reports Server (NTRS)
Majumdar, Alok; LeClair, Andre; Hartwig, Jason
2017-01-01
This paper describes a mathematical formulation and numerical solution of boiling in a heated tube. The mathematical formulation involves a discretization of the tube into a flow network consisting of fluid nodes and branches and a thermal network consisting of solid nodes and conductors. In the fluid network, the mass, momentum and energy conservation equations are solved and in the thermal network, the energy conservation equation of solids is solved. A pressure-based, finite-volume formulation has been used to solve the equations in the fluid network. The system of equations is solved by a hybrid numerical scheme which solves the mass and momentum conservation equations by a simultaneous Newton-Raphson method and the energy conservation equation by a successive substitution method. The fluid network and thermal network are coupled through heat transfer between the solid and fluid nodes which is computed by Chen's correlation of saturated boiling heat transfer. The computer model is developed using the Generalized Fluid System Simulation Program and the numerical predictions are compared with test data.
A phantom axon setup for validating models of action potential recordings.
Rossel, Olivier; Soulier, Fabien; Bernard, Serge; Guiraud, David; Cathébras, Guy
2016-08-01
Electrode designs and strategies for electroneurogram recordings are often tested first by computer simulations and then by animal models, but they are rarely implanted for long-term evaluation in humans. The models show that the amplitude of the potential at the surface of an axon is higher in front of the nodes of Ranvier than at the internodes; however, this has not been investigated through in vivo measurements. An original experimental method is presented to emulate a single fiber action potential in an infinite conductive volume, allowing the potential of an axon to be recorded at both the nodes of Ranvier and the internodes, for a wide range of electrode-to-fiber radial distances. The paper particularly investigates the differences in the action potential amplitude along the longitudinal axis of an axon. At a short radial distance, the action potential amplitude measured in front of a node of Ranvier is two times larger than in the middle of two nodes. Moreover, farther from the phantom axon, the measured action potential amplitude is almost constant along the longitudinal axis. The results of this new method confirm the computer simulations, with a correlation of 97.6 %.
Spacecraft On-Board Information Extraction Computer (SOBIEC)
NASA Technical Reports Server (NTRS)
Eisenman, David; Decaro, Robert E.; Jurasek, David W.
1994-01-01
The Jet Propulsion Laboratory is the Technical Monitor on an SBIR Program issued for Irvine Sensors Corporation to develop a highly compact, dual use massively parallel processing node known as SOBIEC. SOBIEC couples 3D memory stacking technology provided by nCUBE. The node contains sufficient network Input/Output to implement up to an order-13 binary hypercube. The benefit of this network, is that it scales linearly as more processors are added, and it is a superset of other commonly used interconnect topologies such as: meshes, rings, toroids, and trees. In this manner, a distributed processing network can be easily devised and supported. The SOBIEC node has sufficient memory for most multi-computer applications, and also supports external memory expansion and DMA interfaces. The SOBIEC node is supported by a mature set of software development tools from nCUBE. The nCUBE operating system (OS) provides configuration and operational support for up to 8000 SOBIEC processors in an order-13 binary hypercube or any subset or partition(s) thereof. The OS is UNIX (USL SVR4) compatible, with C, C++, and FORTRAN compilers readily available. A stand-alone development system is also available to support SOBIEC test and integration.
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Distributed computation of graphics primitives on a transputer network
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
A method is developed for distributing the computation of graphics primitives on a parallel processing network. Off-the-shelf transputer boards are used to perform the graphics transformations and scan-conversion tasks that would normally be assigned to a single transputer based display processor. Each node in the network performs a single graphics primitive computation. Frequently requested tasks can be duplicated on several nodes. The results indicate that the current distribution of commands on the graphics network shows a performance degradation when compared to the graphics display board alone. A change to more computation per node for every communication (perform more complex tasks on each node) may cause the desired increase in throughput.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-03-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.
A Novel Polygonal Finite Element Method: Virtual Node Method
NASA Astrophysics Data System (ADS)
Tang, X. H.; Zheng, C.; Zhang, J. H.
2010-05-01
Polygonal finite element method (PFEM), which can construct shape functions on polygonal elements, provides greater flexibility in mesh generation. However, the non-polynomial form of traditional PFEM, such as Wachspress method and Mean Value method, leads to inexact numerical integration. Since the integration technique for non-polynomial functions is immature. To overcome this shortcoming, a great number of integration points have to be used to obtain sufficiently exact results, which increases computational cost. In this paper, a novel polygonal finite element method is proposed and called as virtual node method (VNM). The features of present method can be list as: (1) It is a PFEM with polynomial form. Thereby, Hammer integral and Gauss integral can be naturally used to obtain exact numerical integration; (2) Shape functions of VNM satisfy all the requirements of finite element method. To test the performance of VNM, intensive numerical tests are carried out. It found that, in standard patch test, VNM can achieve significantly better results than Wachspress method and Mean Value method. Moreover, it is observed that VNM can achieve better results than triangular 3-node elements in the accuracy test.
Hively, Lee M [Philadelphia, TN
2011-07-12
The invention relates to a method and apparatus for simultaneously processing different sources of test data into informational data and then processing different categories of informational data into knowledge-based data. The knowledge-based data can then be communicated between nodes in a system of multiple computers according to rules for a type of complex, hierarchical computer system modeled on a human brain.
Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Heidelberger, Philip; Senger, Robert M; Salapura, Valentina; Steinmacher-Burow, Burkhard; Sugawara, Yutaka; Takken, Todd E
2013-08-27
Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
NASA Astrophysics Data System (ADS)
Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping
2017-11-01
In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang, et al., 2016) which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and made the computation of electrostatic interaction done thoroughly in GPU. The upgraded edition of CU-ENUF runs concurrently in a hybrid parallel way that enables the computation parallelizing on multiple computer nodes firstly, then further on the installed GPU in each computer. By this parallel strategy, the size of simulation system will be never restricted to the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Furthermore, the upgraded method is capable of computing electrostatic interactions for both the atomistic molecular dynamics (MD) and the dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method is able to not only present a good precision when setting suitable parameters, but also give an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space, which approximately take up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interactions computation, the computation capability is limited to the throughput capacity of a single GPU for super-scale simulation system. Therefore, we should look for an effective method to handle the calculation of electrostatic interactions efficiently for a simulation system with super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node will be executed in GPU in parallel efficiently. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which the performance superiority is shown adequately. However, for a small simulation system containing less than 106 particles, the mode of multiple computer nodes has no apparent efficiency advantage or even lower efficiency due to the serious network delay among computer nodes, than the mode of single computer node. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.
Optimizing Resource Utilization in Grid Batch Systems
NASA Astrophysics Data System (ADS)
Gellrich, Andreas
2012-12-01
On Grid sites, the requirements of the computing tasks (jobs) to computing, storage, and network resources differ widely. For instance Monte Carlo production jobs are almost purely CPU-bound, whereas physics analysis jobs demand high data rates. In order to optimize the utilization of the compute node resources, jobs must be distributed intelligently over the nodes. Although the job resource requirements cannot be deduced directly, jobs are mapped to POSIX UID/GID according to the VO, VOMS group and role information contained in the VOMS proxy. The UID/GID then allows to distinguish jobs, if users are using VOMS proxies as planned by the VO management, e.g. ‘role=production’ for Monte Carlo jobs. It is possible to setup and configure batch systems (queuing system and scheduler) at Grid sites based on these considerations although scaling limits were observed with the scheduler MAUI. In tests these limitations could be overcome with a home-made scheduler.
NASA Astrophysics Data System (ADS)
Jia, Weile; Wang, Jue; Chi, Xuebin; Wang, Lin-Wang
2017-02-01
LS3DF, namely linear scaling three-dimensional fragment method, is an efficient linear scaling ab initio total energy electronic structure calculation code based on a divide-and-conquer strategy. In this paper, we present our GPU implementation of the LS3DF code. Our test results show that the GPU code can calculate systems with about ten thousand atoms fully self-consistently in the order of 10 min using thousands of computing nodes. This makes the electronic structure calculations of 10,000-atom nanosystems routine work. This speed is 4.5-6 times faster than the CPU calculations using the same number of nodes on the Titan machine in the Oak Ridge leadership computing facility (OLCF). Such speedup is achieved by (a) carefully re-designing of the computationally heavy kernels; (b) redesign of the communication pattern for heterogeneous supercomputers.
A multiarchitecture parallel-processing development environment
NASA Technical Reports Server (NTRS)
Townsend, Scott; Blech, Richard; Cole, Gary
1993-01-01
A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Three layers multi-granularity OCDM switching system based on learning-stateful PCE
NASA Astrophysics Data System (ADS)
Wang, Yubao; Liu, Yanfei; Sun, Hao
2017-10-01
In the existing three layers multi-granularity OCDM switching system (TLMG-OCDMSS), F-LSP, L-LSP and OC-LSP can be bundled as switching granularity. For CPU-intensive network, the node not only needs to compute the path but also needs to bundle the switching granularity so that the load of single node is heavy. The node will paralyze when the traffic of the node is too heavy, which will impact the performance of the whole network seriously. The introduction of stateful PCE(S-PCE) will effectively solve these problems. PCE is composed of two parts, namely, the path computation element and the database (TED and LSPDB), and returns the result of path computation to PCC (path computation clients) after PCC sends the path computation request to it. In this way, the pressure of the distributed path computation in each node is reduced. In this paper, we propose the concept of Learning PCE (L-PCE), which uses the existing LSPDB as the data source of PCE's learning. By this means, we can simplify the path computation and reduce the network delay, as a result, improving the performance of network.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-11-12
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Element fracture technique for hypervelocity impact simulation
NASA Astrophysics Data System (ADS)
Zhang, Xiao-tian; Li, Xiao-gang; Liu, Tao; Jia, Guang-hui
2015-05-01
Hypervelocity impact dynamics is the theoretical support of spacecraft shielding against space debris. The numerical simulation has become an important approach for obtaining the ballistic limits of the spacecraft shields. Currently, the most widely used algorithm for hypervelocity impact is the smoothed particle hydrodynamics (SPH). Although the finite element method (FEM) is widely used in fracture mechanics and low-velocity impacts, the standard FEM can hardly simulate the debris cloud generated by hypervelocity impact. This paper presents a successful application of the node-separation technique for hypervelocity impact debris cloud simulation. The node-separation technique assigns individual/coincident nodes for the adjacent elements, and it applies constraints to the coincident node sets in the modeling step. In the explicit iteration, the cracks are generated by releasing the constrained node sets that meet the fracture criterion. Additionally, the distorted elements are identified from two aspects - self-piercing and phase change - and are deleted so that the constitutive computation can continue. FEM with the node-separation technique is used for thin-wall hypervelocity impact simulations. The internal structures of the debris cloud in the simulation output are compared with that in the test X-ray graphs under different material fracture criteria. It shows that the pressure criterion is more appropriate for hypervelocity impact. The internal structures of the debris cloud are also simulated and compared under different thickness-to-diameter ratios (t/D). The simulation outputs show the same spall pattern with the tests. Finally, the triple-plate impact case is simulated with node-separation FEM.
Chen, Dong; Eisley, Noel A.; Steinmacher-Burow, Burkhard; Heidelberger, Philip
2013-01-29
A computer implemented method and a system for routing data packets in a multi-dimensional computer network. The method comprises routing a data packet among nodes along one dimension towards a root node, each node having input and output communication links, said root node not having any outgoing uplinks, and determining at each node if the data packet has reached a predefined coordinate for the dimension or an edge of the subrectangle for the dimension, and if the data packet has reached the predefined coordinate for the dimension or the edge of the subrectangle for the dimension, determining if the data packet has reached the root node, and if the data packet has not reached the root node, routing the data packet among nodes along another dimension towards the root node.
Anatomical classification of breast sentinel lymph nodes using computed tomography-lymphography.
Fujita, Tamaki; Miura, Hiroyuki; Seino, Hiroko; Ono, Shuichi; Nishi, Takashi; Nishimura, Akimasa; Hakamada, Kenichi; Aoki, Masahiko
2018-05-03
To evaluate the anatomical classification and location of breast sentinel lymph nodes, preoperative computed tomography-lymphography examinations were retrospectively reviewed for sentinel lymph nodes in 464 cases clinically diagnosed with node-negative breast cancer between July 2007 and June 2016. Anatomical classification was performed based on the numbers of lymphatic routes and sentinel lymph nodes, the flow direction of lymphatic routes, and the location of sentinel lymph nodes. Of the 464 cases reviewed, anatomical classification could be performed in 434 (93.5 %). The largest number of cases showed single route/single sentinel lymph node (n = 296, 68.2 %), followed by multiple routes/multiple sentinel lymph nodes (n = 59, 13.6 %), single route/multiple sentinel lymph nodes (n = 53, 12.2 %), and multiple routes/single sentinel lymph node (n = 26, 6.0 %). Classification based on the flow direction of lymphatic routes showed that 429 cases (98.8 %) had outward flow on the superficial fascia toward axillary lymph nodes, whereas classification based on the height of sentinel lymph nodes showed that 323 cases (74.4 %) belonged to the upper pectoral group of axillary lymph nodes. There was wide variation in the number of lymphatic routes and their branching patterns and in the number, location, and direction of flow of sentinel lymph nodes. It is clinically very important to preoperatively understand the anatomical morphology of lymphatic routes and sentinel lymph nodes for optimal treatment of breast cancer, and computed tomography-lymphography is suitable for this purpose.
Network Coding for Function Computation
ERIC Educational Resources Information Center
Appuswamy, Rathinakumar
2011-01-01
In this dissertation, the following "network computing problem" is considered. Source nodes in a directed acyclic network generate independent messages and a single receiver node computes a target function f of the messages. The objective is to maximize the average number of times f can be computed per network usage, i.e., the "computing…
GEANT4 distributed computing for compact clusters
NASA Astrophysics Data System (ADS)
Harrawood, Brian P.; Agasthya, Greeshma A.; Lakshmanan, Manu N.; Raterman, Gretchen; Kapadia, Anuj J.
2014-11-01
A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User designed 'work tickets' are distributed to clients using a client-server work flow model to specify the parameters for each individual run of the simulation. The new g4DistributedRunManager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 for large discrete data sets such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions or simply speeding the through-put of a single model.
Lynch, Rod; Pitson, Graham; Ball, David; Claude, Line; Sarrut, David
2013-01-01
To develop a reproducible definition for each mediastinal lymph node station based on the new TNM classification for lung cancer. This paper proposes an atlas using the new international lymph node map used in the seventh edition of the TNM classification for lung cancer. Four radiation oncologists and 1 diagnostic radiologist were involved in the project to put forward a reproducible radiologic description for the lung lymph node stations. The International Association for the Study of Lung Cancer lymph node definitions for stations 1 to 11 have been described and illustrated on axial computed tomographic scan images using a certified radiotherapy planning system. This atlas will assist both diagnostic radiologists and radiation oncologists in accurately defining the lymph node stations on computed tomographic scan in patients diagnosed with lung cancer. Copyright © 2013 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
Parallel checksumming of data chunks of a shared data object using a log-structured file system
Bent, John M.; Faibish, Sorin; Grider, Gary
2016-09-06
Checksum values are generated and used to verify the data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk; and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
Wireless Data-Acquisition System for Testing Rocket Engines
NASA Technical Reports Server (NTRS)
Lin, Chujen; Lonske, Ben; Hou, Yalin; Xu, Yingjiu; Gang, Mei
2007-01-01
A prototype wireless data-acquisition system has been developed as a potential replacement for a wired data-acquisition system heretofore used in testing rocket engines. The traditional use of wires to connect sensors, signal-conditioning circuits, and data acquisition circuitry is time-consuming and prone to error, especially when, as is often the case, many sensors are used in a test. The system includes one master and multiple slave nodes. The master node communicates with a computer via an Ethernet connection. The slave nodes are powered by rechargeable batteries and are packaged in weatherproof enclosures. The master unit and each of the slave units are equipped with a time-modulated ultra-wide-band (TMUWB) radio transceiver, which spreads its RF energy over several gigahertz by transmitting extremely low-power and super-narrow pulses. In this prototype system, each slave node can be connected to as many as six sensors: two sensors can be connected directly to analog-to-digital converters (ADCs) in the slave node and four sensors can be connected indirectly to the ADCs via signal conditioners. The maximum sampling rate for streaming data from any given sensor is about 5 kHz. The bandwidth of one channel of the TM-UWB radio communication system is sufficient to accommodate streaming of data from five slave nodes when they are fully loaded with data collected through all possible sensor connections. TM-UWB radios have a much higher spatial capacity than traditional sinusoidal wave-based radios. Hence, this TM-UWB wireless data-acquisition can be scaled to cover denser sensor setups for rocket engine test stands. Another advantage of TM-UWB radios is that it will not interfere with existing wireless transmission. The maximum radio-communication range between the master node and a slave node for this prototype system is about 50 ft (15 m) when the master and slave transceivers are equipped with small dipole antennas. The range can be increased by changing to larger antennas and/or greater transmission power. The battery life of a slave node ranges from about six hours during operation at full capacity to as long as three days when the system is in a "sleep" mode used to conserve battery charge during times between setup and rocket-engine testing. Batteries can be added to prolong operational lifetimes. The radio transceiver dominates the power consumption.
Bad data packet capture device
Chen, Dong; Gara, Alan; Heidelberger, Philip; Vranas, Pavlos
2010-04-20
An apparatus and method for capturing data packets for analysis on a network computing system includes a sending node and a receiving node connected by a bi-directional communication link. The sending node sends a data transmission to the receiving node on the bi-directional communication link, and the receiving node receives the data transmission and verifies the data transmission to determine valid data and invalid data and verify retransmissions of invalid data as corresponding valid data. A memory device communicates with the receiving node for storing the invalid data and the corresponding valid data. A computing node communicates with the memory device and receives and performs an analysis of the invalid data and the corresponding valid data received from the memory device.
Heuristic-driven graph wavelet modeling of complex terrain
NASA Astrophysics Data System (ADS)
Cioacǎ, Teodor; Dumitrescu, Bogdan; Stupariu, Mihai-Sorin; Pǎtru-Stupariu, Ileana; Nǎpǎrus, Magdalena; Stoicescu, Ioana; Peringer, Alexander; Buttler, Alexandre; Golay, François
2015-03-01
We present a novel method for building a multi-resolution representation of large digital surface models. The surface points coincide with the nodes of a planar graph which can be processed using a critically sampled, invertible lifting scheme. To drive the lazy wavelet node partitioning, we employ an attribute aware cost function based on the generalized quadric error metric. The resulting algorithm can be applied to multivariate data by storing additional attributes at the graph's nodes. We discuss how the cost computation mechanism can be coupled with the lifting scheme and examine the results by evaluating the root mean square error. The algorithm is experimentally tested using two multivariate LiDAR sets representing terrain surface and vegetation structure with different sampling densities.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Data communications in a parallel active messaging interface ('PAMI') or a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance witht the instruction type, the transfer data from the origin endpoin to the target endpoint.
Matching pursuit parallel decomposition of seismic data
NASA Astrophysics Data System (ADS)
Li, Chuanhui; Zhang, Fanchang
2017-07-01
In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
Development of an Autonomous Navigation Technology Test Vehicle
2004-08-01
as an independent thread on processors using the Linux operating system. The computer hardware selected for the nodes that host the MRS threads...communications system design. Linux was chosen as the operating system for all of the single board computers used on the Mule. Linux was specifically...used for system analysis and development. The simple realization of multi-thread processing and inter-process communications in Linux made it a
High Fidelity Simulations of Unsteady Flow through Turbopumps and Flowliners
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, dochan; Chan, William; Housman, Jeff
2006-01-01
High fidelity computations were carried out to analyze the orbiter LH2 feedline flowliner. Computations were performed on the Columbia platform which is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processor each. Various computational models were used to characterize the unsteady flow features in the turbopump, including the orbiter Low-Pressure-Fuel-Turbopump (LPFTP) inducer, the orbiter manifold and a test article used to represent the manifold. Unsteady flow originating from the orbiter LPFTP inducer is one of the major contributors to the high frequency cyclic loading that results in high cycle fatigue damage to the gimbal flowliners just upstream of the LPFTP. The flow fields for the orbiter manifold and representative test article are computed and analyzed for similarities and differences. The incompressible Navier-Stokes flow solver INS3D, based on the artificial compressibility method, was used to compute the flow of liquid hydrogen in each test article.
NASA Astrophysics Data System (ADS)
Xu, Guoping; Udupa, Jayaram K.; Tong, Yubing; Cao, Hanqiang; Odhner, Dewey; Torigian, Drew A.; Wu, Xingyu
2018-03-01
Currently, there are many papers that have been published on the detection and segmentation of lymph nodes from medical images. However, it is still a challenging problem owing to low contrast with surrounding soft tissues and the variations of lymph node size and shape on computed tomography (CT) images. This is particularly very difficult on low-dose CT of PET/CT acquisitions. In this study, we utilize our previous automatic anatomy recognition (AAR) framework to recognize the thoracic-lymph node stations defined by the International Association for the Study of Lung Cancer (IASLC) lymph node map. The lymph node stations themselves are viewed as anatomic objects and are localized by using a one-shot method in the AAR framework. Two strategies have been taken in this paper for integration into AAR framework. The first is to combine some lymph node stations into composite lymph node stations according to their geometrical nearness. The other is to find the optimal parent (organ or union of organs) as an anchor for each lymph node station based on the recognition error and thereby find an overall optimal hierarchy to arrange anchor organs and lymph node stations. Based on 28 contrast-enhanced thoracic CT image data sets for model building, 12 independent data sets for testing, our results show that thoracic lymph node stations can be localized within 2-3 voxels compared to the ground truth.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Gooding, Thomas Michael [Rochester, MN
2011-04-19
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
Nogami, Yuya; Banno, Kouji; Irie, Haruko; Iida, Miho; Kisu, Iori; Masugi, Yohei; Tanaka, Kyoko; Tominaga, Eiichiro; Okuda, Shigeo; Murakami, Koji; Aoki, Daisuke
2015-01-01
We studied the diagnostic performance of (18)F-fluoro-2-deoxy-d-glucose-positron emission tomography/computed tomography in cervical and endometrial cancers with particular focus on lymph node metastases. Seventy patients with cervical cancer and 53 with endometrial cancer were imaged with (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography before lymphadenectomy. We evaluated the diagnostic performance of (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography using the final pathological diagnoses as the golden standard. We calculated the sensitivity, specificity, positive predictive value and negative predictive value of (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography. In cervical cancer, the results evaluated by cases were 33.3, 92.7, 55.6 and 83.6%, respectively. When evaluated by the area of lymph nodes, the results were 30.6, 98.9, 55.0 and 97.0%, respectively. As for endometrial cancer, the results evaluated by cases were 50.0, 93.9, 40.0 and 95.8%, and by area of lymph nodes, 45.0, 99.4, 64.3 and 98.5%, respectively. The limitation of the efficacy was found out by analyzing it by the region of the lymph node, the size of metastatic node, the historical type of tumor in cervical cancer and the prevalence of lymph node metastasis. The efficacy of positron emission tomography/computed tomography regarding the detection of lymph node metastasis in cervical and endometrial cancer is not established and has limitations associated with the region of the lymph node, the size of metastasis lesion in lymph node and the pathological type of primary tumor. The indication for the imaging and the interpretation of the results requires consideration for each case by the pretest probability based on the information obtained preoperatively. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
IPv6 testing and deployment at Prague Tier 2
NASA Astrophysics Data System (ADS)
Kouba, Tomáŝ; Chudoba, Jiří; Eliáŝ, Marek; Fiala, Lukáŝ
2012-12-01
Computing Center of the Institute of Physics in Prague provides computing and storage resources for various HEP experiments (D0, Atlas, Alice, Auger) and currently operates more than 300 worker nodes with more than 2500 cores and provides more than 2PB of disk space. Our site is limited to one C-sized block of IPv4 addresses, and hence we had to move most of our worker nodes behind the NAT. However this solution demands more difficult routing setup. We see the IPv6 deployment as a solution that provides less routing, more switching and therefore promises higher network throughput. The administrators of the Computing Center strive to configure and install all provided services automatically. For installation tasks we use PXE and kickstart, for network configuration we use DHCP and for software configuration we use CFEngine. Many hardware boxes are configured via specific web pages or telnet/ssh protocol provided by the box itself. All our services are monitored with several tools e.g. Nagios, Munin, Ganglia. We rely heavily on the SNMP protocol for hardware health monitoring. All these installation, configuration and monitoring tools must be tested before we can switch completely to IPv6 network stack. In this contribution we present the tests we have made, limitations we have faced and configuration decisions that we have made during IPv6 testing. We also present testbed built on virtual machines that was used for all the testing and evaluation.
BridgeRank: A novel fast centrality measure based on local structure of the network
NASA Astrophysics Data System (ADS)
Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh
2018-04-01
Ranking nodes in complex networks have become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in different applications such as viral marketing. One of the most important centrality measures to ranking nodes is closeness centrality which is efficient but suffers from high computational complexity O(n3) . This paper tries to improve closeness centrality by utilizing the local structure of nodes and presents a new ranking algorithm, called BridgeRank centrality. The proposed method computes local centrality value for each node. For this purpose, at first, communities are detected and the relationship between communities is completely ignored. Then, by applying a centrality in each community, only one best critical node from each community is extracted. Finally, the nodes are ranked based on computing the sum of the shortest path length of nodes to obtained critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method can find the best nodes with high spread ability and low time complexity, which make it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Finally, experiments on real and artificial networks show that our method is able to identify influential nodes so efficiently, and achieves better performance compared to other recent methods.
Global tree network for computing structures enabling global processing operations
Blumrich; Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.
2010-01-19
A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.
ERDC MSRC Resource. High Performance Computing for the Warfighter. Spring 2006
2006-01-01
named Ruby, and the HP/Compaq SC45, named Emerald , continue to add their unique sparkle to the ERDC MSRC computer infrastructure. ERDC invited the...configuration on B-52H purchased additional memory for the login nodes so that this part of the solution process could be done as a preprocessing step. On...application and system services. Of the service nodes, 10 are login nodes and 23 are input/output (I/O) server nodes for the Lustre file system (i.e., the
A feasibility study on porting the community land model onto accelerators using OpenACC
Wang, Dali; Wu, Wei; Winkler, Frank; ...
2014-01-01
As environmental models (such as Accelerated Climate Model for Energy (ACME), Parallel Reactive Flow and Transport Model (PFLOTRAN), Arctic Terrestrial Simulator (ATS), etc.) became more and more complicated, we are facing enormous challenges regarding to porting those applications onto hybrid computing architecture. OpenACC appears as a very promising technology, therefore, we have conducted a feasibility analysis on porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Models (CESM)). Specifically, we used automatic function testing platform to extract a small computing kernel out of CLM, then we apply this kernel into the actually CLM dataflowmore » procedure, and investigate the strategy of data parallelization and the benefit of data movement provided by current implementation of OpenACC. Even it is a non-intensive kernel, on a single 16-core computing node, the performance (based on the actual computation time using one GPU) of OpenACC implementation is 2.3 time faster than that of OpenMP implementation using single OpenMP thread, but it is 2.8 times slower than the performance of OpenMP implementation using 16 threads. On multiple nodes, MPI_OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of “deep copy” capability and “routine” feature of OpenACC standards. In conclusion, we believe that our experience on the environmental model, CLM, can be beneficial to many other scientific research programs who are interested to porting their large scale scientific code using OpenACC onto high-end computers, empowered by hybrid computing architecture.« less
Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing
ERIC Educational Resources Information Center
Loh, Matthew
2013-01-01
The scalability of CMOS technology has driven computation into a diverse range of applications across the power consumption, performance and size spectra. Communication is a necessary adjunct to computation, and whether this is to push data from node-to-node in a high-performance computing cluster or from the receiver of wireless link to a neural…
Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu
2018-04-20
A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
Waggle: A Framework for Intelligent Attentive Sensing and Actuation
NASA Astrophysics Data System (ADS)
Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.
2014-12-01
Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both insitu and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports insitu computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.
Joshi, Prathamesh; Lele, Vikram; Mahajan, Pravin
2012-01-01
We report a case documenting fluorodeoxyglucose (FDG) accumulation in cervical, supraclavicular and axillary lymph nodes resulting from acute toxoplasmosis. A 50-year-old Indian female with history of non-Hodgkin's lymphoma (NHL) of left breast, postchemotherapy status, was found to have hypermetabolic right cervical, supraclavicular and axillary lymph nodes on a surveillance FDG positron emission tomography/computed tomography (PET/CT) scan. Her previous two PET/CT scans were unremarkable with no evidence of metabolically active disease. Therefore, a differential diagnosis of relapse of NHL versus infectious/inflammatory pathology was raised in the report. Biopsy of axillary lymph node demonstrated features characteristic of toxoplasmosis. The serological test results were also compatible with acute toxoplasmosis infection. Infective and inflammatory diseases are known to accumulate FDG, resulting in false positives for malignancy. This case demonstrates lymph nodal toxoplasmosis as a potential cause of false positive FDG PET/CT findings in patients with known malignancy and highlights the importance of histopathological and laboratory correlation for the accurate interpretation of FDG PET/CT scans.
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
Signaling completion of a message transfer from an origin compute node to a target compute node
Blocksome, Michael A [Rochester, MN; Parker, Jeffrey J [Rochester, MN
2011-05-24
Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local direct put transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor.
Dedicated heterogeneous node scheduling including backfill scheduling
Wood, Robert R [Livermore, CA; Eckert, Philip D [Livermore, CA; Hommes, Gregg [Pleasanton, CA
2006-07-25
A method and system for job backfill scheduling dedicated heterogeneous nodes in a multi-node computing environment. Heterogeneous nodes are grouped into homogeneous node sub-pools. For each sub-pool, a free node schedule (FNS) is created so that the number of to chart the free nodes over time. For each prioritized job, using the FNS of sub-pools having nodes useable by a particular job, to determine the earliest time range (ETR) capable of running the job. Once determined for a particular job, scheduling the job to run in that ETR. If the ETR determined for a lower priority job (LPJ) has a start time earlier than a higher priority job (HPJ), then the LPJ is scheduled in that ETR if it would not disturb the anticipated start times of any HPJ previously scheduled for a future time. Thus, efficient utilization and throughput of such computing environments may be increased by utilizing resources otherwise remaining idle.
Signaling completion of a message transfer from an origin compute node to a target compute node
Blocksome, Michael A [Rochester, MN
2011-02-15
Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local memory FIFO data transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor.
Wang, Zhi-Long; Zhou, Zhi-Guo; Chen, Ying; Li, Xiao-Ting; Sun, Ying-Shi
The aim of this study was to diagnose lymph node metastasis of esophageal cancer by support vector machines model based on computed tomography. A total of 131 esophageal cancer patients with preoperative chemotherapy and radical surgery were included. Various indicators (tumor thickness, tumor length, tumor CT value, total number of lymph nodes, and long axis and short axis sizes of largest lymph node) on CT images before and after neoadjuvant chemotherapy were recorded. A support vector machines model based on these CT indicators was built to predict lymph node metastasis. Support vector machines model diagnosed lymph node metastasis better than preoperative short axis size of largest lymph node on CT. The area under the receiver operating characteristic curves were 0.887 and 0.705, respectively. The support vector machine model of CT images can help diagnose lymph node metastasis in esophageal cancer with preoperative chemotherapy.
A bioinformatics knowledge discovery in text application for grid computing
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-01-01
Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749
A bioinformatics knowledge discovery in text application for grid computing.
Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco
2009-06-16
A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.
Lymph node detection in IASLC-defined zones on PET/CT images
NASA Astrophysics Data System (ADS)
Song, Yihua; Udupa, Jayaram K.; Odhner, Dewey; Tong, Yubing; Torigian, Drew A.
2016-03-01
Lymph node detection is challenging due to the low contrast between lymph nodes as well as surrounding soft tissues and the variation in nodal size and shape. In this paper, we propose several novel ideas which are combined into a system to operate on positron emission tomography/ computed tomography (PET/CT) images to detect abnormal thoracic nodes. First, our previous Automatic Anatomy Recognition (AAR) approach is modified where lymph node zones predominantly following International Association for the Study of Lung Cancer (IASLC) specifications are modeled as objects arranged in a hierarchy along with key anatomic anchor objects. This fuzzy anatomy model built from diagnostic CT images is then deployed on PET/CT images for automatically recognizing the zones. A novel globular filter (g-filter) to detect blob-like objects over a specified range of sizes is designed to detect the most likely locations and sizes of diseased nodes. Abnormal nodes within each automatically localized zone are subsequently detected via combined use of different items of information at various scales: lymph node zone model poses found at recognition indicating the geographic layout at the global level of node clusters, g-filter response which hones in on and carefully selects node-like globular objects at the node level, and CT and PET gray value but within only the most plausible nodal regions for node presence at the voxel level. The models are built from 25 diagnostic CT scans and refined for an object hierarchy based on a separate set of 20 diagnostic CT scans. Node detection is tested on an additional set of 20 PET/CT scans. Our preliminary results indicate node detection sensitivity and specificity at around 90% and 85%, respectively.
National Test Bed Security and Communications Architecture Working Group Report
1992-04-01
computer systems via a physical medium. Most of those physical media are tappable or interceptable. This means that all the data that flows across the...provides the capability for NTBN nodes to support users operating in differing COIs to share the computing resources and communication media and for...representation. Again generally speaking, the NTBN must act as the high-speed, wide-bandwidth communications media that would provide the "near real-time
Spiking network simulation code for petascale computers.
Kunkel, Susanne; Schmidt, Maximilian; Eppler, Jochen M; Plesser, Hans E; Masumoto, Gen; Igarashi, Jun; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus; Helias, Moritz
2014-01-01
Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today.
Spiking network simulation code for petascale computers
Kunkel, Susanne; Schmidt, Maximilian; Eppler, Jochen M.; Plesser, Hans E.; Masumoto, Gen; Igarashi, Jun; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus; Helias, Moritz
2014-01-01
Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today. PMID:25346682
Discovery of a missing disease spreader
NASA Astrophysics Data System (ADS)
Maeno, Yoshiharu
2011-10-01
This study presents a method to discover an outbreak of an infectious disease in a region for which data are missing, but which is at work as a disease spreader. Node discovery for the spread of an infectious disease is defined as discriminating between the nodes which are neighboring to a missing disease spreader node, and the rest, given a dataset on the number of cases. The spread is described by stochastic differential equations. A perturbation theory quantifies the impact of the missing spreader on the moments of the number of cases. Statistical discriminators examine the mid-body or tail-ends of the probability density function, and search for the disturbance from the missing spreader. They are tested with computationally synthesized datasets, and applied to the SARS outbreak and flu pandemic.
Data Summarization in the Node by Parameters (DSNP): Local Data Fusion in an IoT Environment.
Maschi, Luis F C; Pinto, Alex S R; Meneguette, Rodolfo I; Baldassin, Alexandro
2018-03-07
With the advent of the Internet of Things, billions of objects or devices are inserted into the global computer network, generating and processing data at a volume never imagined before. This paper proposes a way to collect and process local data through a data fusion technology called summarization. The main feature of the proposal is the local data fusion, through parameters provided by the application, ensuring the quality of data collected by the sensor node. In the evaluation, the sensor node was compared when performing the data summary with another that performed a continuous recording of the collected data. Two sets of nodes were created, one with a sensor node that analyzed the luminosity of the room, which in this case obtained a reduction of 97% in the volume of data generated, and another set that analyzed the temperature of the room, obtaining a reduction of 80% in the data volume. Through these tests, it has been proven that the local data fusion at the node can be used to reduce the volume of data generated, consequently decreasing the volume of messages generated by IoT environments.
On the Feasibility of Wireless Multimedia Sensor Networks over IEEE 802.15.5 Mesh Topologies
Garcia-Sanchez, Antonio-Javier; Losilla, Fernando; Rodenas-Herraiz, David; Cruz-Martinez, Felipe; Garcia-Sanchez, Felipe
2016-01-01
Wireless Multimedia Sensor Networks (WMSNs) are a special type of Wireless Sensor Network (WSN) where large amounts of multimedia data are transmitted over networks composed of low power devices. Hierarchical routing protocols typically used in WSNs for multi-path communication tend to overload nodes located within radio communication range of the data collection unit or data sink. The battery life of these nodes is therefore reduced considerably, requiring frequent battery replacement work to extend the operational life of the WSN system. In a wireless sensor network with mesh topology, any node may act as a forwarder node, thereby enabling multiple routing paths toward any other node or collection unit. In addition, mesh topologies have proven advantages, such as data transmission reliability, network robustness against node failures, and potential reduction in energy consumption. This work studies the feasibility of implementing WMSNs in mesh topologies and their limitations by means of exhaustive computer simulation experiments. To this end, a module developed for the Synchronous Energy Saving (SES) mode of the IEEE 802.15.5 mesh standard has been integrated with multimedia tools to thoroughly test video sequences encoded using H.264 in mesh networks. PMID:27164106
On the Feasibility of Wireless Multimedia Sensor Networks over IEEE 802.15.5 Mesh Topologies.
Garcia-Sanchez, Antonio-Javier; Losilla, Fernando; Rodenas-Herraiz, David; Cruz-Martinez, Felipe; Garcia-Sanchez, Felipe
2016-05-05
Wireless Multimedia Sensor Networks (WMSNs) are a special type of Wireless Sensor Network (WSN) where large amounts of multimedia data are transmitted over networks composed of low power devices. Hierarchical routing protocols typically used in WSNs for multi-path communication tend to overload nodes located within radio communication range of the data collection unit or data sink. The battery life of these nodes is therefore reduced considerably, requiring frequent battery replacement work to extend the operational life of the WSN system. In a wireless sensor network with mesh topology, any node may act as a forwarder node, thereby enabling multiple routing paths toward any other node or collection unit. In addition, mesh topologies have proven advantages, such as data transmission reliability, network robustness against node failures, and potential reduction in energy consumption. This work studies the feasibility of implementing WMSNs in mesh topologies and their limitations by means of exhaustive computer simulation experiments. To this end, a module developed for the Synchronous Energy Saving (SES) mode of the IEEE 802.15.5 mesh standard has been integrated with multimedia tools to thoroughly test video sequences encoded using H.264 in mesh networks.
NASA Astrophysics Data System (ADS)
Valasek, Lukas; Glasa, Jan
2017-12-01
Current fire simulation systems are capable to utilize advantages of high-performance computer (HPC) platforms available and to model fires efficiently in parallel. In this paper, efficiency of a corridor fire simulation on a HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used for testing efficiency of selected strategies of allocation of computational resources of the cluster using a greater number of computational cores. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores there are allocation strategies which provide more efficient calculations.
A uniform approach for programming distributed heterogeneous computing systems
Grasso, Ivan; Pellegrini, Simone; Cosenza, Biagio; Fahringer, Thomas
2014-01-01
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater’s performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations. PMID:25844015
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application and sovled the need for concurrent multi-user high-speed access to remotely sensed data. It verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multi-users and analyzing the distributed storage architecture to improve the application efficiency of remote sensing images through building an actual Hadoop service system.
A uniform approach for programming distributed heterogeneous computing systems.
Grasso, Ivan; Pellegrini, Simone; Cosenza, Biagio; Fahringer, Thomas
2014-12-01
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
Noguchi uses laptop computer in the Node 2 during Expedition 22
2010-01-19
ISS022-E-030641 (19 Jan. 2010) --- Japan Aerospace Exploration Agency (JAXA) astronaut Soichi Noguchi, Expedition 22 flight engineer, uses a computer in the Harmony node of the International Space Station.
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
An S N Algorithm for Modern Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Randal Scott
2016-08-29
LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
Multi-hop routing mechanism for reliable sensor computing.
Chen, Jiann-Liang; Ma, Yi-Wei; Lai, Chia-Ping; Hu, Chia-Cheng; Huang, Yueh-Min
2009-01-01
Current research on routing in wireless sensor computing concentrates on increasing the service lifetime, enabling scalability for large number of sensors and supporting fault tolerance for battery exhaustion and broken nodes. A sensor node is naturally exposed to various sources of unreliable communication channels and node failures. Sensor nodes have many failure modes, and each failure degrades the network performance. This work develops a novel mechanism, called Reliable Routing Mechanism (RRM), based on a hybrid cluster-based routing protocol to specify the best reliable routing path for sensor computing. Table-driven intra-cluster routing and on-demand inter-cluster routing are combined by changing the relationship between clusters for sensor computing. Applying a reliable routing mechanism in sensor computing can improve routing reliability, maintain low packet loss, minimize management overhead and save energy consumption. Simulation results indicate that the reliability of the proposed RRM mechanism is around 25% higher than that of the Dynamic Source Routing (DSR) and ad hoc On-demand Distance Vector routing (AODV) mechanisms.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
A universal computer control system for motors
NASA Technical Reports Server (NTRS)
Szakaly, Zoltan F. (Inventor)
1991-01-01
A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and responds to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits and the sensed analog signals are converted to digital signals for storage at the controller cards where such signals can be read during an address read/write cycle of the command processing processor.
NASA Astrophysics Data System (ADS)
Sun, Dan; Garmory, Andrew; Page, Gary J.
2017-02-01
For flows where the particle number density is low and the Stokes number is relatively high, as found when sand or ice is ingested into aircraft gas turbine engines, streams of particles can cross each other's path or bounce from a solid surface without being influenced by inter-particle collisions. The aim of this work is to develop an Eulerian method to simulate these types of flow. To this end, a two-node quadrature-based moment method using 13 moments is proposed. In the proposed algorithm thirteen moments of particle velocity, including cross-moments of second order, are used to determine the weights and abscissas of the two nodes and to set up the association between the velocity components in each node. Previous Quadrature Method of Moments (QMOM) algorithms either use more than two nodes, leading to increased computational expense, or are shown here to give incorrect results under some circumstances. This method gives the computational efficiency advantages of only needing two particle phase velocity fields whilst ensuring that a correct combination of weights and abscissas is returned for any arbitrary combination of particle trajectories without the need for any further assumptions. Particle crossing and wall bouncing with arbitrary combinations of angles are demonstrated using the method in a two-dimensional scheme. The ability of the scheme to include the presence of drag from a carrier phase is also demonstrated, as is bouncing off surfaces with inelastic collisions. The method is also applied to the Taylor-Green vortex flow test case and is found to give results superior to the existing two-node QMOM method and is in good agreement with results from Lagrangian modelling of this case.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
2013-01-01
Background Sentinel node biopsy often results in the identification and removal of multiple nodes as sentinel nodes, although most of these nodes could be non-sentinel nodes. This study investigated whether computed tomography-lymphography (CT-LG) can distinguish sentinel nodes from non-sentinel nodes and whether sentinel nodes identified by CT-LG can accurately stage the axilla in patients with breast cancer. Methods This study included 184 patients with breast cancer and clinically negative nodes. Contrast agent was injected interstitially. The location of sentinel nodes was marked on the skin surface using a CT laser light navigator system. Lymph nodes located just under the marks were first removed as sentinel nodes. Then, all dyed nodes or all hot nodes were removed. Results The mean number of sentinel nodes identified by CT-LG was significantly lower than that of dyed and/or hot nodes removed (1.1 vs 1.8, p <0.0001). Twenty-three (12.5%) patients had ≥2 sentinel nodes identified by CT-LG removed, whereas 94 (51.1%) of patients had ≥2 dyed and/or hot nodes removed (p <0.0001). Pathological evaluation demonstrated that 47 (25.5%) of 184 patients had metastasis to at least one node. All 47 patients demonstrated metastases to at least one of the sentinel nodes identified by CT-LG. Conclusions CT-LG can distinguish sentinel nodes from non-sentinel nodes, and sentinel nodes identified by CT-LG can accurately stage the axilla in patients with breast cancer. Successful identification of sentinel nodes using CT-LG may facilitate image-based diagnosis of metastasis, possibly leading to the omission of sentinel node biopsy. PMID:24321242
Remenschneider, Aaron K; Dilger, Amanda E; Wang, Yingbing; Palmer, Edwin L; Scott, James A; Emerick, Kevin S
2015-04-01
Preoperative localization of sentinel lymph nodes in head and neck cutaneous malignancies can be aided by single-photon emission computed tomography/computed tomography (SPECT/CT); however, its true predictive value for identifying lymph nodes intraoperatively remains unquantified. This study aims to understand the sensitivity, specificity, and positive and negative predictive values of SPECT/CT in sentinel lymph node biopsy for cutaneous malignancies of the head and neck. Blinded retrospective imaging review with comparison to intraoperative gamma probe confirmed sentinel lymph nodes. A consecutive series of patients with a head and neck cutaneous malignancy underwent preoperative SPECT/CT followed by sentinel lymph node biopsy with a gamma probe. Two nuclear medicine physicians, blinded to clinical data, independently reviewed each SPECT/CT. Activity within radiographically defined nodal basins was recorded and compared to intraoperative gamma probe findings. Sensitivity, specificity, and negative and positive predictive values were calculated with subgroup stratification by primary tumor site. Ninety-two imaging reads were performed on 47 patients with cutaneous malignancy who underwent SPECT/CT followed by sentinel lymph node biopsy. Overall sensitivity was 73%, specificity 92%, positive predictive value 54%, and negative predictive value 96%. The predictive ability of SPECT/CT to identify the basin or an adjacent basin containing the single hottest node was 92%. SPECT/CT overestimated uptake by an average of one nodal basin. In the head and neck, SPECT/CT has higher reliability for primary lesions of the eyelid, scalp, and cheek. SPECT/CT has high sensitivity, specificity, and negative predictive value, but may overestimate relevant nodal basins in sentinel lymph node biopsy. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.
OPEX: Optimized Eccentricity Computation in Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henderson, Keith
2011-11-14
Real-world graphs have many properties of interest, but often these properties are expensive to compute. We focus on eccentricity, radius and diameter in this work. These properties are useful measures of the global connectivity patterns in a graph. Unfortunately, computing eccentricity for all nodes is O(n2) for a graph with n nodes. We present OPEX, a novel combination of optimizations which improves computation time of these properties by orders of magnitude in real-world experiments on graphs of many different sizes. We run OPEX on graphs with up to millions of links. OPEX gives either exact results or bounded approximations, unlikemore » its competitors which give probabilistic approximations or sacrifice node-level information (eccentricity) to compute graphlevel information (diameter).« less
Effecting a broadcast with an allreduce operation on a parallel computer
Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.
2010-11-02
A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer is configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. And the result for the allreduce operation is determined and stored in each receive buffer.
Embedding global and collective in a torus network with message class map based tree path selection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Dong; Coteus, Paul W.; Eisley, Noel A.
Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computermore » program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.« less
Direct memory access transfer completion notification
Archer, Charles J.; Blocksome, Michael A.; Parker, Jeffrey J.
2010-08-17
Methods, apparatus, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA engine on an origin compute node in an injection FIFO buffer, a data descriptor for an application message to be transferred to a target compute node on behalf of an application on the origin compute node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying an address of a completion notification field in application storage for the application; transferring, by the origin DMA engine to the target compute node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that the transfer of the message is complete, including performing a local direct put operation to store predesignated notification data at the address of the completion notification field.
Majeski, Stephanie A; Steffey, Michele A; Fuller, Mark; Hunt, Geraldine B; Mayhew, Philipp D; Pollard, Rachel E
2017-05-01
Sentinel lymph node mapping can help to direct surgical oncologic staging and metastatic disease detection in patients with complex lymphatic pathways. We hypothesized that indirect computed tomographic lymphography (ICTL) with a water-soluble iodinated contrast agent would successfully map lymphatic pathways of the iliosacral lymphatic center in dogs with anal sac gland carcinoma, providing a potential preoperative method for iliosacral sentinel lymph node identification in dogs. Thirteen adult dogs diagnosed with anal sac gland carcinoma were enrolled in this prospective, pilot study, and ICTL was performed via peritumoral contrast injection with serial caudal abdominal computed tomography scans for iliosacral sentinel lymph node identification. Technical and descriptive details for ICTL were recorded, including patient positioning, total contrast injection volume, timing of contrast visualization, and sentinel lymph nodes and lymphatic pathways identified. Indirect CT lymphography identified lymphatic pathways and sentinel lymph nodes in 12/13 cases (92%). Identified sentinel lymph nodes were ipsilateral to the anal sac gland carcinoma in 8/12 and contralateral to the anal sac gland carcinoma in 4/12 cases. Sacral, internal iliac, and medial iliac lymph nodes were identified as sentinel lymph nodes, and patterns were widely variable. Patient positioning and timing of imaging may impact successful sentinel lymph node identification. Positioning in supported sternal recumbency is recommended. Results indicate that ICTL may be a feasible technique for sentinel lymph node identification in dogs with anal sac gland carcinoma and offer preliminary data to drive further investigation of iliosacral lymphatic metastatic patterns using ICTL and sentinel lymph node biopsy. © 2017 American College of Veterinary Radiology.
One-Dimensional Model for Mud Flows.
1985-10-01
law relation between the Chezy coefficient and the flow Reynolds number. Jeyapalan et al. [2], in their analysis of mine tailing dam failures...8217.. .: -:.. ; .r;./. : ... . :\\ :. . ... . RESULTS The model is compared with several dambreak experiments performed by Jeyapalan et al. [3]. In these...0.34 seconds per computational node. 5i Test 6 Test 2 Test 7 44 E 3 A2 Experimental Results0 Jeyapalan at al. (3) - C6- Numerical Results 4 8 12 i6 Time
Collectively loading programs in a multiple program multiple data environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the programmore » needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.« less
ALMA Correlator Real-Time Data Processor
NASA Astrophysics Data System (ADS)
Pisano, J.; Amestica, R.; Perez, J.
2005-10-01
The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.
Indoor A* Pathfinding Through an Octree Representation of a Point Cloud
NASA Astrophysics Data System (ADS)
Rodenberg, O. B. P. M.; Verbree, E.; Zlatanova, S.
2016-10-01
There is a growing demand of 3D indoor pathfinding applications. Researched in the field of robotics during the last decades of the 20th century, these methods focussed on 2D navigation. Nowadays we would like to have the ability to help people navigate inside buildings or send a drone inside a building when this is too dangerous for people. What these examples have in common is that an object with a certain geometry needs to find an optimal collision free path between a start and goal point. This paper presents a new workflow for pathfinding through an octree representation of a point cloud. We applied the following steps: 1) the point cloud is processed so it fits best in an octree; 2) during the octree generation the interior empty nodes are filtered and further processed; 3) for each interior empty node the distance to the closest occupied node directly under it is computed; 4) a network graph is computed for all empty nodes; 5) the A* pathfinding algorithm is conducted. This workflow takes into account the connectivity for each node to all possible neighbours (face, edge and vertex and all sizes). Besides, a collision avoidance system is pre-processed in two steps: first, the clearance of each empty node is computed, and then the maximal crossing value between two empty neighbouring nodes is computed. The clearance is used to select interior empty nodes of appropriate size and the maximal crossing value is used to filter the network graph. Finally, both these datasets are used in A* pathfinding.
NASA Astrophysics Data System (ADS)
Reichenberger, Michael A.; Nichols, Daniel M.; Stevenson, Sarah R.; Swope, Tanner M.; Hilger, Caden W.; Unruh, Troy C.; McGregor, Douglas S.; Roberts, Jeremy A.
2017-08-01
Advancements in nuclear reactor core modeling and computational capability have encouraged further development of in-core neutron sensors. Micro-Pocket Fission Detectors (MPFDs) have been fabricated and tested previously, but successful testing of these prior detectors was limited to single-node operation with specialized designs. Described in this work is a modular, four-node MPFD array fabricated and tested at Kansas State University (KSU). The four sensor nodes were equally spaced to span the length of the fuel-region of the KSU TRIGA Mk. II research nuclear reactor core. The encapsulated array was filled with argon gas, serving as an ionization medium in the small cavities of the MPFDs. The unified design improved device ruggedness and simplified construction over previous designs. A 0.315-in. (8-mm) penetration in the upper grid plate of the KSU TRIGA Mk. II research nuclear reactor was used to deploy the array between fuel elements in the core. The MPFD array was coupled to an electronic support system which has been developed to support pulse-mode operation. Neutron-induced pulses were observed on all four sensor channels. Stable device operation was confirmed by testing under steady-state reactor conditions. Each of the four sensors in the array responded to changes in reactor power between 10 kWth and full power (750 kWth). Reactor power transients were observed in real-time including positive transients with periods of 5, 15, and 30 s. Finally, manual reactor power oscillations were observed in real-time.
Large scale cardiac modeling on the Blue Gene supercomputer.
Reumann, Matthias; Fitch, Blake G; Rayshubskiy, Aleksandr; Keller, David U; Weiss, Daniel L; Seemann, Gunnar; Dössel, Olaf; Pitman, Michael C; Rice, John J
2008-01-01
Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large scale cardiac models. Decomposition of anatomical data in segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical data-set was given by both ventricles of the Visible Female data-set in a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096 and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balances did not differ significantly across computational nodes even if the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced form 87 minutes on 512 to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-10-29
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)
1997-01-01
We evaluate the performance of a Fast Ethernet network configured with a single large switch, a single hub, and a 4x4 2D torus topology in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of ethernet hubs and switches. An MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes that we were able to test (up to 16 nodes). For larger networks the ethernet switch outperforms the hub, though its performance is far less than peak. The hub/switch combination tests indicate that the NAS parallel benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
Exact and heuristic algorithms for Space Information Flow.
Uwitonze, Alfred; Huang, Jiaqing; Ye, Yuanqing; Cheng, Wenqing; Li, Zongpeng
2018-01-01
Space Information Flow (SIF) is a new promising research area that studies network coding in geometric space, such as Euclidean space. The design of algorithms that compute the optimal SIF solutions remains one of the key open problems in SIF. This work proposes the first exact SIF algorithm and a heuristic SIF algorithm that compute min-cost multicast network coding for N (N ≥ 3) given terminal nodes in 2-D Euclidean space. Furthermore, we find that the Butterfly network in Euclidean space is the second example besides the Pentagram network where SIF is strictly better than Euclidean Steiner minimal tree. The exact algorithm design is based on two key techniques: Delaunay triangulation and linear programming. Delaunay triangulation technique helps to find practically good candidate relay nodes, after which a min-cost multicast linear programming model is solved over the terminal nodes and the candidate relay nodes, to compute the optimal multicast network topology, including the optimal relay nodes selected by linear programming from all the candidate relay nodes and the flow rates on the connection links. The heuristic algorithm design is also based on Delaunay triangulation and linear programming techniques. The exact algorithm can achieve the optimal SIF solution with an exponential computational complexity, while the heuristic algorithm can achieve the sub-optimal SIF solution with a polynomial computational complexity. We prove the correctness of the exact SIF algorithm. The simulation results show the effectiveness of the heuristic SIF algorithm.
Dynamic Extension of a Virtualized Cluster by using Cloud Resources
NASA Astrophysics Data System (ADS)
Oberst, Oliver; Hauth, Thomas; Kernert, David; Riedel, Stephan; Quast, Günter
2012-12-01
The specific requirements concerning the software environment within the HEP community constrain the choice of resource providers for the outsourcing of computing infrastructure. The use of virtualization in HPC clusters and in the context of cloud resources is therefore a subject of recent developments in scientific computing. The dynamic virtualization of worker nodes in common batch systems provided by ViBatch serves each user with a dynamically virtualized subset of worker nodes on a local cluster. Now it can be transparently extended by the use of common open source cloud interfaces like OpenNebula or Eucalyptus, launching a subset of the virtual worker nodes within the cloud. This paper demonstrates how a dynamically virtualized computing cluster is combined with cloud resources by attaching remotely started virtual worker nodes to the local batch system.
Methods, apparatus and system for selective duplication of subtasks
Andrade Costa, Carlos H.; Cher, Chen-Yong; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.
2016-03-29
A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.
Extending the granularity of representation and control for the MIL-STD CAIS 1.0 node model
NASA Technical Reports Server (NTRS)
Rogers, Kathy L.
1986-01-01
The Common APSE (Ada 1 Program Support Environment) Interface Set (CAIS) (DoD85) node model provides an excellent baseline for interfaces in a single-host development environment. To encompass the entire spectrum of computing, however, the CAIS model should be extended in four areas. It should provide the interface between the engineering workstation and the host system throughout the entire lifecycle of the system. It should provide a basis for communication and integration functions needed by distributed host environments. It should provide common interfaces for communications mechanisms to and among target processors. It should provide facilities for integration, validation, and verification of test beds extending to distributed systems on geographically separate processors with heterogeneous instruction set architectures (ISAS). Additions to the PROCESS NODE model to extend the CAIS into these four areas are proposed.
A connectionist model for diagnostic problem solving
NASA Technical Reports Server (NTRS)
Peng, Yun; Reggia, James A.
1989-01-01
A competition-based connectionist model for solving diagnostic problems is described. The problems considered are computationally difficult in that (1) multiple disorders may occur simultaneously and (2) a global optimum in the space exponential to the total number of possible disorders is sought as a solution. The diagnostic problem is treated as a nonlinear optimization problem, and global optimization criteria are decomposed into local criteria governing node activation updating in the connectionist model. Nodes representing disorders compete with each other to account for each individual manifestation, yet complement each other to account for all manifestations through parallel node interactions. When equilibrium is reached, the network settles into a locally optimal state. Three randomly generated examples of diagnostic problems, each of which has 1024 cases, were tested, and the decomposition plus competition plus resettling approach yielded very high accuracy.
Runtime optimization of an application executing on a parallel computer
None
2014-11-25
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
Runtime optimization of an application executing on a parallel computer
Faraj, Daniel A; Smith, Brian E
2014-11-18
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
Runtime optimization of an application executing on a parallel computer
Faraj, Daniel A.; Smith, Brian E.
2013-01-29
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
A computer method for schedule processing and quick-time updating.
NASA Technical Reports Server (NTRS)
Mccoy, W. H.
1972-01-01
A schedule analysis program is presented which can be used to process any schedule with continuous flow and with no loops. Although generally thought of as a management tool, it has applicability to such extremes as music composition and computer program efficiency analysis. Other possibilities for its use include the determination of electrical power usage during some operation such as spacecraft checkout, and the determination of impact envelopes for the purpose of scheduling payloads in launch processing. At the core of the described computer method is an algorithm which computes the position of each activity bar on the output waterfall chart. The algorithm is basically a maximal-path computation which gives to each node in the schedule network the maximal path from the initial node to the given node.
Robust Routing Protocol For Digital Messages
NASA Technical Reports Server (NTRS)
Marvit, Maclen
1994-01-01
Refinement of ditigal-message-routing protocol increases fault tolerance of polled networks. AbNET-3 is latest of generic AbNET protocols for transmission of messages among computing nodes. AbNET concept described in "Multiple-Ring Digital Communication Network" (NPO-18133). Specifically aimed at increasing fault tolerance of network in broadcast mode, in which one node broadcasts message to and receives responses from all other nodes. Communication in network of computers maintained even when links fail.
Requesting Different Nodes Types When Submitting Jobs on the Peregrine
System | High-Performance Computing | NREL Requesting Different Nodes Types When Submitting Jobs on the Peregrine System Requesting Different Nodes Types When Submitting Jobs on the Peregrine
Study of hypervelocity meteoroid impact on orbital space stations
NASA Technical Reports Server (NTRS)
Leimbach, K. R.; Prozan, R. J.
1973-01-01
Structural damage resulting in hypervelocity impact of a meteorite on a spacecraft is discussed. Of particular interest is the backside spallation caused by such a collision. To treat this phenomenon two numerical schemes were developed in the course of this study to compute the elastic-plastic flow fracture of a solid. The numerical schemes are a five-point finite difference scheme and a four-node finite element scheme. The four-node finite element scheme proved to be less sensitive to the type of boundary conditions and loadings. Although further development work is needed to improve the program versatility (generalization of the network topology, secondary storage for large systems, improving of the coding to reduce the run time, etc.), the basic framework is provided for a utilitarian computer program which may be used in a wide variety of situations. Analytic results showing the program output are given for several test cases.
NASA Astrophysics Data System (ADS)
Lanciotti, E.; Merino, G.; Bria, A.; Blomer, J.
2011-12-01
In a distributed computing model as WLCG the software of experiment specific application software has to be efficiently distributed to any site of the Grid. Application software is currently installed in a shared area of the site visible for all Worker Nodes (WNs) of the site through some protocol (NFS, AFS or other). The software is installed at the site by jobs which run on a privileged node of the computing farm where the shared area is mounted in write mode. This model presents several drawbacks which cause a non-negligible rate of job failure. An alternative model for software distribution based on the CERN Virtual Machine File System (CernVM-FS) has been tried at PIC, the Spanish Tierl site of WLCG. The test bed used and the results are presented in this paper.
Lustre Distributed Name Space (DNE) Evaluation at the Oak Ridge Leadership Computing Facility (OLCF)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmons, James S.; Leverman, Dustin B.; Hanley, Jesse A.
This document describes the Lustre Distributed Name Space (DNE) evaluation carried at the Oak Ridge Leadership Computing Facility (OLCF) between 2014 and 2015. DNE is a development project funded by the OpenSFS, to improve Lustre metadata performance and scalability. The development effort has been split into two parts, the first part (DNE P1) providing support for remote directories over remote Lustre Metadata Server (MDS) nodes and Metadata Target (MDT) devices, while the second phase (DNE P2) addressed split directories over multiple remote MDS nodes and MDT devices. The OLCF have been actively evaluating the performance, reliability, and the functionality ofmore » both DNE phases. For these tests, internal OLCF testbed were used. Results are promising and OLCF is planning on a full DNE deployment by mid-2016 timeframe on production systems.« less
Using SRAM Based FPGAs for Power-Aware High Performance Wireless Sensor Networks
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today’s applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements. PMID:22736971
Using SRAM based FPGAs for power-aware high performance wireless sensor networks.
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today's applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved by either more powerful microcontrollers, though more power consumption or, in general, any solution capable of accelerating task execution. At this point, the use of hardware based, and in particular FPGA solutions, might appear as a candidate technology, since though power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance high capacity state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, as well as a careful power-aware management system, to show that energy savings for certain higher-end applications can be achieved. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to proof that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements.
Lützen, Ulf; Naumann, Carsten Maik; Marx, Marlies; Zhao, Yi; Jüptner, Michael; Baumann, René; Papp, László; Zsótér, Norbert; Aksenov, Alexey; Jünemann, Klaus-Peter; Zuhayra, Maaz
2016-09-07
Because of the increasing importance of computer-assisted post processing of image data in modern medical diagnostic we studied the value of an algorithm for assessment of single photon emission computed tomography/computed tomography (SPECT/CT)-data, which has been used for the first time for lymph node staging in penile cancer with non-palpable inguinal lymph nodes. In the guidelines of the relevant international expert societies, sentinel lymph node-biopsy (SLNB) is recommended as a diagnostic method of choice. The aim of this study is to evaluate the value of the afore-mentioned algorithm and in the clinical context the reliability and the associated morbidity of this procedure. Between 2008 and 2015, 25 patients with invasive penile cancer and inconspicuous inguinal lymph node status underwent SLNB after application of the radiotracer Tc-99m labelled nanocolloid. We recorded in a prospective approach the reliability and the complication rate of the procedure. In addition, we evaluated the results of an algorithm for SPECT/CT-data assessment of these patients. SLNB was carried out in 44 groins of 25 patients. In three patients, inguinal lymph node metastases were detected via SLNB. In one patient, bilateral lymph node recurrence of the groins occurred after negative SLNB. There was a false-negative rate of 4 % in relation to the number of patients (1/25), resp. 4.5 % in relation to the number of groins (2/44). Morbidity was 4 % in relation to the number of patients (1/25), resp. 2.3 % in relation to the number of groins (1/44). The results of computer-assisted assessment of SPECT/CT data for sentinel lymph node (SLN)-diagnostics demonstrated high sensitivity of 88.8 % and specificity of 86.7 %. SLNB is a very reliable method, associated with low morbidity. Computer-assisted assessment of SPECT/CT data of the SLN-diagnostics shows high sensitivity and specificity. While it cannot replace the assessment by medical experts, it can still provide substantial supplement and assistance.
Laboratory for Computer Science Progress Report 16, 1 July 1978 - 30 June 1979,
1980-08-01
name strongly distinguishes the XLMS node from ordinary nameless semantic network nodes. The name of a node has two parts: the " genus ", itself a node...and the "specializer", a node or an atomic symbol. The genus and specializer of a node are almost always semantically meaningful, though their...meaning is almost never suppliec by XLMS, but rather by some system built on top of XLMS. The genus of a node almost always plays a crucial role in its
Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support
Camargo, João; Rochol, Juergen; Gerla, Mario
2018-01-01
A wide range of multimedia services is expected to be offered for mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demands for multimedia contents. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which is capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be actuated by request patterns and the timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. Besides that, we perform the evaluation of such service migration of video services. Finally, we present potential research challenges and trends. PMID:29364172
Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support.
Rosário, Denis; Schimuneck, Matias; Camargo, João; Nobre, Jéferson; Both, Cristiano; Rochol, Juergen; Gerla, Mario
2018-01-24
A wide range of multimedia services is expected to be offered for mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demands for multimedia contents. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which is capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be actuated by request patterns and the timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. Besides that, we perform the evaluation of such service migration of video services. Finally, we present potential research challenges and trends.
Adaptive Wavelet Coding Applied in a Wireless Control System.
Gama, Felipe O S; Silveira, Luiz F Q; Salazar, Andrés O
2017-12-13
Wireless control systems can sense, control and act on the information exchanged between the wireless sensor nodes in a control loop. However, the exchanged information becomes susceptible to the degenerative effects produced by the multipath propagation. In order to minimize the destructive effects characteristic of wireless channels, several techniques have been investigated recently. Among them, wavelet coding is a good alternative for wireless communications for its robustness to the effects of multipath and its low computational complexity. This work proposes an adaptive wavelet coding whose parameters of code rate and signal constellation can vary according to the fading level and evaluates the use of this transmission system in a control loop implemented by wireless sensor nodes. The performance of the adaptive system was evaluated in terms of bit error rate (BER) versus E b / N 0 and spectral efficiency, considering a time-varying channel with flat Rayleigh fading, and in terms of processing overhead on a control system with wireless communication. The results obtained through computational simulations and experimental tests show performance gains obtained by insertion of the adaptive wavelet coding in a control loop with nodes interconnected by wireless link. These results enable the use of this technique in a wireless link control loop.
Recent Performance Results of VPIC on Trinity
NASA Astrophysics Data System (ADS)
Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Le, A.; Li, H.; Nam, H.; Pang, X.; Stark, D. J.; Rust, W. N., III; Yin, L.; Albright, B. J.
2017-10-01
Trinity is a new DOE compute resource now in production at Los Alamos National Laboratory. Trinity has several new and unique features including two compute partitions, one with dual socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes, use of on package high bandwidth memory (HBM) for KNL nodes, ability to configure KNL nodes with respect to HBM model and on die network topology in a variety of operational modes at run time, and use of solid state storage via burst buffer technology to reduce time required to perform I/O. An effort is in progress to optimize VPIC on Trinity by taking advantage of these new architectural features. Results of work will be presented on performance of VPIC on Haswell and KNL partitions for single node runs and runs at scale. Results include use of burst buffers at scale to optimize I/O, comparison of strategies for using MPI and threads, performance benefits using HBM and effectiveness of using intrinsics for vectorization. Work performed under auspices of U.S. Dept. of Energy by Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by LANL LDRD program.
Performance of the Widely-Used CFD Code OVERFLOW on the Pleides Supercomputer
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.
2017-01-01
Computational performance studies were made for NASA's widely used Computational Fluid Dynamics code OVERFLOW on the Pleiades Supercomputer. Two test cases were considered: a full launch vehicle with a grid of 286 million points and a full rotorcraft model with a grid of 614 million points. Computations using up to 8000 cores were run on Sandy Bridge and Ivy Bridge nodes. Performance was monitored using times reported in the day files from the Portable Batch System utility. Results for two grid topologies are presented and compared in detail. Observations and suggestions for future work are made.
Functional Equivalence Acceptance Testing of FUN3D for Entry Descent and Landing Applications
NASA Technical Reports Server (NTRS)
Gnoffo, Peter A.; Wood, William A.; Kleb, William L.; Alter, Stephen J.; Glass, Christopher E.; Padilla, Jose F.; Hammond, Dana P.; White, Jeffery A.
2013-01-01
The functional equivalence of the unstructured grid code FUN3D to the the structured grid code LAURA (Langley Aerothermodynamic Upwind Relaxation Algorithm) is documented for applications of interest to the Entry, Descent, and Landing (EDL) community. Examples from an existing suite of regression tests are used to demonstrate the functional equivalence, encompassing various thermochemical models and vehicle configurations. Algorithm modifications required for the node-based unstructured grid code (FUN3D) to reproduce functionality of the cell-centered structured code (LAURA) are also documented. Challenges associated with computation on tetrahedral grids versus computation on structured-grid derived hexahedral systems are discussed.
Direct memory access transfer completion notification
Chen, Dong; Giampapa, Mark E.; Heidelberger, Philip; Kumar, Sameer; Parker, Jeffrey J.; Steinmacher-Burow, Burkhard D.; Vranas, Pavlos
2010-07-27
Methods, compute nodes, and computer program products are provided for direct memory access (`DMA`) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (`FIFO`) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. To better understand and predict the distribution, intensity and structure of dust storm, a series of dust storm models have been developed, such as Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. It seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node need to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalance task loads and unnecessary communications among computing nodes. Therefore, task allocation method is the key factor, which may impact the feasibility of the paralleling. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with small amount of computing tasks. However, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate performance decreasing for large scale tasks, a K-Means clustering based algorithm is introduced. Instead of dedicating to get optimized solutions, this method can get relatively good feasible solutions within acceptable time. However, it may introduce imbalance communication for nodes or node-isolated subdomains. This research shows both two algorithms have their own strength and weakness for task allocation. A combination of the two algorithms is under study to obtain a better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
NASA Astrophysics Data System (ADS)
Niño, Alfonso; Muñoz-Caro, Camelia; Reyes, Sebastián
2015-11-01
The last decade witnessed a great development of the structural and dynamic study of complex systems described as a network of elements. Therefore, systems can be described as a set of, possibly, heterogeneous entities or agents (the network nodes) interacting in, possibly, different ways (defining the network edges). In this context, it is of practical interest to model and handle not only static and homogeneous networks but also dynamic, heterogeneous ones. Depending on the size and type of the problem, these networks may require different computational approaches involving sequential, parallel or distributed systems with or without the use of disk-based data structures. In this work, we develop an Application Programming Interface (APINetworks) for the modeling and treatment of general networks in arbitrary computational environments. To minimize dependency between components, we decouple the network structure from its function using different packages for grouping sets of related tasks. The structural package, the one in charge of building and handling the network structure, is the core element of the system. In this work, we focus in this API structural component. We apply an object-oriented approach that makes use of inheritance and polymorphism. In this way, we can model static and dynamic networks with heterogeneous elements in the nodes and heterogeneous interactions in the edges. In addition, this approach permits a unified treatment of different computational environments. Tests performed on a C++11 version of the structural package show that, on current standard computers, the system can handle, in main memory, directed and undirected linear networks formed by tens of millions of nodes and edges. Our results compare favorably to those of existing tools.
Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce
Pratx, Guillem; Xing, Lei
2011-01-01
Monte Carlo simulation is considered the most reliable method for modeling photon migration in heterogeneous media. However, its widespread use is hindered by the high computational cost. The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively-parallel cloud computing environment. We ported the MC321 Monte Carlo package to Hadoop, an open-source MapReduce framework. In this implementation, Map tasks compute photon histories in parallel while a Reduce task scores photon absorption. The distributed implementation was evaluated on a commercial compute cloud. The simulation time was found to be linearly dependent on the number of photons and inversely proportional to the number of nodes. For a cluster size of 240 nodes, the simulation of 100 billion photon histories took 22 min, a 1258 × speed-up compared to the single-threaded Monte Carlo program. The overall computational throughput was 85,178 photon histories per node per second, with a latency of 100 s. The distributed simulation produced the same output as the original implementation and was resilient to hardware failure: the correctness of the simulation was unaffected by the shutdown of 50% of the nodes. PMID:22191916
Dispatching packets on a global combining network of a parallel computer
Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN
2011-07-19
Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
Peregrine System | High-Performance Computing | NREL
) and longer-term (/projects) storage. These file systems are mounted on all nodes. Peregrine has three -2670 Xeon processors and 64 GB of memory. In addition to mounting the /home, /nopt, /projects and # cores/node Memory/node Peak (DP) performance per node 88 Intel Xeon E5-2670 "Sandy Bridge" 8
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-04-27
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
Research on a Banknote Printing Wastewater Monitoring System based on Wireless Sensor Network
NASA Astrophysics Data System (ADS)
Li, B. B.; Yuan, Z. F.
2006-10-01
In this paper, a banknote printing wastewater monitoring system based on WSN is presented in line with the system demands and actual condition of the worksite for a banknote printing factory. In Physical Layer, the network node is a nRF9e5-centric embedded instrument, which can realize the multi-function such as data collecting, status monitoring, wireless data transmission and so on. Limited by the computing capability, memory capability, communicating energy and others factors, it is impossible for the node to get every detail information of the network, so the communication protocol on WSN couldn't be very complicated. The competitive-based MACA (Multiple Access with Collision Avoidance) Protocol is introduced in MAC, which can decide the communication process and working mode of the nodes, avoid the collision of data transmission, hidden and exposed station problem of nodes. On networks layer, the routing protocol in charge of the transmitting path of the data, the networks topology structure is arranged based on address assignation. Accompanied with some redundant nodes, the network performances stabile and expandable. The wastewater monitoring system is a tentative practice of WSN theory in engineering. Now, the system has passed test and proved efficiently.
Global interrupt and barrier networks
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.
2008-10-28
A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
A spread willingness computing-based information dissemination model.
Huang, Haojing; Cui, Zhiming; Zhang, Shukui
2014-01-01
This paper constructs a kind of spread willingness computing based on information dissemination model for social network. The model takes into account the impact of node degree and dissemination mechanism, combined with the complex network theory and dynamics of infectious diseases, and further establishes the dynamical evolution equations. Equations characterize the evolutionary relationship between different types of nodes with time. The spread willingness computing contains three factors which have impact on user's spread behavior: strength of the relationship between the nodes, views identity, and frequency of contact. Simulation results show that different degrees of nodes show the same trend in the network, and even if the degree of node is very small, there is likelihood of a large area of information dissemination. The weaker the relationship between nodes, the higher probability of views selection and the higher the frequency of contact with information so that information spreads rapidly and leads to a wide range of dissemination. As the dissemination probability and immune probability change, the speed of information dissemination is also changing accordingly. The studies meet social networking features and can help to master the behavior of users and understand and analyze characteristics of information dissemination in social network.
A Spread Willingness Computing-Based Information Dissemination Model
Cui, Zhiming; Zhang, Shukui
2014-01-01
This paper constructs a kind of spread willingness computing based on information dissemination model for social network. The model takes into account the impact of node degree and dissemination mechanism, combined with the complex network theory and dynamics of infectious diseases, and further establishes the dynamical evolution equations. Equations characterize the evolutionary relationship between different types of nodes with time. The spread willingness computing contains three factors which have impact on user's spread behavior: strength of the relationship between the nodes, views identity, and frequency of contact. Simulation results show that different degrees of nodes show the same trend in the network, and even if the degree of node is very small, there is likelihood of a large area of information dissemination. The weaker the relationship between nodes, the higher probability of views selection and the higher the frequency of contact with information so that information spreads rapidly and leads to a wide range of dissemination. As the dissemination probability and immune probability change, the speed of information dissemination is also changing accordingly. The studies meet social networking features and can help to master the behavior of users and understand and analyze characteristics of information dissemination in social network. PMID:25110738
Shared address collectives using counter mechanisms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blocksome, Michael; Dozsa, Gabor; Gooding, Thomas M
A shared address space on a compute node stores data received from a network and data to transmit to the network. The shared address space includes an application buffer that can be directly operated upon by a plurality of processes, for instance, running on different cores on the compute node. A shared counter is used for one or more of signaling arrival of the data across the plurality of processes running on the compute node, signaling completion of an operation performed by one or more of the plurality of processes, obtaining reservation slots by one or more of the pluralitymore » of processes, or combinations thereof.« less
Matsuzawa, Fumihiko; Omoto, Kiyoka; Einama, Takahiro; Abe, Hironori; Suzuki, Takashi; Hamaguchi, Jun; Kaga, Terumi; Sato, Mami; Oomura, Masako; Takata, Yumiko; Fujibe, Ayako; Takeda, Chie; Tamura, Etsuya; Taketomi, Akinobu; Kyuno, Kenichi
2015-01-01
Breast cancer is the most common type of cancer in women. The 5-year survival rate in patients with breast cancer ranges from 74 to 82 %. Sentinel lymph node biopsy has become an alternative to axillary lymph node dissection for nodal staging. We evaluated the detection of the sentinel lymph node and metastasis of the lymph node using contrast enhanced ultrasonography with Sonazoid. Between December 2013 and May 2014, 32 patients with operable breast cancer were enrolled in this study. We evaluated the detection of axillary sentinel lymph nodes and the evaluation of axillary lymph nodes metastasis using contrast enhanced computed tomography, color Doppler ultrasonography and contrast enhanced ultrasonography with Sonazoid. All the sentinel lymph nodes were identified, and the sentinel lymph nodes detected by contrast enhanced ultrasonography with Sonazoid corresponded with those detected by computed tomography lymphography and indigo carmine method. The detection of metastasis based on contrast enhanced computed tomography were sensitivity 20.0 %, specificity 88.2 %, PPV 60.0 %, NPV 55.6 %, accuracy 56.3 %. Based on color Doppler ultrasonography, the results were sensitivity 36.4 %, specificity 95.2 %, PPV 80.0 %, NPV 74.1 %, accuracy 75.0 %. Based on contrast enhanced ultrasonography with Sonazoid, the results were sensitivity 81.8 %, specificity 95.2 %, PPV 90.0 %, NPV 90.9 %, accuracy 90.6 %. The results suggested that contrast enhanced ultrasonography with Sonazoid was the most accurate among the evaluations of these modalities. In the future, we believe that our method would take the place of conventional sentinel lymph node biopsy for an axillary staging method.
The Mark III Hypercube-Ensemble Computers
NASA Technical Reports Server (NTRS)
Peterson, John C.; Tuazon, Jesus O.; Lieberman, Don; Pniel, Moshe
1988-01-01
Mark III Hypercube concept applied in development of series of increasingly powerful computers. Processor of each node of Mark III Hypercube ensemble is specialized computer containing three subprocessors and shared main memory. Solves problem quickly by simultaneously processing part of problem at each such node and passing combined results to host computer. Disciplines benefitting from speed and memory capacity include astrophysics, geophysics, chemistry, weather, high-energy physics, applied mechanics, image processing, oil exploration, aircraft design, and microcircuit design.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-23
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
Peer-to-peer architectures for exascale computing : LDRD final report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vorobeychik, Yevgeniy; Mayo, Jackson R.; Minnich, Ronald G.
2010-09-01
The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms ({approx} 10{sup 18} operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated as ubiquitousmore » and their effects are considered as simply another controllable source of error in a scientific computation, can remove such obstacles to exascale computing for certain applications. We have developed a simulation framework, as well as a preliminary implementation in a large-scale emulation environment, for exploration of these 'fault-oblivious computing' approaches. High-performance computing (HPC) faces a fundamental problem of increasing total component failure rates due to increasing system sizes, which threaten to degrade system reliability to an unusable level by the time the exascale range is reached ({approx} 10{sup 18} operations per second, requiring of order millions of processors). As computer scientists seek a way to scale system software for next-generation exascale machines, it is worth considering peer-to-peer (P2P) architectures that are already capable of supporting 10{sup 6}-10{sup 7} unreliable nodes. Exascale platforms will require a different way of looking at systems and software because the machine will likely not be available in its entirety for a meaningful execution time. Realistic estimates of failure rates range from a few times per day to more than once per hour for these platforms. P2P architectures give us a starting point for crafting applications and system software for exascale. In the context of the Internet, P2P applications (e.g., file sharing, botnets) have already solved this problem for 10{sup 6}-10{sup 7} nodes. Usually based on a fractal distributed hash table structure, these systems have proven robust in practice to constant and unpredictable outages, failures, and even subversion. For example, a recent estimate of botnet turnover (i.e., the number of machines leaving and joining) is about 11% per week. Nonetheless, P2P networks remain effective despite these failures: The Conficker botnet has grown to {approx} 5 x 10{sup 6} peers. Unlike today's system software and applications, those for next-generation exascale machines cannot assume a static structure and, to be scalable over millions of nodes, must be decentralized. P2P architectures achieve both, and provide a promising model for 'fault-oblivious computing'. This project aimed to study the dynamics of P2P networks in the context of a design for exascale systems and applications. Having no single point of failure, the most successful P2P architectures are adaptive and self-organizing. While there has been some previous work applying P2P to message passing, little attention has been previously paid to the tightly coupled exascale domain. Typically, the per-node footprint of P2P systems is small, making them ideal for HPC use. The implementation on each peer node cooperates en masse to 'heal' disruptions rather than relying on a controlling 'master' node. Understanding this cooperative behavior from a complex systems viewpoint is essential to predicting useful environments for the inextricably unreliable exascale platforms of the future. We sought to obtain theoretical insight into the stability and large-scale behavior of candidate architectures, and to work toward leveraging Sandia's Emulytics platform to test promising candidates in a realistic (ultimately {ge} 10{sup 7} nodes) setting. Our primary example applications are drawn from linear algebra: a Jacobi relaxation solver for the heat equation, and the closely related technique of value iteration in optimization. We aimed to apply P2P concepts in designing implementations capable of surviving an unreliable machine of 10{sup 6} nodes.« less
Performance of VPIC on Sequoia
NASA Astrophysics Data System (ADS)
Nystrom, William
2014-10-01
Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.
Single-node orbit analsyis with radiation heat transfer only
NASA Technical Reports Server (NTRS)
Peoples, J. A.
1977-01-01
The steady-state temperature of a single node which dissipates energy by radiation only is discussed for a nontime varying thermal environment. Relationships are developed to illustrate how shields can be utilized to represent a louver system. A computer program is presented which can assess periodic temperature characteristics of a single node in a time varying thermal environment having energy dissipation by radiation only. The computer program performs thermal orbital analysis for five combinations of plate, shields, and louvers.
NASA Astrophysics Data System (ADS)
Xing, Fangyuan; Wang, Honghuan; Yin, Hongxi; Li, Ming; Luo, Shenzi; Wu, Chenguang
2016-02-01
With the extensive application of cloud computing and data centres, as well as the constantly emerging services, the big data with the burst characteristic has brought huge challenges to optical networks. Consequently, the software defined optical network (SDON) that combines optical networks with software defined network (SDN), has attracted much attention. In this paper, an OpenFlow-enabled optical node employed in optical cross-connect (OXC) and reconfigurable optical add/drop multiplexer (ROADM), is proposed. An open source OpenFlow controller is extended on routing strategies. In addition, the experiment platform based on OpenFlow protocol for software defined optical network, is designed. The feasibility and availability of the OpenFlow-enabled optical nodes and the extended OpenFlow controller are validated by the connectivity test, protection switching and load balancing experiments in this test platform.
Testing SLURM open source batch system for a Tierl/Tier2 HEP computing facility
NASA Astrophysics Data System (ADS)
Donvito, Giacinto; Salomoni, Davide; Italiano, Alessandro
2014-06-01
In this work the testing activities that were carried on to verify if the SLURM batch system could be used as the production batch system of a typical Tier1/Tier2 HEP computing center are shown. SLURM (Simple Linux Utility for Resource Management) is an Open Source batch system developed mainly by the Lawrence Livermore National Laboratory, SchedMD, Linux NetworX, Hewlett-Packard, and Groupe Bull. Testing was focused both on verifying the functionalities of the batch system and the performance that SLURM is able to offer. We first describe our initial set of requirements. Functionally, we started configuring SLURM so that it replicates all the scheduling policies already used in production in the computing centers involved in the test, i.e. INFN-Bari and the INFN-Tier1 at CNAF, Bologna. Currently, the INFN-Tier1 is using IBM LSF (Load Sharing Facility), while INFN-Bari, an LHC Tier2 for both CMS and Alice, is using Torque as resource manager and MAUI as scheduler. We show how we configured SLURM in order to enable several scheduling functionalities such as Hierarchical FairShare, Quality of Service, user-based and group-based priority, limits on the number of jobs per user/group/queue, job age scheduling, job size scheduling, and scheduling of consumable resources. We then show how different job typologies, like serial, MPI, multi-thread, whole-node and interactive jobs can be managed. Tests on the use of ACLs on queues or in general other resources are then described. A peculiar SLURM feature we also verified is triggers on event, useful to configure specific actions on each possible event in the batch system. We also tested highly available configurations for the master node. This feature is of paramount importance since a mandatory requirement in our scenarios is to have a working farm cluster even in case of hardware failure of the server(s) hosting the batch system. Among our requirements there is also the possibility to deal with pre-execution and post-execution scripts, and controlled handling of the failure of such scripts. This feature is heavily used, for example, at the INFN-Tier1 in order to check the health status of a worker node before execution of each job. Pre- and post-execution scripts are also important to let WNoDeS, the IaaS Cloud solution developed at INFN, use SLURM as its resource manager. WNoDeS has already been supporting the LSF and Torque batch systems for some time; in this work we show the work done so that WNoDeS supports SLURM as well. Finally, we show several performance tests that we carried on to verify SLURM scalability and reliability, detailing scalability tests both in terms of managed nodes and of queued jobs.
Fog computing job scheduling optimization based on bees swarm
NASA Astrophysics Data System (ADS)
Bitam, Salim; Zeadally, Sherali; Mellouk, Abdelhamid
2018-04-01
Fog computing is a new computing architecture, composed of a set of near-user edge devices called fog nodes, which collaborate together in order to perform computational services such as running applications, storing an important amount of data, and transmitting messages. Fog computing extends cloud computing by deploying digital resources at the premise of mobile users. In this new paradigm, management and operating functions, such as job scheduling aim at providing high-performance, cost-effective services requested by mobile users and executed by fog nodes. We propose a new bio-inspired optimization approach called Bees Life Algorithm (BLA) aimed at addressing the job scheduling problem in the fog computing environment. Our proposed approach is based on the optimized distribution of a set of tasks among all the fog computing nodes. The objective is to find an optimal tradeoff between CPU execution time and allocated memory required by fog computing services established by mobile users. Our empirical performance evaluation results demonstrate that the proposal outperforms the traditional particle swarm optimization and genetic algorithm in terms of CPU execution time and allocated memory.
Colony-Stimulating Factors for Febrile Neutropenia during Cancer Therapy
Bennett, Charles L.; Djulbegovic, Benjamin; Norris, LeAnn B.; Armitage, James O.
2014-01-01
A 55-year-old, previously healthy woman received a diagnosis of diffuse large-B-cell lymphoma after the evaluation of an enlarged left axillary lymph node obtained on biopsy. She had been asymptomatic except for the presence of enlarged axillary lymph nodes, which she had found while bathing. She was referred to an oncologist, who performed a staging evaluation. A complete blood count and test results for liver and renal function and serum lactate dehydrogenase were normal. Positron-emission tomography and computed tomography (PET–CT) identified enlarged lymph nodes with abnormal uptake in the left axilla, mediastinum, and retroperitoneum. Results on bone marrow biopsy were normal. The patient’s oncologist recommends treatment with six cycles of cyclophosphamide, doxorubicin, vincristine, and prednisone with rituximab (CHOP-R) at 21-day intervals. Is the administration of prophylactic granulocyte colony-stimulating factor (G-CSF) with the first cycle of chemotherapy indicated? PMID:23514290
ROS-IGTL-Bridge: an open network interface for image-guided therapy using the ROS environment.
Frank, Tobias; Krieger, Axel; Leonard, Simon; Patel, Niravkumar A; Tokuda, Junichi
2017-08-01
With the growing interest in advanced image-guidance for surgical robot systems, rapid integration and testing of robotic devices and medical image computing software are becoming essential in the research and development. Maximizing the use of existing engineering resources built on widely accepted platforms in different fields, such as robot operating system (ROS) in robotics and 3D Slicer in medical image computing could simplify these tasks. We propose a new open network bridge interface integrated in ROS to ensure seamless cross-platform data sharing. A ROS node named ROS-IGTL-Bridge was implemented. It establishes a TCP/IP network connection between the ROS environment and external medical image computing software using the OpenIGTLink protocol. The node exports ROS messages to the external software over the network and vice versa simultaneously, allowing seamless and transparent data sharing between the ROS-based devices and the medical image computing platforms. Performance tests demonstrated that the bridge could stream transforms, strings, points, and images at 30 fps in both directions successfully. The data transfer latency was <1.2 ms for transforms, strings and points, and 25.2 ms for color VGA images. A separate test also demonstrated that the bridge could achieve 900 fps for transforms. Additionally, the bridge was demonstrated in two representative systems: a mock image-guided surgical robot setup consisting of 3D slicer, and Lego Mindstorms with ROS as a prototyping and educational platform for IGT research; and the smart tissue autonomous robot surgical setup with 3D Slicer. The study demonstrated that the bridge enabled cross-platform data sharing between ROS and medical image computing software. This will allow rapid and seamless integration of advanced image-based planning/navigation offered by the medical image computing software such as 3D Slicer into ROS-based surgical robot systems.
Broadcasting Topology and Routing Information in Computer Networks
1985-05-01
DOWN\\ linki inki FIgwre 1.2.1: Topology Problem Example messages from node 2 before receiving the first DOWN message from node 3. Now assume that before...node to each of the link’s end nodes. 54 link.1 cc 4 1 -. distances to linki Figue 3.4.2: SPTA Port Distance Table Example An example of these
Modeling node bandwidth limits and their effects on vector combining algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Littlefield, R.J.
Each node in a message-passing multicomputer typically has several communication links. However, the maximum aggregate communication speed of a node is often less than the sum of its individual link speeds. Such computers are called node bandwidth limited (NBL). The NBL constraint is important when choosing algorithms because it can change the relative performance of different algorithms that accomplish the same task. This paper introduces a model of communication performance for NBL computers and uses the model to analyze the overall performance of three algorithms for vector combining (global sum) on the Intel Touchstone DELTA computer. Each of the threemore » algorithms is found to be at least 33% faster than the other two for some combinations of machine size and vector length. The NBL constraint is shown to significantly affect the conditions under which each algorithm is fastest.« less
Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem
NASA Astrophysics Data System (ADS)
Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa
A branch-and-bound algorithm (BB for short) is the most general technique to deal with various combinatorial optimization problems. Even if it is used, computation time is likely to increase exponentially. So we consider its parallelization to reduce it. It has been reported that the computation time of a parallel BB heavily depends upon node-variable selection strategies. And, in case of a parallel BB, it is also necessary to prevent increase in communication time. So, it is important to pay attention to how many and what kind of nodes are to be transferred (called sending-node selection strategy). In this paper, for the graph coloring problem, we propose some sending-node selection strategies for a parallel BB algorithm by adopting MPI for parallelization and experimentally evaluate how these strategies affect computation time of a parallel BB on a PC cluster network.
Localization Algorithm Based on a Spring Model (LASM) for Large Scale Wireless Sensor Networks.
Chen, Wanming; Mei, Tao; Meng, Max Q-H; Liang, Huawei; Liu, Yumei; Li, Yangming; Li, Shuai
2008-03-15
A navigation method for a lunar rover based on large scale wireless sensornetworks is proposed. To obtain high navigation accuracy and large exploration area, highnode localization accuracy and large network scale are required. However, thecomputational and communication complexity and time consumption are greatly increasedwith the increase of the network scales. A localization algorithm based on a spring model(LASM) method is proposed to reduce the computational complexity, while maintainingthe localization accuracy in large scale sensor networks. The algorithm simulates thedynamics of physical spring system to estimate the positions of nodes. The sensor nodesare set as particles with masses and connected with neighbor nodes by virtual springs. Thevirtual springs will force the particles move to the original positions, the node positionscorrespondingly, from the randomly set positions. Therefore, a blind node position can bedetermined from the LASM algorithm by calculating the related forces with the neighbornodes. The computational and communication complexity are O(1) for each node, since thenumber of the neighbor nodes does not increase proportionally with the network scale size.Three patches are proposed to avoid local optimization, kick out bad nodes and deal withnode variation. Simulation results show that the computational and communicationcomplexity are almost constant despite of the increase of the network scale size. The time consumption has also been proven to remain almost constant since the calculation steps arealmost unrelated with the network scale size.
The Case for Modular Redundancy in Large-Scale High Performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Ong, Hong Hoe; Scott, Stephen L
2009-01-01
Recent investigations into resilience of large-scale high-performance computing (HPC) systems showed a continuous trend of decreasing reliability and availability. Newly installed systems have a lower mean-time to failure (MTTF) and a higher mean-time to recover (MTTR) than their predecessors. Modular redundancy is being used in many mission critical systems today to provide for resilience, such as for aerospace and command \\& control systems. The primary argument against modular redundancy for resilience in HPC has always been that the capability of a HPC system, and respective return on investment, would be significantly reduced. We argue that modular redundancy can significantly increasemore » compute node availability as it removes the impact of scale from single compute node MTTR. We further argue that single compute nodes can be much less reliable, and therefore less expensive, and still be highly available, if their MTTR/MTTF ratio is maintained.« less
NASA Technical Reports Server (NTRS)
Majumdar, Alok; Valenzuela, Juan; LeClair, Andre; Moder, Jeff
2015-01-01
This paper presents a numerical model of a system-level test bed - the multipurpose hydrogen test bed (MHTB) using Generalized Fluid System Simulation Program (GFSSP). MHTB is representative in size and shape of a fully integrated space transportation vehicle liquid hydrogen (LH2) propellant tank and was tested at Marshall Space Flight Center (MSFC) to generate data for cryogenic storage. GFSSP is a finite volume based network flow analysis software developed at MSFC and used for thermo-fluid analysis of propulsion systems. GFSSP has been used to model the self-pressurization and ullage pressure control by Thermodynamic Vent System (TVS). A TVS typically includes a Joule-Thompson (J-T) expansion device, a two-phase heat exchanger, and a mixing pump and spray to extract thermal energy from the tank without significant loss of liquid propellant. Two GFSSP models (Self-Pressurization & TVS) were separately developed and tested and then integrated to simulate the entire system. Self-Pressurization model consists of multiple ullage nodes, propellant node and solid nodes; it computes the heat transfer through Multi-Layer Insulation blankets and calculates heat and mass transfer between ullage and liquid propellant and ullage and tank wall. TVS model calculates the flow through J-T valve, heat exchanger and spray and vent systems. Two models are integrated by exchanging data through User Subroutines of both models. The integrated models results have been compared with MHTB test data of 50% fill level. Satisfactory comparison was observed between test and numerical predictions.
Xu, Shuhang; Feng, Lingling; Chen, Yongming; Sun, Ying; Lu, Yao; Huang, Shaomin; Fu, Yang; Zheng, Rongqin; Zhang, Yujing; Zhang, Rong
2017-06-20
In order to refine the location and metastasis-risk density of 16 lymph node stations of gastric cancer for neoadjuvant radiotherapy, we retrospectively reviewed the initial images and pathological reports of 255 gastric cancer patients with lymphatic metastasis. Metastatic lymph nodes identified in the initial computed tomography images were investigated by two radiologists with gastrointestinal specialty. A circle with a diameter of 5 mm was used to identify the central position of each metastatic lymph node, defined as the LNc (the central position of the lymph node). The LNc was drawn at the equivalent location on the reference images of a standard patient based on the relative distances to the same reference vessels and the gastric wall using a Monaco® version 5.0 workstation. The image manipulation software Medi-capture was programmed for image analysis to produce a contour and density atlas of 16 lymph node stations. Based on a total of 2846 LNcs contoured (31-599 per lymph node station), we created a density distribution map of 16 lymph node drainage stations of the stomach on computed tomography images, showing the detailed radiographic delineation of each lymph node station as well as high-risk areas for lymph node metastasis. Our mapping can serve as a template for the delineation of gastric lymph node stations when defining clinical target volume in pre-operative radiotherapy for gastric cancer.
Two-dimensional nonsteady viscous flow simulation on the Navier-Stokes computer miniNode
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M.; Littman, Michael G.; Flannery, William
1986-01-01
The needs of large-scale scientific computation are outpacing the growth in performance of mainframe supercomputers. In particular, problems in fluid mechanics involving complex flow simulations require far more speed and capacity than that provided by current and proposed Class VI supercomputers. To address this concern, the Navier-Stokes Computer (NSC) was developed. The NSC is a parallel-processing machine, comprised of individual Nodes, each comparable in performance to current supercomputers. The global architecture is that of a hypercube, and a 128-Node NSC has been designed. New architectural features, such as a reconfigurable many-function ALU pipeline and a multifunction memory-ALU switch, have provided the capability to efficiently implement a wide range of algorithms. Efficient algorithms typically involve numerically intensive tasks, which often include conditional operations. These operations may be efficiently implemented on the NSC without, in general, sacrificing vector-processing speed. To illustrate the architecture, programming, and several of the capabilities of the NSC, the simulation of two-dimensional, nonsteady viscous flows on a prototype Node, called the miniNode, is presented.
Bartlett, Eric S; Walters, Thomas D; Yu, Eugene
2013-01-01
Objective. We evaluate if axial-based lymph node size criteria can be applied to coronal and sagittal planes. Methods. Fifty pretreatment computed tomographic (CT) neck exams were evaluated in patients with head and neck squamous cell carcinoma (SCCa) and neck lymphadenopathy. Axial-based size criteria were applied to all 3 imaging planes, measured, and classified as "enlarged" if equal to or exceeding size criteria. Results. 222 lymph nodes were "enlarged" in one imaging plane; however, 53.2% (118/222) of these were "enlarged" in all 3 planes. Classification concordance between axial versus coronal/sagittal planes was poor (kappa = -0.09 and -0.07, resp., P < 0.05). The McNemar test showed systematic misclassification when comparing axial versus coronal (P < 0.001) and axial versus sagittal (P < 0.001) planes. Conclusion. Classification of "enlarged" lymph nodes differs between axial versus coronal/sagittal imaging planes when axial-based nodal size criteria are applied independently to all three imaging planes, and exclusively used without other morphologic nodal data.
Bartlett, Eric S.; Walters, Thomas D.; Yu, Eugene
2013-01-01
Objective. We evaluate if axial-based lymph node size criteria can be applied to coronal and sagittal planes. Methods. Fifty pretreatment computed tomographic (CT) neck exams were evaluated in patients with head and neck squamous cell carcinoma (SCCa) and neck lymphadenopathy. Axial-based size criteria were applied to all 3 imaging planes, measured, and classified as “enlarged” if equal to or exceeding size criteria. Results. 222 lymph nodes were “enlarged” in one imaging plane; however, 53.2% (118/222) of these were “enlarged” in all 3 planes. Classification concordance between axial versus coronal/sagittal planes was poor (kappa = −0.09 and −0.07, resp., P < 0.05). The McNemar test showed systematic misclassification when comparing axial versus coronal (P < 0.001) and axial versus sagittal (P < 0.001) planes. Conclusion. Classification of “enlarged” lymph nodes differs between axial versus coronal/sagittal imaging planes when axial-based nodal size criteria are applied independently to all three imaging planes, and exclusively used without other morphologic nodal data. PMID:23984099
Self-Organizing OFDMA System for Broadband Communication
NASA Technical Reports Server (NTRS)
Roy, Aloke (Inventor); Anandappan, Thanga (Inventor); Malve, Sharath Babu (Inventor)
2016-01-01
Systems and methods for a self-organizing OFDMA system for broadband communication are provided. In certain embodiments a communication node for a self organizing network comprises a communication interface configured to transmit data to and receive data from a plurality of nodes; and a processing unit configured to execute computer readable instructions. Further, computer readable instructions direct the processing unit to identify a sub-region within a cell, wherein the communication node is located in the sub-region; and transmit at least one data frame, wherein the data from the communication node is transmitted at a particular time and frequency as defined within the at least one data frame, where the time and frequency are associated with the sub-region.
Collective network for computer structures
Blumrich, Matthias A; Coteus, Paul W; Chen, Dong; Gara, Alan; Giampapa, Mark E; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd E; Steinmacher-Burow, Burkhard D; Vranas, Pavlos M
2014-01-07
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
Prevention of Malicious Nodes Communication in MANETs by Using Authorized Tokens
NASA Astrophysics Data System (ADS)
Chandrakant, N.; Shenoy, P. Deepa; Venugopal, K. R.; Patnaik, L. M.
A rapid increase of wireless networks and mobile computing applications has changed the landscape of network security. A MANET is more susceptible to the attacks than wired network. As a result, attacks with malicious intent have been and will be devised to take advantage of these vulnerabilities and to cripple the MANET operation. Hence we need to search for new architecture and mechanisms to protect the wireless networks and mobile computing applications. In this paper, we examine the nodes that come under the vicinity of base node and members of the network and communication is provided to genuine nodes only. It is found that the proposed algorithm is a effective algorithm for security in MANETs.
Collective network for computer structures
Blumrich, Matthias A [Ridgefield, CT; Coteus, Paul W [Yorktown Heights, NY; Chen, Dong [Croton On Hudson, NY; Gara, Alan [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Takken, Todd E [Brewster, NY; Steinmacher-Burow, Burkhard D [Wernau, DE; Vranas, Pavlos M [Bedford Hills, NY
2011-08-16
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices ate included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.
The VERCE platform: Enabling Computational Seismology via Streaming Workflows and Science Gateways
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Filgueira, Rosa; Krause, Amrey; Matser, Jonas; Casarotti, Emanuele; Magnoni, Federica; Gemund, Andre; Frobert, Laurent; Krischer, Lion; Atkinson, Malcolm
2015-04-01
The VERCE project is creating an e-Science platform to facilitate innovative data analysis and coding methods that fully exploit the wealth of data in global seismology. One of the technologies developed within the project is the Dispel4Py python library, which allows to describe abstract stream-based workflows for data-intensive applications and to execute them in a distributed environment. At runtime Dispel4Py is able to map workflow descriptions dynamically onto a number of computational resources (Apache Storm clusters, MPI powered clusters, and shared-memory multi-core machines, single-core machines), setting it apart from other workflow frameworks. Therefore, Dispel4Py enables scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Among the workflows developed with Dispel4Py in VERCE, we mention here those for Seismic Ambient Noise Cross-Correlation and MISFIT calculation, which address two data-intensive problems that are common in computational seismology. The former, also called Passive Imaging, allows the detection of relative seismic-wave velocity variations during the time of recording, to be associated with the stress-field changes that occurred in the test area. The MISFIT instead, takes as input the synthetic seismograms generated from HPC simulations for a certain Earth model and earthquake and, after a preprocessing stage, compares them with real observations in order to foster subsequent model updates and improvement (Inversion). The VERCE Science Gateway exposes the MISFIT calculation workflow as a service, in combination with the simulation phase. Both phases can be configured, controlled and monitored by the user via a rich user interface which is integrated within the gUSE Science Gateway framework, hiding the complexity of accessing third parties data services, security mechanisms and enactment on the target resources. Thanks to a modular extension to the Dispel4Py framework, the system collects provenance data adopting the W3C-PROV data model. Provenance recordings can be explored and analysed at run time for rapid diagnostic and workflow steering, or later for further validation and comparisons across runs. We will illustrate the interactive services of the gateway and the capabilities of the produced metadata, coupled with the VERCE data management layer based on iRODS. The Cross-Correlation workflow was evaluated on SuperMUC, a supercomputing cluster at the Leibniz Supercomputing Centre in Munich, with 155,656 processor cores in 9400 compute nodes. SuperMUC is based on the Intel Xeon architecture consisting of 18 Thin Node Islands and one Fat Node Island. This work has only had access to the Thin Node Islands, which contain Sandy Bridge nodes, each having 16 cores and 32 GB of memory. In the evaluations we used 1000 stations, and we applied two types of methods (whiten and non-whiten) for pre-processing the data. The workflow was tested on a varying number of cores (16, 32, 64, 128, and 256 cores) using the MPI mapping of Dispel4Py. The results show that Dispel4Py is able to improve the performance by increasing the number of cores without changing the description of the workflow.
Technique for Calculating Solution Derivatives With Respect to Geometry Parameters in a CFD Code
NASA Technical Reports Server (NTRS)
Mathur, Sanjay
2011-01-01
A solution has been developed to the challenges of computation of derivatives with respect to geometry, which is not straightforward because these are not typically direct inputs to the computational fluid dynamics (CFD) solver. To overcome these issues, a procedure has been devised that can be used without having access to the mesh generator, while still being applicable to all types of meshes. The basic approach is inspired by the mesh motion algorithms used to deform the interior mesh nodes in a smooth manner when the surface nodes, for example, are in a fluid structure interaction problem. The general idea is to model the mesh edges and nodes as constituting a spring-mass system. Changes to boundary node locations are propagated to interior nodes by allowing them to assume their new equilibrium positions, for instance, one where the forces on each node are in balance. The main advantage of the technique is that it is independent of the volumetric mesh generator, and can be applied to structured, unstructured, single- and multi-block meshes. It essentially reduces the problem down to defining the surface mesh node derivatives with respect to the geometry parameters of interest. For analytical geometries, this is quite straightforward. In the more general case, one would need to be able to interrogate the underlying parametric CAD (computer aided design) model and to evaluate the derivatives either analytically, or by a finite difference technique. Because the technique is based on a partial differential equation (PDE), it is applicable not only to forward mode problems (where derivatives of all the output quantities are computed with respect to a single input), but it could also be extended to the adjoint problem, either by using an analytical adjoint of the PDE or a discrete analog.
NASA Astrophysics Data System (ADS)
Liu, Jiamin; Hua, Jeremy; Chellappa, Vivek; Petrick, Nicholas; Sahiner, Berkman; Farooqui, Mohammed; Marti, Gerald; Wiestner, Adrian; Summers, Ronald M.
2012-03-01
Patients with chronic lymphocytic leukemia (CLL) have an increased frequency of axillary lymphadenopathy. Pretreatment CT scans can be used to upstage patients at the time of presentation and post-treatment CT scans can reduce the number of complete responses. In the current clinical workflow, the detection and diagnosis of lymph nodes is usually performed manually by examining all slices of CT images, which can be time consuming and highly dependent on the observer's experience. A system for automatic lymph node detection and measurement is desired. We propose a computer aided detection (CAD) system for axillary lymph nodes on CT scans in CLL patients. The lung is first automatically segmented and the patient's body in lung region is extracted to set the search region for lymph nodes. Multi-scale Hessian based blob detection is then applied to detect potential lymph nodes within the search region. Next, the detected potential candidates are segmented by fast level set method. Finally, features are calculated from the segmented candidates and support vector machine (SVM) classification is utilized for false positive reduction. Two blobness features, Frangi's and Li's, are tested and their free-response receiver operating characteristic (FROC) curves are generated to assess system performance. We applied our detection system to 12 patients with 168 axillary lymph nodes measuring greater than 10 mm. All lymph nodes are manually labeled as ground truth. The system achieved sensitivities of 81% and 85% at 2 false positives per patient for Frangi's and Li's blobness, respectively.
BFL: a node and edge betweenness based fast layout algorithm for large scale networks
Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru
2009-01-01
Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes to optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorim with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n2) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
Na, Y; Suh, T; Xing, L
2012-06-01
Multi-objective (MO) plan optimization entails generation of an enormous number of IMRT or VMAT plans constituting the Pareto surface, which presents a computationally challenging task. The purpose of this work is to overcome the hurdle by developing an efficient MO method using emerging cloud computing platform. As a backbone of cloud computing for optimizing inverse treatment planning, Amazon Elastic Compute Cloud with a master node (17.1 GB memory, 2 virtual cores, 420 GB instance storage, 64-bit platform) is used. The master node is able to scale seamlessly a number of working group instances, called workers, based on the user-defined setting account for MO functions in clinical setting. Each worker solved the objective function with an efficient sparse decomposition method. The workers are automatically terminated if there are finished tasks. The optimized plans are archived to the master node to generate the Pareto solution set. Three clinical cases have been planned using the developed MO IMRT and VMAT planning tools to demonstrate the advantages of the proposed method. The target dose coverage and critical structure sparing of plans are comparable obtained using the cloud computing platform are identical to that obtained using desktop PC (Intel Xeon® CPU 2.33GHz, 8GB memory). It is found that the MO planning speeds up the processing of obtaining the Pareto set substantially for both types of plans. The speedup scales approximately linearly with the number of nodes used for computing. With the use of N nodes, the computational time is reduced by the fitting model, 0.2+2.3/N, with r̂2>0.99, on average of the cases making real-time MO planning possible. A cloud computing infrastructure is developed for MO optimization. The algorithm substantially improves the speed of inverse plan optimization. The platform is valuable for both MO planning and future off- or on-line adaptive re-planning. © 2012 American Association of Physicists in Medicine.
A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. At first, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and a field scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code,more » the computational time reduces about ten times on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, compute nodes at the number of adjustable parameters (when the forward difference is used for Jacobian approximation), or twice that number (if the center difference is used), are used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization scheme and Monte Carol analysis where thousands of compute nodes can be efficiently utilized.« less
Subspace Dimensionality: A Tool for Automated QC in Seismic Array Processing
NASA Astrophysics Data System (ADS)
Rowe, C. A.; Stead, R. J.; Begnaud, M. L.
2013-12-01
Because of the great resolving power of seismic arrays, the application of automated processing to array data is critically important in treaty verification work. A significant problem in array analysis is the inclusion of bad sensor channels in the beamforming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, when monitoring traffic at enormous numbers of nodes is impractical on a node-by node basis, so the dimensionality of the node traffic is instead monitoried for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. In the established template application, a detector functions in a manner analogous to waveform cross-correlation, applying a statistical test to assess the similarity of the incoming data stream to known templates for events of interest. In our approach, we seek not to detect matching signals, but instead, we examine the signal subspace dimensionality in much the same way that the method addresses node traffic anomalies in large computer systems. Signal anomalies recorded on seismic arrays affect the dimensional structure of the array-wide time-series. We have shown previously that this observation is useful in identifying real seismic events, either by looking at the raw signal or derivatives thereof (entropy, kurtosis), but here we explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements through a jackknifing process to isolate the anomalous channels, so that an automated analysis system might discard them prior to FK analysis and beamforming on events of interest.
Simulation Methods for Design of Networked Power Electronics and Information Systems
2014-07-01
Insertion of latency in every branch and at every node permits the system model to be efficiently distributed across many separate computing cores. An... the system . We demonstrated extensibility and generality of the Virtual Test Bed (VTB) framework to support multiple solvers and their associated...Information Systems Objectives The overarching objective of this program is to develop methods for fast
Oak Ridge Institutional Cluster Autotune Test Drive Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jibonananda, Sanyal; New, Joshua Ryan
2014-02-01
The Oak Ridge Institutional Cluster (OIC) provides general purpose computational resources for the ORNL staff to run computation heavy jobs that are larger than desktop applications but do not quite require the scale and power of the Oak Ridge Leadership Computing Facility (OLCF). This report details the efforts made and conclusions derived in performing a short test drive of the cluster resources on Phase 5 of the OIC. EnergyPlus was used in the analysis as a candidate user program and the overall software environment was evaluated against anticipated challenges experienced with resources such as the shared memory-Nautilus (JICS) and Titanmore » (OLCF). The OIC performed within reason and was found to be acceptable in the context of running EnergyPlus simulations. The number of cores per node and the availability of scratch space per node allow non-traditional desktop focused applications to leverage parallel ensemble execution. Although only individual runs of EnergyPlus were executed, the software environment on the OIC appeared suitable to run ensemble simulations with some modifications to the Autotune workflow. From a standpoint of general usability, the system supports common Linux libraries, compilers, standard job scheduling software (Torque/Moab), and the OpenMPI library (the only MPI library) for MPI communications. The file system is a Panasas file system which literature indicates to be an efficient file system.« less
A Survey on the Feasibility of Sound Classification on Wireless Sensor Nodes
Salomons, Etto L.; Havinga, Paul J. M.
2015-01-01
Wireless sensor networks are suitable to gain context awareness for indoor environments. As sound waves form a rich source of context information, equipping the nodes with microphones can be of great benefit. The algorithms to extract features from sound waves are often highly computationally intensive. This can be problematic as wireless nodes are usually restricted in resources. In order to be able to make a proper decision about which features to use, we survey how sound is used in the literature for global sound classification, age and gender classification, emotion recognition, person verification and identification and indoor and outdoor environmental sound classification. The results of the surveyed algorithms are compared with respect to accuracy and computational load. The accuracies are taken from the surveyed papers; the computational loads are determined by benchmarking the algorithms on an actual sensor node. We conclude that for indoor context awareness, the low-cost algorithms for feature extraction perform equally well as the more computationally-intensive variants. As the feature extraction still requires a large amount of processing time, we present four possible strategies to deal with this problem. PMID:25822142
NASA Astrophysics Data System (ADS)
Huang, Jinhui; Liu, Wenxiang; Su, Yingxue; Wang, Feixue
2018-02-01
Space networks, in which connectivity is deterministic and intermittent, can be modeled by delay/disruption tolerant networks. In space delay/disruption tolerant networks, a packet is usually transmitted from the source node to the destination node indirectly via a series of relay nodes. If anyone of the nodes in the path becomes congested, the packet will be dropped due to buffer overflow. One of the main reasons behind congestion is the unbalanced network traffic distribution. We propose a load balancing strategy which takes the congestion status of both the local node and relay nodes into account. The congestion status, together with the end-to-end delay, is used in the routing selection. A lookup-table enhancement is also proposed. The off-line computation and the on-line adjustment are combined together to make a more precise estimate of the end-to-end delay while at the same time reducing the onboard computation. Simulation results show that the proposed strategy helps to distribute network traffic more evenly and therefore reduces the packet drop ratio. In addition, the average delay is also decreased in most cases. The lookup-table enhancement provides a compromise between the need for better communication performance and the desire for less onboard computation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lawton, Colleen A.F.; Michalski, Jeff; El-Naqa, Issam
2009-06-01
Purpose: Radiation therapy to the pelvic lymph nodes in high-risk prostate cancer is required on several Radiation Therapy Oncology Group (RTOG) clinical trials. Based on a prior lymph node contouring project, we have shown significant disagreement in the definition of pelvic lymph node volumes among genitourinary radiation oncology specialists involved in developing and executing current RTOG trials. Materials and Methods: A consensus meeting was held on October 3, 2007, to reach agreement on pelvic lymph node volumes. Data were presented to address the lymph node drainage of the prostate. Extensive discussion ensued to develop clinical target volume (CTV) pelvic lymphmore » node consensus. Results: Consensus was obtained resulting in computed tomography image-based pelvic lymph node CTVs. Based on this consensus, the pelvic lymph node volumes to be irradiated include: distal common iliac, presacral lymph nodes (S{sub 1}-S{sub 3}), external iliac lymph nodes, internal iliac lymph nodes, and obturator lymph nodes. Lymph node CTVs include the vessels (artery and vein) and a 7-mm radial margin being careful to 'carve out' bowel, bladder, bone, and muscle. Volumes begin at the L5/S1 interspace and end at the superior aspect of the pubic bone. Consensus on dose-volume histogram constraints for OARs was also attained. Conclusions: Consensus on pelvic lymph node CTVs for radiation therapy to address high-risk prostate cancer was attained and is available as web-based computed tomography images as well as a descriptive format through the RTOG. This will allow for uniformity in evaluating the benefit and risk of such treatment.« less
Distributed downhole drilling network
Hall, David R.; Hall, Jr., H. Tracy; Fox, Joe; Pixton, David S.
2006-11-21
A high-speed downhole network providing real-time data from downhole components of a drilling strings includes a bottom-hole node interfacing to a bottom-hole assembly located proximate the bottom end of a drill string. A top-hole node is connected proximate the top end of the drill string. One or several intermediate nodes are located along the drill string between the bottom-hole node and the top-hole node. The intermediate nodes are configured to receive and transmit data packets transmitted between the bottom-hole node and the top-hole node. A communications link, integrated into the drill string, is used to operably connect the bottom-hole node, the intermediate nodes, and the top-hole node. In selected embodiments, a personal or other computer may be connected to the top-hole node, to analyze data received from the intermediate and bottom-hole nodes.
Efficient computation of kinship and identity coefficients on large pedigrees.
Cheng, En; Elliott, Brendan; Ozsoyoglu, Z Meral
2009-06-01
With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus, the computation of identity coefficients merits special attention. However, the computation of identity coefficients is not done directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, which is motivated by Wright's path-counting method for computing inbreeding coefficient. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. Furthermore, we propose an improved scheme using Family NodeCodes for the computation of generalized kinship coefficients, which is motivated by the significant improvement of using Family NodeCodes for inbreeding coefficient over the use of NodeCodes. We also perform experiments for evaluating the efficiency of our method, and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
Parallel Calculations in LS-DYNA
NASA Astrophysics Data System (ADS)
Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey
2017-11-01
Nowadays, structural mechanics exhibits a trend towards numeric solutions being found for increasingly extensive and detailed tasks, which requires that capacities of computing systems be enhanced. Such enhancement can be achieved by different means. E.g., in case a computing system is represented by a workstation, its components can be replaced and/or extended (CPU, memory etc.). In essence, such modification eventually entails replacement of the entire workstation, i.e. replacement of certain components necessitates exchange of others (faster CPUs and memory devices require buses with higher throughput etc.). Special consideration must be given to the capabilities of modern video cards. They constitute powerful computing systems capable of running data processing in parallel. Interestingly, the tools originally designed to render high-performance graphics can be applied for solving problems not immediately related to graphics (CUDA, OpenCL, Shaders etc.). However, not all software suites utilize video cards’ capacities. Another way to increase capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive growth due to which a quite powerful system can be obtained by combining not particularly powerful nodes. Moreover, separate nodes may possess different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with LS-DYNA software. To establish a range of dependencies a mere 2-node cluster has proven sufficient.
Direct formulation of a 4-node hybrid shell element with rotational degrees of freedom
NASA Technical Reports Server (NTRS)
Aminpour, Mohammad A.
1990-01-01
A simple 4-node assumed-stress hybrid quadrilateral shell element with rotational or drilling degrees of freedom is formulated. The element formulation is based directly on a 4-node element. This direct formulation requires fewer computations than a similar element that is derived from an internal 8-node isoparametric element in which the midside degrees of freedom are eliminated in favor of rotational degree of freedom at the corner nodes. The formulation is based on the principle of minimum complementary energy. The membrane part of the element has 12 degrees of freedom including rotational degrees of freedom. The bending part of the element also has 12 degrees of freedom. The bending part of the quadratic variations for both in-plane and out-of-plane displacement fields and linear variations for both in-plane and out-of-plane rotation fields are assumed along the edges of the element. The element Cartesian-coordinate system is chosen such as to make the stress field invariant with respect to node numbering. The membrane part of the stress field is based on a 9-parameter equilibrating stress field, while the bending part is based on a 13-parameter equilibrating stress field. The element passes the patch test, is nearly insensitive to mesh distortion, does not lock, possesses the desirable invariance properties, has no spurious modes, and produces accurate and reliable results.
Kishino, Takayoshi; Okano, Keiichi; Ando, Yasuhisa; Suto, Hironobu; Asano, Eisuke; Oshima, Minoru; Fujiwara, Masao; Usuki, Hisashi; Kobara, Hideki; Masaki, Tsutomu; Ibuki, Emi; Kushida, Yoshio; Haba, Reiji; Suzuki, Yasuyuki
2018-06-25
In patients with esophageal cancer, differentiation between lymph node metastasis and lymphadenopathies from sarcoidosis or sarcoid-like reactions of lymph nodes is clinically important. Herein, we report two esophageal cancer cases with lymph node involvement of sarcoid-like reaction or sarcoidosis. One patient received chemotherapy and the other chemoradiotherapy as initial treatments. In both cases, [ 18 F]-fluorodeoxyglucose positron emission tomography-computed tomography (FDG-PET/CT) was performed before and after chemo(radio)therapy. After the treatment, FDG uptake was not detected in the primary tumor, but it was slightly reduced in the hilar and mediastinal lymph nodes in both cases. These non-identical responses to chemo(radio)therapy suggest the presence of sarcoid-like reaction of lymph nodes associated with squamous cell carcinoma of the esophagus. Curative surgical resection was performed as treatment. These FDG-PET/CT findings may be helpful to distinguish between metastasis and sarcoidosis-associated lymphadenopathy in esophageal cancer.
NASA Astrophysics Data System (ADS)
Cowell, Martin Andrew
The world already hosts more internet connected devices than people, and that ratio is only increasing. These devices seamlessly integrate with peoples lives to collect rich data and give immediate feedback about complex systems from business, health care, transportation, and security. As every aspect of global economies integrate distributed computing into their industrial systems and these systems benefit from rich datasets. Managing the power demands of these distributed computers will be paramount to ensure the continued operation of these networks, and is elegantly addressed by including local energy harvesting and storage on a per-node basis. By replacing non-rechargeable batteries with energy harvesting, wireless sensor nodes will increase their lifetimes by an order of magnitude. This work investigates the coupling of high power energy storage with energy harvesting technologies to power wireless sensor nodes; with sections covering device manufacturing, system integration, and mathematical modeling. First we consider the energy storage mechanism of supercapacitors and batteries, and identify favorable characteristics in both reservoir types. We then discuss experimental methods used to manufacture high power supercapacitors in our labs. We go on to detail the integration of our fabricated devices with collaborating labs to create functional sensor node demonstrations. With the practical knowledge gained through in-lab manufacturing and system integration, we build mathematical models to aid in device and system design. First, we model the mechanism of energy storage in porous graphene supercapacitors to aid in component architecture optimization. We then model the operation of entire sensor nodes for the purpose of optimally sizing the energy harvesting and energy reservoir components. In consideration of deploying these sensor nodes in real-world environments, we model the operation of our energy harvesting and power management systems subject to spatially and temporally varying energy availability in order to understand sensor node reliability. Looking to the future, we see an opportunity for further research to implement machine learning algorithms to control the energy resources of distributed computing networks.
An evaluation of the state of time synchronization on leadership class supercomputers
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.; ...
2017-10-09
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
An evaluation of the state of time synchronization on leadership class supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry; Ostrouchov, George; Koenig, Gregory A.
We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.
Tanaka, Toshiaki; Nozawa, Hiroaki; Kawai, Kazushige; Hata, Keisuke; Kiyomatsu, Tomomichi; Nishikawa, Takeshi; Otani, Kensuke; Sasaki, Kazuhito; Murono, Koji; Watanabe, Toshiaki
2017-01-01
Colorectal neuroendocrine tumors (NET) are a rare manifestation of colorectal neoplasia, requiring for radical dissection of the regional lymph nodes along with colorectal resection similar to that required for colorectal cancer. However, thus far, no reports have described the ability of computed tomography (CT) to predict lymph node involvement. In this study, we revealed the prediction rate of lymph node metastasis using contrast-enhanced CT. A total of 21 patients with colorectal NET undergoing colorectal resection were recruited from January 2010 to June 2016. We compared the CT findings between samples with or without pathologically proven lymph node metastasis, in each field (pericolic/perirectal and intermediate nodes). Within the pericolic/perirectal field, any lymph node larger than 5 mm in the CT images was a predictive indicator of lymph node metastasis with a sensitivity, specificity, and area under ROC curve (AUC) of 66.7%, 87.5%, and 0.844, respectively. Within the intermediate field, any visible lymph node on the CT was a predictive indicator of lymph node metastasis with a sensitivity, specificity, and AUC of 100%, 76.4%, and 0.890, respectively. In addition, when we observed lymph nodes larger than 3 mm on the CT images, the sensitivity and specificity were 100% and 82.4%, respectively, with an AUC of 0.8971. CT images provide predictive information for lymph node metastasis with a high rate of accuracy. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Research in Wireless Networks and Communications
2008-05-01
TESTBED SETUP AND INITIAL MULTI-HOP EXPERIENCE As a proof of concept, we assembled a testbed platform of nodes based on 400MHz AMD Geode single-board...experi- ments on a testbed network consisting of 400MHz AMD Geode single-board computers made by Thecus Inc. We equipped each of these nodes with two...ground nodes were placed on a line, with about 3 feet of separation between adjacent nodes. The nodes were powered by 400MHz AMD Geode single-board
WinHPC System Configuration | High-Performance Computing | NREL
CPUs with 48GB of memory. Node 04 has dual Intel Xeon E5530 CPUs with 24GB of memory. Nodes 05-20 have dual AMD Opteron 2374 HE CPUs with 16GB of memory. Nodes 21-30 have been decommissioned. Nodes 31-35 have dual Intel Xeon X5675 CPUs with 48GB of memory. Nodes 36-37 have dual Intel Xeon E5-2680 CPUs with
A Spiking Neural Network Model of the Lateral Geniculate Nucleus on the SpiNNaker Machine
Sen-Bhattacharya, Basabdatta; Serrano-Gotarredona, Teresa; Balassa, Lorinc; Bhattacharya, Akash; Stokes, Alan B.; Rowley, Andrew; Sugiarto, Indar; Furber, Steve
2017-01-01
We present a spiking neural network model of the thalamic Lateral Geniculate Nucleus (LGN) developed on SpiNNaker, which is a state-of-the-art digital neuromorphic hardware built with very-low-power ARM processors. The parallel, event-based data processing in SpiNNaker makes it viable for building massively parallel neuro-computational frameworks. The LGN model has 140 neurons representing a “basic building block” for larger modular architectures. The motivation of this work is to simulate biologically plausible LGN dynamics on SpiNNaker. Synaptic layout of the model is consistent with biology. The model response is validated with existing literature reporting entrainment in steady state visually evoked potentials (SSVEP)—brain oscillations corresponding to periodic visual stimuli recorded via electroencephalography (EEG). Periodic stimulus to the model is provided by: a synthetic spike-train with inter-spike-intervals in the range 10–50 Hz at a resolution of 1 Hz; and spike-train output from a state-of-the-art electronic retina subjected to a light emitting diode flashing at 10, 20, and 40 Hz, simulating real-world visual stimulus to the model. The resolution of simulation is 0.1 ms to ensure solution accuracy for the underlying differential equations defining Izhikevichs neuron model. Under this constraint, 1 s of model simulation time is executed in 10 s real time on SpiNNaker; this is because simulations on SpiNNaker work in real time for time-steps dt ⩾ 1 ms. The model output shows entrainment with both sets of input and contains harmonic components of the fundamental frequency. However, suppressing the feed-forward inhibition in the circuit produces subharmonics within the gamma band (>30 Hz) implying a reduced information transmission fidelity. These model predictions agree with recent lumped-parameter computational model-based predictions, using conventional computers. Scalability of the framework is demonstrated by a multi-node architecture consisting of three “nodes,” where each node is the “basic building block” LGN model. This 420 neuron model is tested with synthetic periodic stimulus at 10 Hz to all the nodes. The model output is the average of the outputs from all nodes, and conforms to the above-mentioned predictions of each node. Power consumption for model simulation on SpiNNaker is ≪1 W. PMID:28848380
A Spiking Neural Network Model of the Lateral Geniculate Nucleus on the SpiNNaker Machine.
Sen-Bhattacharya, Basabdatta; Serrano-Gotarredona, Teresa; Balassa, Lorinc; Bhattacharya, Akash; Stokes, Alan B; Rowley, Andrew; Sugiarto, Indar; Furber, Steve
2017-01-01
We present a spiking neural network model of the thalamic Lateral Geniculate Nucleus (LGN) developed on SpiNNaker, which is a state-of-the-art digital neuromorphic hardware built with very-low-power ARM processors. The parallel, event-based data processing in SpiNNaker makes it viable for building massively parallel neuro-computational frameworks. The LGN model has 140 neurons representing a "basic building block" for larger modular architectures. The motivation of this work is to simulate biologically plausible LGN dynamics on SpiNNaker. Synaptic layout of the model is consistent with biology. The model response is validated with existing literature reporting entrainment in steady state visually evoked potentials (SSVEP)-brain oscillations corresponding to periodic visual stimuli recorded via electroencephalography (EEG). Periodic stimulus to the model is provided by: a synthetic spike-train with inter-spike-intervals in the range 10-50 Hz at a resolution of 1 Hz; and spike-train output from a state-of-the-art electronic retina subjected to a light emitting diode flashing at 10, 20, and 40 Hz, simulating real-world visual stimulus to the model. The resolution of simulation is 0.1 ms to ensure solution accuracy for the underlying differential equations defining Izhikevichs neuron model. Under this constraint, 1 s of model simulation time is executed in 10 s real time on SpiNNaker; this is because simulations on SpiNNaker work in real time for time-steps dt ⩾ 1 ms. The model output shows entrainment with both sets of input and contains harmonic components of the fundamental frequency. However, suppressing the feed-forward inhibition in the circuit produces subharmonics within the gamma band (>30 Hz) implying a reduced information transmission fidelity. These model predictions agree with recent lumped-parameter computational model-based predictions, using conventional computers. Scalability of the framework is demonstrated by a multi-node architecture consisting of three "nodes," where each node is the "basic building block" LGN model. This 420 neuron model is tested with synthetic periodic stimulus at 10 Hz to all the nodes. The model output is the average of the outputs from all nodes, and conforms to the above-mentioned predictions of each node. Power consumption for model simulation on SpiNNaker is ≪1 W.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn
2016-07-15
Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs, trachea, lungs, and spine are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifiermore » for lymph node detection. The detected candidates are then segmented by the following curve evolution process. Texture features are computed on the segmented lymph nodes and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of mediastinal lymph node map according to the International Association for the Study of Lung Cancer are converted into patient-specific color-coded CT image, where the lymph node station can be automatically assigned for each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, their system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.« less
Effects of maximum node degree on computer virus spreading in scale-free networks
NASA Astrophysics Data System (ADS)
Bamaarouf, O.; Ould Baba, A.; Lamzabi, S.; Rachadi, A.; Ez-Zahraouy, H.
2017-10-01
The increase of the use of the Internet networks favors the spread of viruses. In this paper, we studied the spread of viruses in the scale-free network with different topologies based on the Susceptible-Infected-External (SIE) model. It is found that the network structure influences the virus spreading. We have shown also that the nodes of high degree are more susceptible to infection than others. Furthermore, we have determined a critical maximum value of node degree (Kc), below which the network is more resistible and the computer virus cannot expand into the whole network. The influence of network size is also studied. We found that the network with low size is more effective to reduce the proportion of infected nodes.
Optimized scalable network switch
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2007-12-04
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
Optimized scalable network switch
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
2010-02-23
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
Fault-Tolerant Local-Area Network
NASA Technical Reports Server (NTRS)
Morales, Sergio; Friedman, Gary L.
1988-01-01
Local-area network (LAN) for computers prevents single-point failure from interrupting communication between nodes of network. Includes two complete cables, LAN 1 and LAN 2. Microprocessor-based slave switches link cables to network-node devices as work stations, print servers, and file servers. Slave switches respond to commands from master switch, connecting nodes to two cable networks or disconnecting them so they are completely isolated. System monitor and control computer (SMC) acts as gateway, allowing nodes on either cable to communicate with each other and ensuring that LAN 1 and LAN 2 are fully used when functioning properly. Network monitors and controls itself, automatically routes traffic for efficient use of resources, and isolates and corrects its own faults, with potential dramatic reduction in time out of service.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tock, Yoav; Mandler, Benjamin; Moreira, Jose
2013-01-01
As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns be- come paramount.We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a computation lifetime. The core of our solution is a hierarchical scalable membership service provid- ing eventual consistency semantics. An attribute replication service is used for hierarchy organization, and is exposed to external applications. Our solution is based on P2P technologies and provides resiliencymore » and elastic runtime support at ultra large scales. Resulting middleware is general purpose while exploiting HPC platform unique features and architecture. We have implemented and tested this system on BlueGene/P with Linux, and using worst-case analysis, evaluated the service scalability as effective for up to 1M nodes.« less
Deterministic delivery of remote entanglement on a quantum network.
Humphreys, Peter C; Kalb, Norbert; Morits, Jaco P J; Schouten, Raymond N; Vermeulen, Raymond F L; Twitchen, Daniel J; Markham, Matthew; Hanson, Ronald
2018-06-01
Large-scale quantum networks promise to enable secure communication, distributed quantum computing, enhanced sensing and fundamental tests of quantum mechanics through the distribution of entanglement across nodes 1-7 . Moving beyond current two-node networks 8-13 requires the rate of entanglement generation between nodes to exceed the decoherence (loss) rate of the entanglement. If this criterion is met, intrinsically probabilistic entangling protocols can be used to provide deterministic remote entanglement at pre-specified times. Here we demonstrate this using diamond spin qubit nodes separated by two metres. We realize a fully heralded single-photon entanglement protocol that achieves entangling rates of up to 39 hertz, three orders of magnitude higher than previously demonstrated two-photon protocols on this platform 14 . At the same time, we suppress the decoherence rate of remote-entangled states to five hertz through dynamical decoupling. By combining these results with efficient charge-state control and mitigation of spectral diffusion, we deterministically deliver a fresh remote state with an average entanglement fidelity of more than 0.5 at every clock cycle of about 100 milliseconds without any pre- or post-selection. These results demonstrate a key building block for extended quantum networks and open the door to entanglement distribution across multiple remote nodes.
ERIC Educational Resources Information Center
Adesope, Olusola O.; Cavagnetto, Andy; Hunsu, Nathaniel J.; Anguiano, Carlos; Lloyd, Joshua
2017-01-01
This study used a between-subjects experimental design to examine the effects of three different computer-based instructional strategies (concept map, refutation text, and expository scientific text) on science learning. Concept maps are node-link diagrams that show concepts as nodes and relationships among the concepts as labeled links.…
High Performance Active Database Management on a Shared-Nothing Parallel Processor
1998-05-01
either stored or virtual. A stored node is like a materialized view. It actually contains the specified tuples. A virtual node is like a real view...90292-6695 DL-5 COLUMBIA UNIV/DEPT COMPUTER SCIENCi ATTN: OR GAIL £. KAISER 450 COMPUTER SCIENCE 3LDG 500 WEST 12ÖTH STRSET NEW YORK NY 10027
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, T.; Atac, R.; Cook, A.
1989-03-06
The ACPMAPS multipocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL Chip set. Themore » system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.« less
NASA Astrophysics Data System (ADS)
Manfredi, Sabato
2016-06-01
Large-scale dynamic systems are becoming highly pervasive in their occurrence with applications ranging from system biology, environment monitoring, sensor networks, and power systems. They are characterised by high dimensionality, complexity, and uncertainty in the node dynamic/interactions that require more and more computational demanding methods for their analysis and control design, as well as the network size and node system/interaction complexity increase. Therefore, it is a challenging problem to find scalable computational method for distributed control design of large-scale networks. In this paper, we investigate the robust distributed stabilisation problem of large-scale nonlinear multi-agent systems (briefly MASs) composed of non-identical (heterogeneous) linear dynamical systems coupled by uncertain nonlinear time-varying interconnections. By employing Lyapunov stability theory and linear matrix inequality (LMI) technique, new conditions are given for the distributed control design of large-scale MASs that can be easily solved by the toolbox of MATLAB. The stabilisability of each node dynamic is a sufficient assumption to design a global stabilising distributed control. The proposed approach improves some of the existing LMI-based results on MAS by both overcoming their computational limits and extending the applicative scenario to large-scale nonlinear heterogeneous MASs. Additionally, the proposed LMI conditions are further reduced in terms of computational requirement in the case of weakly heterogeneous MASs, which is a common scenario in real application where the network nodes and links are affected by parameter uncertainties. One of the main advantages of the proposed approach is to allow to move from a centralised towards a distributed computing architecture so that the expensive computation workload spent to solve LMIs may be shared among processors located at the networked nodes, thus increasing the scalability of the approach than the network size. Finally, a numerical example shows the applicability of the proposed method and its advantage in terms of computational complexity when compared with the existing approaches.
Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Christopher H.; Long, Hai; Sides, Scott
2015-10-15
Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL" tm s Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement ofmore » future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.« less
NASA Astrophysics Data System (ADS)
Poplawski, Blazej; Mikułowski, Grzegorz; Mróz, Arkadiusz; Jankowski, Łukasz
2018-02-01
This paper proposes, tests numerically and verifies experimentally a decentralized control algorithm with local feedback for semi-active mitigation of free vibrations in frame structures. The algorithm aims at transferring the vibration energy of low-order, lightly-damped structural modes into high-frequency modes of vibration, where it is quickly damped by natural mechanisms of material damping. Such an approach to mitigation of vibrations, known as the prestress-accumulation release (PAR) strategy, has been earlier applied only in global control schemes to the fundamental vibration mode of a cantilever beam. In contrast, the decentralization and local feedback allows the approach proposed here to be applied to more complex frame structures and vibration patterns, where the global control ceases to be intuitively obvious. The actuators (truss-frame nodes with controllable ability to transmit moments) are essentially unblockable hinges that become unblocked only for very short time periods in order to trigger local modal transfer of energy. The paper proposes a computationally simple model of the controllable nodes, specifies the control performance measure, yields basic characteristics of the optimum control, proposes the control algorithm and then tests it in numerical and experimental examples.
Acoustic Sensor Network for Relative Positioning of Nodes
De Marziani, Carlos; Ureña, Jesus; Hernandez, Álvaro; Mazo, Manuel; García, Juan Jesús; Jimenez, Ana; Rubio, María del Carmen Pérez; Álvarez, Fernando; Villadangos, José Manuel
2009-01-01
In this work, an acoustic sensor network for a relative localization system is analyzed by reporting the accuracy achieved in the position estimation. The proposed system has been designed for those applications where objects are not restricted to a particular environment and thus one cannot depend on any external infrastructure to compute their positions. The objects are capable of computing spatial relations among themselves using only acoustic emissions as a ranging mechanism. The object positions are computed by a multidimensional scaling (MDS) technique and, afterwards, a least-square algorithm, based on the Levenberg-Marquardt algorithm (LMA), is applied to refine results. Regarding the position estimation, all the parameters involved in the computation of the temporary relations with the proposed ranging mechanism have been considered. The obtained results show that a fine-grained localization can be achieved considering a Gaussian distribution error in the proposed ranging mechanism. Furthermore, since acoustic sensors require a line-of-sight to properly work, the system has been tested by modeling the lost of this line-of-sight as a non-Gaussian error. A suitable position estimation has been achieved even if it is considered a bias of up to 25 of the line-of-sight measurements among a set of nodes. PMID:22291520
Multi-agent grid system Agent-GRID with dynamic load balancing of cluster nodes
NASA Astrophysics Data System (ADS)
Satymbekov, M. N.; Pak, I. T.; Naizabayeva, L.; Nurzhanov, Ch. A.
2017-12-01
In this study the work presents the system designed for automated load balancing of the contributor by analysing the load of compute nodes and the subsequent migration of virtual machines from loaded nodes to less loaded ones. This system increases the performance of cluster nodes and helps in the timely processing of data. A grid system balances the work of cluster nodes the relevance of the system is the award of multi-agent balancing for the solution of such problems.
Synthesis of natural flows at selected sites in the upper Missouri River basin, Montana, 1928-89
Cary, L.E.; Parrett, Charles
1996-01-01
Natural monthly streamflows were synthesized for the years 1928-89 for 43 sites in the upper Missouri River Basin upstream from Fort Peck Lake in Montana. The sites are represented as nodes in a streamflow accounting model being developed by the Bureau of Reclamation. Recorded and historical flows at most sites have been affected by human activities including reservoir storage, diversions for irrigation, and municipal use. Natural flows at the sites were synthesized by eliminating the effects of these activities. Recorded data at some sites do not include the entire study period. The missing flows at these sites were estimated using a statistical procedure. The methods of synthesis varied, depending on upstream activities and information available. Recorded flows were transferred to nodes that did not have streamflow-gaging stations from the nearest station with a sufficient length of record. The flows at one node were computed as the sum of flows from three upstream tributaries. Monthly changes in reservoir storage were computed from monthend contents. The changes in storage were corrected for the effects of evaporation and precipitation using pan-evaporation and precipitation data from climate stations. Irrigation depletions and consumptive use by the three largest municipalities were computed. Synthesized natural flow at most nodes was computed by adding algebraically the upstream depletions and changes in reservoir storage to recorded or historical flow at the nodes.
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.
GATE Monte Carlo simulation in a cloud computing environment
NASA Astrophysics Data System (ADS)
Rowedder, Blake Austin
The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment of a local high performance cluster. This study investigated a reliable and efficient execution of GATE MC simulations using a commercial cloud computing services. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size and thus that integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53 minute long simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high power computing continuing to lower in price and accessibility, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
An operating system for future aerospace vehicle computer systems
NASA Technical Reports Server (NTRS)
Foudriat, E. C.; Berman, W. J.; Will, R. W.; Bynum, W. L.
1984-01-01
The requirements for future aerospace vehicle computer operating systems are examined in this paper. The computer architecture is assumed to be distributed with a local area network connecting the nodes. Each node is assumed to provide a specific functionality. The network provides for communication so that the overall tasks of the vehicle are accomplished. The O/S structure is based upon the concept of objects. The mechanisms for integrating node unique objects with node common objects in order to implement both the autonomy and the cooperation between nodes is developed. The requirements for time critical performance and reliability and recovery are discussed. Time critical performance impacts all parts of the distributed operating system; e.g., its structure, the functional design of its objects, the language structure, etc. Throughout the paper the tradeoffs - concurrency, language structure, object recovery, binding, file structure, communication protocol, programmer freedom, etc. - are considered to arrive at a feasible, maximum performance design. Reliability of the network system is considered. A parallel multipath bus structure is proposed for the control of delivery time for time critical messages. The architecture also supports immediate recovery for the time critical message system after a communication failure.
Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks
Piraveenan, Mahendra; Prokopenko, Mikhail; Hossain, Liaquat
2013-01-01
A number of centrality measures are available to determine the relative importance of a node in a complex network, and betweenness is prominent among them. However, the existing centrality measures are not adequate in network percolation scenarios (such as during infection transmission in a social network of individuals, spreading of computer viruses on computer networks, or transmission of disease over a network of towns) because they do not account for the changing percolation states of individual nodes. We propose a new measure, percolation centrality, that quantifies relative impact of nodes based on their topological connectivity, as well as their percolation states. The measure can be extended to include random walk based definitions, and its computational complexity is shown to be of the same order as that of betweenness centrality. We demonstrate the usage of percolation centrality by applying it to a canonical network as well as simulated and real world scale-free and random networks. PMID:23349699
Practical End-to-End Performance Testing Tool for High Speed 3G-Based Networks
NASA Astrophysics Data System (ADS)
Shinbo, Hiroyuki; Tagami, Atsushi; Ano, Shigehiro; Hasegawa, Toru; Suzuki, Kenji
High speed IP communication is a killer application for 3rd generation (3G) mobile systems. Thus 3G network operators should perform extensive tests to check whether expected end-to-end performances are provided to customers under various environments. An important objective of such tests is to check whether network nodes fulfill requirements to durations of processing packets because a long duration of such processing causes performance degradation. This requires testers (persons who do tests) to precisely know how long a packet is hold by various network nodes. Without any tool's help, this task is time-consuming and error prone. Thus we propose a multi-point packet header analysis tool which extracts and records packet headers with synchronized timestamps at multiple observation points. Such recorded packet headers enable testers to calculate such holding durations. The notable feature of this tool is that it is implemented on off-the shelf hardware platforms, i.e., lap-top personal computers. The key challenges of the implementation are precise clock synchronization without any special hardware and a sophisticated header extraction algorithm without any drop.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
Baun, Christian
2016-01-01
Clusters usually consist of servers, workstations or personal computers as nodes. But especially for academic purposes like student projects or scientific projects, the cost for purchase and operation can be a challenge. Single board computers cannot compete with the performance or energy-efficiency of higher-value systems, but they are an option to build inexpensive cluster systems. Because of the compact design and modest energy consumption, it is possible to build clusters of single board computers in a way that they are mobile and can be easily transported by the users. This paper describes the construction of such a cluster, useful applications and the performance of the single nodes. Furthermore, the clusters' performance and energy-efficiency is analyzed by executing the High Performance Linpack benchmark with a different number of nodes and different proportion of the systems total main memory utilized.
Scheduling based on a dynamic resource connection
NASA Astrophysics Data System (ADS)
Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.
2017-02-01
The practical using of distributed computing systems associated with many problems, including troubles with the organization of an effective interaction between the agents located at the nodes of the system, with the specific configuration of each node of the system to perform a certain task, with the effective distribution of the available information and computational resources of the system, with the control of multithreading which implements the logic of solving research problems and so on. The article describes the method of computing load balancing in distributed automatic systems, focused on the multi-agency and multi-threaded data processing. The scheme of the control of processing requests from the terminal devices, providing the effective dynamic scaling of computing power under peak load is offered. The results of the model experiments research of the developed load scheduling algorithm are set out. These results show the effectiveness of the algorithm even with a significant expansion in the number of connected nodes and zoom in the architecture distributed computing system.
View-Dependent Simplification of Arbitrary Polygonal Environments
2006-01-01
of backfacing nodes are not rendered [ Kumar 96]. 4.3 Triangle-Budget Simplification The screenspace error threshold and silhouette test allow the user...Greg Turk, and Dinesh Manocha for their invaluable guidance and support throughout this project. Funding for this work was provided by DARPA...Proceedings Visualization 95 , IEEE Computer Society Press (Atlanta, GA), 1995, pp. 296-303. [ Kumar 96] Kumar , Subodh, D. Manocha, W. Garrett, M. Lin
NASA Technical Reports Server (NTRS)
Wolpert, David H. (Inventor)
2003-01-01
Distributed approach for determining a path connecting adjacent network nodes, for probabilistically or deterministically transporting an entity, with entity characteristic mu from a source node to a destination node. Each node i is directly connected to an arbitrary number J(mu) of nodes, labeled or numbered j=jl, j2, .... jJ(mu). In a deterministic version, a J(mu)-component baseline proportion vector p(i;mu) is associated with node i. A J(mu)-component applied proportion vector p*(i;mu) is determined from p(i;mu) to preclude an entity visiting a node more than once. Third and fourth J(mu)-component vectors, with components iteratively determined by Target(i;n(mu);mu),=alpha(mu).Target(i;n(mu)-1;mu)j+beta(mu).p* (i;mu)j and Actual(i;n(mu);+a(mu)j. Actual(i;n(mu)-l;mu)j+beta(mu).Sent(i;j'(mu);n(mu)-1;mu)j, are computed, where n(mu) is an entity sequence index and alpha(mu) and beta(mu) are selected numbers. In one embodiment, at each node i, the node j=j'(mu) with the largest vector component difference, Target(i;n(mu);mu)j'- Actual (i;n(mu);mu)j'. is chosen for the next link for entity transport, except in special gap circumstances, where the same link is optionally used for transporting consecutively arriving entities. The network nodes may be computer-controlled routers that switch collections of packets, frames, cells or other information units. Alternatively, the nodes may be waypoints for movement of physical items in a network or for transformation of a physical item. The nodes may be states of an entity undergoing state transitions, where allowed transitions are specified by the network and/or the destination node.
Send-side matching of data communications messages
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-07-01
Send-side matching of data communications messages includes a plurality of compute nodes organized for collective operations, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message.
Wagner, Lars M; Kremer, Nathalie; Gelfand, Michael J; Sharp, Susan E; Turpin, Brian K; Nagarajan, Rajaram; Tiao, Gregory M; Pressey, Joseph G; Yin, Julie; Dasgupta, Roshni
2017-01-01
Lymph node metastases are an important cause of treatment failure for pediatric and adolescent/young adult (AYA) sarcoma patients. Nodal sampling is recommended for certain sarcoma subtypes that have a predilection for lymphatic spread. Sentinel lymph node biopsy (SLNB) may improve the diagnostic yield of nodal sampling, particularly when single-photon emission computed tomography/computed tomography (SPECT-CT) is used to facilitate anatomic localization. Functional imaging with positron emission tomography/computed tomography (PET-CT) is increasingly used for sarcoma staging and is a less invasive alternative to SLNB. To assess the utility of these 2 staging methods, this study prospectively compared SLNB plus SPECT-CT with PET-CT for the identification of nodal metastases in pediatric and AYA patients. Twenty-eight pediatric and AYA sarcoma patients underwent SLNB with SPECT-CT. The histological findings of the excised lymph nodes were then correlated with preoperative PET-CT imaging. A median of 2.4 sentinel nodes were sampled per patient. No wound infections or chronic lymphedema occurred. SLNB identified tumors in 7 of the 28 patients (25%), including 3 patients who had normal PET-CT imaging of the nodal basin. In contrast, PET-CT demonstrated hypermetabolic regional nodes in 14 patients, and this resulted in a positive predictive value of only 29%. The sensitivity and specificity of PET-CT for detecting histologically confirmed nodal metastases were only 57% and 52%, respectively. SLNB can safely guide the rational selection of nodes for biopsy in pediatric and AYA sarcoma patients and can identify therapy-changing nodal disease not appreciated with PET-CT. Cancer 2017;155-160. © 2016 American Cancer Society. © 2016 American Cancer Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krengli, Marco; Ballare, Andrea; Cannillo, Barbara
2006-11-15
Purpose: This study aims to investigate the in vivo drainage of lymphatic spread by using the sentinel node (SN) technique and single-photon emission computed tomography (SPECT)-computed tomography (CT) image fusion, and to analyze the impact of such information on conformal pelvic irradiation. Methods and Materials: Twenty-three prostate cancer patients, candidates for radical prostatectomy already included in a trial studying the SN technique, were enrolled. CT and SPECT images were obtained after intraprostate injection of 115 MBq of {sup 99m}Tc-nanocolloid, allowing identification of SN and other pelvic lymph nodes. Target and nontarget structures, including lymph nodes identified by SPECT, were drawnmore » on SPECT-CT fusion images. A three-dimensional conformal treatment plan was performed for each patient. Results: Single-photon emission computed tomography lymph nodal uptake was detected in 20 of 23 cases (87%). The SN was inside the pelvic clinical target volume (CTV{sub 2}) in 16 of 20 cases (80%) and received no less than the prescribed dose in 17 of 20 cases (85%). The most frequent locations of SN outside the CTV{sub 2} were the common iliac and presacral lymph nodes. Sixteen of the 32 other lymph nodes (50%) identified by SPECT were found outside the CTV{sub 2}. Overall, the SN and other intrapelvic lymph nodes identified by SPECT were not included in the CTV{sub 2} in 5 of 20 (25%) patients. Conclusions: The study of lymphatic drainage can contribute to a better knowledge of the in vivo potential pattern of lymph node metastasis in prostate cancer and can lead to a modification of treatment volume with consequent optimization of pelvic irradiation.« less
Predicting axillary lymph node metastasis from kinetic statistics of DCE-MRI breast images
NASA Astrophysics Data System (ADS)
Ashraf, Ahmed B.; Lin, Lilie; Gavenonis, Sara C.; Mies, Carolyn; Xanthopoulos, Eric; Kontos, Despina
2012-03-01
The presence of axillary lymph node metastases is the most important prognostic factor in breast cancer and can influence the selection of adjuvant therapy, both chemotherapy and radiotherapy. In this work we present a set of kinetic statistics derived from DCE-MRI for predicting axillary node status. Breast DCE-MRI images from 69 women with known nodal status were analyzed retrospectively under HIPAA and IRB approval. Axillary lymph nodes were positive in 12 patients while 57 patients had no axillary lymph node involvement. Kinetic curves for each pixel were computed and a pixel-wise map of time-to-peak (TTP) was obtained. Pixels were first partitioned according to the similarity of their kinetic behavior, based on TTP values. For every kinetic curve, the following pixel-wise features were computed: peak enhancement (PE), wash-in-slope (WIS), wash-out-slope (WOS). Partition-wise statistics for every feature map were calculated, resulting in a total of 21 kinetic statistic features. ANOVA analysis was done to select features that differ significantly between node positive and node negative women. Using the computed kinetic statistic features a leave-one-out SVM classifier was learned that performs with AUC=0.77 under the ROC curve, outperforming the conventional kinetic measures, including maximum peak enhancement (MPE) and signal enhancement ratio (SER), (AUCs of 0.61 and 0.57 respectively). These findings suggest that our DCE-MRI kinetic statistic features can be used to improve the prediction of axillary node status in breast cancer patients. Such features could ultimately be used as imaging biomarkers to guide personalized treatment choices for women diagnosed with breast cancer.
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-01-01
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture{sup TM} in conjunction with x86 Opteron{sup TM} processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
Non-preconditioned conjugate gradient on cell and FPCA-based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-03-10
This work presents a detailed implementation of a double precision, Non-Preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture{trademark} in conjunction with x86 Opteron{trademark} processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Junghyun; Gangwon, Jo; Jaehoon, Jung
Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined inmore » a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.« less
Pseudo-random dynamic address configuration (PRDAC) algorithm for mobile ad hoc networks
NASA Astrophysics Data System (ADS)
Wu, Shaochuan; Tan, Xuezhi
2007-11-01
By analyzing all kinds of address configuration algorithms, this paper provides a new pseudo-random dynamic address configuration (PRDAC) algorithm for mobile ad hoc networks. Based on PRDAC, the first node that initials this network randomly chooses a nonlinear shift register that can generates an m-sequence. When another node joins this network, the initial node will act as an IP address configuration sever to compute an IP address according to this nonlinear shift register, and then allocates this address and tell the generator polynomial of this shift register to this new node. By this means, when other node joins this network, any node that has obtained an IP address can act as a server to allocate address to this new node. PRDAC can also efficiently avoid IP conflicts and deal with network partition and merge as same as prophet address (PA) allocation and dynamic configuration and distribution protocol (DCDP). Furthermore, PRDAC has less algorithm complexity, less computational complexity and more sufficient assumption than PA. In addition, PRDAC radically avoids address conflicts and maximizes the utilization rate of IP addresses. Analysis and simulation results show that PRDAC has rapid convergence, low overhead and immune from topological structures.
NASA Astrophysics Data System (ADS)
Cogoni, Marco; Busonera, Giovanni; Anedda, Paolo; Zanetti, Gianluigi
2015-01-01
We generalize previous studies on critical phenomena in communication networks [1,2] by adding computational capabilities to the nodes. In our model, a set of tasks with random origin, destination and computational structure is distributed on a computational network, modeled as a graph. By varying the temperature of a Metropolis Montecarlo, we explore the global latency for an optimal to suboptimal resource assignment at a given time instant. By computing the two-point correlation function for the local overload, we study the behavior of the correlation distance (both for links and nodes) while approaching the congested phase: a transition from peaked to spread g(r) is seen above a critical (Montecarlo) temperature Tc. The average latency trend of the system is predicted by averaging over several network traffic realizations while maintaining a spatially detailed information for each node: a sharp decrease of performance is found over Tc independently of the workload. The globally optimized computational resource allocation and network routing defines a baseline for a future comparison of the transition behavior with respect to existing routing strategies [3,4] for different network topologies.
NASA Astrophysics Data System (ADS)
Matsumoto, Monica M. S.; Beig, Niha G.; Udupa, Jayaram K.; Archer, Steven; Torigian, Drew A.
2014-03-01
Lung cancer is associated with the highest cancer mortality rates among men and women in the United States. The accurate and precise identification of the lymph node stations on computed tomography (CT) images is important for staging disease and potentially for prognosticating outcome in patients with lung cancer, as well as for pretreatment planning and response assessment purposes. To facilitate a standard means of referring to lymph nodes, the International Association for the Study of Lung Cancer (IASLC) has recently proposed a definition of the different lymph node stations and zones in the thorax. However, nodal station identification is typically performed manually by visual assessment in clinical radiology. This approach leaves room for error due to the subjective and potentially ambiguous nature of visual interpretation, and is labor intensive. We present a method of automatically recognizing the mediastinal IASLC-defined lymph node stations by modifying a hierarchical fuzzy modeling approach previously developed for body-wide automatic anatomy recognition (AAR) in medical imagery. Our AAR-lymph node (AAR-LN) system follows the AAR methodology and consists of two steps. In the first step, the various lymph node stations are manually delineated on a set of CT images following the IASLC definitions. These delineations are then used to build a fuzzy hierarchical model of the nodal stations which are considered as 3D objects. In the second step, the stations are automatically located on any given CT image of the thorax by using the hierarchical fuzzy model and object recognition algorithms. Based on 23 data sets used for model building, 22 independent data sets for testing, and 10 lymph node stations, a mean localization accuracy of within 1-6 voxels has been achieved by the AAR-LN system.
Design and Training of Limited-Interconnect Architectures
1991-07-16
and signal processing. Neuromorphic (brain like) models, allow an alternative for achieving real-time operation tor such tasks, while having a...compact and robust architecture. Neuromorphic models consist of interconnections of simple computational nodes. In this approach, each node computes a...operational performance. I1. Research Objectives The research objectives were: 1. Development of on- chip local training rules specifically designed for
Network support for system initiated checkpoints
Chen, Dong; Heidelberger, Philip
2013-01-29
A system, method and computer program product for supporting system initiated checkpoints in parallel computing systems. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.
Donker, Mila; van Tienhoven, Geertjan; Straver, Marieke E; Meijnen, Philip; van de Velde, Cornelis J H; Mansel, Robert E; Cataliotti, Luigi; Westenberg, A Helen; Klinkenbijl, Jean H G; Orzalesi, Lorenzo; Bouma, Willem H; van der Mijle, Huub C J; Nieuwenhuijzen, Grard A P; Veltkamp, Sanne C; Slaets, Leen; Duez, Nicole J; de Graaf, Peter W; van Dalen, Thijs; Marinelli, Andreas; Rijna, Herman; Snoj, Marko; Bundred, Nigel J; Merkus, Jos W S; Belkacemi, Yazid; Petignat, Patrick; Schinagl, Dominic A X; Coens, Corneel; Messina, Carlo G M; Bogaerts, Jan; Rutgers, Emiel J T
2014-01-01
Summary Background If treatment of the axilla is indicated in patients with breast cancer who have a positive sentinel node, axillary lymph node dissection is the present standard. Although axillary lymph node dissection provides excellent regional control, it is associated with harmful side-effects. We aimed to assess whether axillary radiotherapy provides comparable regional control with fewer side-effects. Methods Patients with T1–2 primary breast cancer and no palpable lymphadenopathy were enrolled in the randomised, multicentre, open-label, phase 3 non-inferiority EORTC 10981-22023 AMAROS trial. Patients were randomly assigned (1:1) by a computer-generated allocation schedule to receive either axillary lymph node dissection or axillary radiotherapy in case of a positive sentinel node, stratified by institution. The primary endpoint was non-inferiority of 5-year axillary recurrence, considered to be not more than 4% for the axillary radiotherapy group compared with an expected 2% in the axillary lymph node dissection group. Analyses were by intention to treat and per protocol. The AMAROS trial is registered with ClinicalTrials.gov, number NCT00014612. Findings Between Feb 19, 2001, and April 29, 2010, 4823 patients were enrolled at 34 centres from nine European countries, of whom 4806 were eligible for randomisation. 2402 patients were randomly assigned to receive axillary lymph node dissection and 2404 to receive axillary radiotherapy. Of the 1425 patients with a positive sentinel node, 744 had been randomly assigned to axillary lymph node dissection and 681 to axillary radiotherapy; these patients constituted the intention-to-treat population. Median follow-up was 6·1 years (IQR 4·1–8·0) for the patients with positive sentinel lymph nodes. In the axillary lymph node dissection group, 220 (33%) of 672 patients who underwent axillary lymph node dissection had additional positive nodes. Axillary recurrence occurred in four of 744 patients in the axillary lymph node dissection group and seven of 681 in the axillary radiotherapy group. 5-year axillary recurrence was 0·43% (95% CI 0·00–0·92) after axillary lymph node dissection versus 1·19% (0·31–2·08) after axillary radiotherapy. The planned non-inferiority test was underpowered because of the low number of events. The one-sided 95% CI for the underpowered non-inferiority test on the hazard ratio was 0·00–5·27, with a non-inferiority margin of 2. Lymphoedema in the ipsilateral arm was noted significantly more often after axillary lymph node dissection than after axillary radiotherapy at 1 year, 3 years, and 5 years. Interpretation Axillary lymph node dissection and axillary radiotherapy after a positive sentinel node provide excellent and comparable axillary control for patients with T1–2 primary breast cancer and no palpable lymphadenopathy. Axillary radiotherapy results in significantly less morbidity. Funding EORTC Charitable Trust. PMID:25439688
Sensor Systems Based on FPGAs and Their Applications: A Survey
de la Piedra, Antonio; Braeken, An; Touhafi, Abdellah
2012-01-01
In this manuscript, we present a survey of designs and implementations of research sensor nodes that rely on FPGAs, either based upon standalone platforms or as a combination of microcontroller and FPGA. Several current challenges in sensor networks are distinguished and linked to the features of modern FPGAs. As it turns out, low-power optimized FPGAs are able to enhance the computation of several types of algorithms in terms of speed and power consumption in comparison to microcontrollers of commercial sensor nodes. We show that architectures based on the combination of microcontrollers and FPGA can play a key role in the future of sensor networks, in fields where processing capabilities such as strong cryptography, self-testing and data compression, among others, are paramount.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M
2013-01-01
This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
Parallel scalability of Hartree-Fock calculations
NASA Astrophysics Data System (ADS)
Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.
2015-03-01
Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
Hsieh, Fushing; Hsueh, Chih-Hsin; Heitkamp, Constantin; Matthews, Mark
2016-01-01
Multiple datasets of two consecutive vintages of replicated grape and wines from six different deficit irrigation regimes are characterized and compared. The process consists of four temporal-ordered signature phases: harvest field data, juice composition, wine composition before bottling and bottled wine. A new computing paradigm and an integrative inferential platform are developed for discovering phase-to-phase pattern geometries for such characterization and comparison purposes. Each phase is manifested by a distinct set of features, which are measurable upon phase-specific entities subject to the common set of irrigation regimes. Throughout the four phases, this compilation of data from irrigation regimes with subsamples is termed a space of media-nodes, on which measurements of phase-specific features were recoded. All of these collectively constitute a bipartite network of data, which is then normalized and binary coded. For these serial bipartite networks, we first quantify patterns that characterize individual phases by means of a new computing paradigm called "Data Mechanics". This computational technique extracts a coupling geometry which captures and reveals interacting dependence among and between media-nodes and feature-nodes in forms of hierarchical block sub-matrices. As one of the principal discoveries, the holistic year-factor persistently surfaces as the most inferential factor in classifying all media-nodes throughout all phases. This could be deemed either surprising in its over-arching dominance or obvious based on popular belief. We formulate and test pattern-based hypotheses that confirm such fundamental patterns. We also attempt to elucidate the driving force underlying the phase-evolution in winemaking via a newly developed partial coupling geometry, which is designed to integrate two coupling geometries. Such partial coupling geometries are confirmed to bear causal and predictive implications. All pattern inferences are performed with respect to a profile of energy distributions sampled from network bootstrapping ensembles conforming to block-structures specified by corresponding hypotheses.
Hsieh, Fushing; Hsueh, Chih-Hsin; Heitkamp, Constantin; Matthews, Mark
2016-01-01
Multiple datasets of two consecutive vintages of replicated grape and wines from six different deficit irrigation regimes are characterized and compared. The process consists of four temporal-ordered signature phases: harvest field data, juice composition, wine composition before bottling and bottled wine. A new computing paradigm and an integrative inferential platform are developed for discovering phase-to-phase pattern geometries for such characterization and comparison purposes. Each phase is manifested by a distinct set of features, which are measurable upon phase-specific entities subject to the common set of irrigation regimes. Throughout the four phases, this compilation of data from irrigation regimes with subsamples is termed a space of media-nodes, on which measurements of phase-specific features were recoded. All of these collectively constitute a bipartite network of data, which is then normalized and binary coded. For these serial bipartite networks, we first quantify patterns that characterize individual phases by means of a new computing paradigm called “Data Mechanics”. This computational technique extracts a coupling geometry which captures and reveals interacting dependence among and between media-nodes and feature-nodes in forms of hierarchical block sub-matrices. As one of the principal discoveries, the holistic year-factor persistently surfaces as the most inferential factor in classifying all media-nodes throughout all phases. This could be deemed either surprising in its over-arching dominance or obvious based on popular belief. We formulate and test pattern-based hypotheses that confirm such fundamental patterns. We also attempt to elucidate the driving force underlying the phase-evolution in winemaking via a newly developed partial coupling geometry, which is designed to integrate two coupling geometries. Such partial coupling geometries are confirmed to bear causal and predictive implications. All pattern inferences are performed with respect to a profile of energy distributions sampled from network bootstrapping ensembles conforming to block-structures specified by corresponding hypotheses. PMID:27508416
Using LAMMPS Software on the Peregrine System | High-Performance Computing
-l walltime=4:00:00 # WALLTIME #PBS -l nodes=2:ppn=16 # Number of nodes and processes per node #PBS module purge module load impi-intel/2017.0.5 mkl/2017.0.5 lammps/11Aug17 mpirun -np 32 lmp -in lmp.in -l
Use of Networked Collaborative Concept Mapping To Measure Team Processes and Team Outcomes.
ERIC Educational Resources Information Center
Chung, Gregory K. W. K.; O'Neil, Harold F., Jr.; Herl, Howard E.; Dennis, Robert A.
The feasibility of using a computer-based networked collaborative concept mapping system to measure teamwork skills was studied. A concept map is a node-link-node representation of content, where the nodes represent concepts and links represent relationships between connected concepts. Teamwork processes were examined for a group concept mapping…
NASA Technical Reports Server (NTRS)
Halyo, Nesim; Choi, Sang H.; Chrisman, Dan A., Jr.; Samms, Richard W.
1987-01-01
Dynamic models and computer simulations were developed for the radiometric sensors utilized in the Earth Radiation Budget Experiment (ERBE). The models were developed to understand performance, improve measurement accuracy by updating model parameters and provide the constants needed for the count conversion algorithms. Model simulations were compared with the sensor's actual responses demonstrated in the ground and inflight calibrations. The models consider thermal and radiative exchange effects, surface specularity, spectral dependence of a filter, radiative interactions among an enclosure's nodes, partial specular and diffuse enclosure surface characteristics and steady-state and transient sensor responses. Relatively few sensor nodes were chosen for the models since there is an accuracy tradeoff between increasing the number of nodes and approximating parameters such as the sensor's size, material properties, geometry, and enclosure surface characteristics. Given that the temperature gradients within a node and between nodes are small enough, approximating with only a few nodes does not jeopardize the accuracy required to perform the parameter estimates and error analyses.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
2009-06-30
ISS020-E-016151 (30 June 2009) --- Japan Aerospace Exploration Agency (JAXA) astronaut Koichi Wakata, Expedition 20 flight engineer, enters data in a computer in the Harmony node of the International Space Station.
Semi-automatic central-chest lymph-node definition from 3D MDCT images
NASA Astrophysics Data System (ADS)
Lu, Kongkuo; Higgins, William E.
2010-03-01
Central-chest lymph nodes play a vital role in lung-cancer staging. The three-dimensional (3D) definition of lymph nodes from multidetector computed-tomography (MDCT) images, however, remains an open problem. This is because of the limitations in the MDCT imaging of soft-tissue structures and the complicated phenomena that influence the appearance of a lymph node in an MDCT image. In the past, we have made significant efforts toward developing (1) live-wire-based segmentation methods for defining 2D and 3D chest structures and (2) a computer-based system for automatic definition and interactive visualization of the Mountain central-chest lymph-node stations. Based on these works, we propose new single-click and single-section live-wire methods for segmenting central-chest lymph nodes. The single-click live wire only requires the user to select an object pixel on one 2D MDCT section and is designed for typical lymph nodes. The single-section live wire requires the user to process one selected 2D section using standard 2D live wire, but it is more robust. We applied these methods to the segmentation of 20 lymph nodes from two human MDCT chest scans (10 per scan) drawn from our ground-truth database. The single-click live wire segmented 75% of the selected nodes successfully and reproducibly, while the success rate for the single-section live wire was 85%. We are able to segment the remaining nodes, using our previously derived (but more interaction intense) 2D live-wire method incorporated in our lymph-node analysis system. Both proposed methods are reliable and applicable to a wide range of pulmonary lymph nodes.
Scalable asynchronous execution of cellular automata
NASA Astrophysics Data System (ADS)
Folino, Gianluigi; Giordano, Andrea; Mastroianni, Carlo
2016-10-01
The performance and scalability of cellular automata, when executed on parallel/distributed machines, are limited by the necessity of synchronizing all the nodes at each time step, i.e., a node can execute only after the execution of the previous step at all the other nodes. However, these synchronization requirements can be relaxed: a node can execute one step after synchronizing only with the adjacent nodes. In this fashion, different nodes can execute different time steps. This can be a notable advantageous in many novel and increasingly popular applications of cellular automata, such as smart city applications, simulation of natural phenomena, etc., in which the execution times can be different and variable, due to the heterogeneity of machines and/or data and/or executed functions. Indeed, a longer execution time at a node does not slow down the execution at all the other nodes but only at the neighboring nodes. This is particularly advantageous when the nodes that act as bottlenecks vary during the application execution. The goal of the paper is to analyze the benefits that can be achieved with the described asynchronous implementation of cellular automata, when compared to the classical all-to-all synchronization pattern. The performance and scalability have been evaluated through a Petri net model, as this model is very useful to represent the synchronization barrier among nodes. We examined the usual case in which the territory is partitioned into a number of regions, and the computation associated with a region is assigned to a computing node. We considered both the cases of mono-dimensional and two-dimensional partitioning. The results show that the advantage obtained through the asynchronous execution, when compared to the all-to-all synchronous approach is notable, and it can be as large as 90% in terms of speedup.
Bennie, George; Vorster, Mariza; Buscombe, John; Sathekge, Mike
2015-01-01
Single-photon emission computed tomography-computed tomography (SPECT-CT) allows for physiological and anatomical co-registration in sentinel lymph node (SLN) mapping and offers additional benefits over conventional planar imaging. However, the clinical relevance when considering added costs and radiation burden of these reported benefits remains somewhat uncertain. This study aimed to evaluate the possible added value of SPECT-CT and intra-operative gamma-probe use over planar imaging alone in the South African setting. 80 patients with breast cancer or malignant melanoma underwent both planar and SPECT-CT imaging for SLN mapping. We assessed and compared the number of nodes detected on each study, false positive and negative findings, changes in surgical approach and or patient management. In all cases where a sentinel node was identified, SPECT-CT was more accurate anatomically. There was a significant change in surgical approach in 30 cases - breast cancer (n = 13; P 0.001) and malignant melanoma (n = 17; P 0.0002). In 4 cases a node not identified on planar imaging was seen on SPECT-CT. In 16 cases additional echelon nodes were identified. False positives were excluded by SPECT-CT in 12 cases. The addition of SPECT-CT and use of intra-operative gamma-probe to planar imaging offers important benefits in patients who present with breast cancer and melanoma. These benefits include increased nodal detection, elimination of false positives and negatives and improved anatomical localization that ultimately aids and expedites surgical management. This has been demonstrated in the context of industrialized country previously and has now also been confirmed in the setting of a emerging-market nation.
Solving global shallow water equations on heterogeneous supercomputers
Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen
2017-01-01
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using a customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement on performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation on the programming paradigm of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture about both the potential performance benefits and the programming efforts involved. PMID:28282428
Space physics analysis network node directory (The Yellow Pages): Fourth edition
NASA Technical Reports Server (NTRS)
Peters, David J.; Sisson, Patricia L.; Green, James L.; Thomas, Valerie L.
1989-01-01
The Space Physics Analysis Network (SPAN) is a component of the global DECnet Internet, which has over 17,000 host computers. The growth of SPAN from its implementation in 1981 to its present size of well over 2,500 registered SPAN host computers, has created a need for users to acquire timely information about the network through a central source. The SPAN Network Information Center (SPAN-NIC) an online facility managed by the National Space Science Data Center (NSSDC) was developed to meet this need for SPAN-wide information. The remote node descriptive information in this document is not currently contained in the SPAN-NIC database, but will be incorporated in the near future. Access to this information is also available to non-DECnet users over a variety of networks such as Telenet, the NASA Packet Switched System (NPSS), and the TCP/IP Internet. This publication serves as the Yellow Pages for SPAN node information. The document also provides key information concerning other computer networks connected to SPAN, nodes associated with each SPAN routing center, science discipline nodes, contacts for primary SPAN nodes, and SPAN reference information. A section on DECnet Internetworking discusses SPAN connections with other wide-area DECnet networks (many with thousands of nodes each). Another section lists node names and their disciplines, countries, and institutions in the SPAN Network Information Center Online Data Base System. All remote sites connected to US-SPAN and European-SPAN (E-SPAN) are indexed. Also provided is information on the SPAN tail circuits, i.e., those remote nodes connected directly to a SPAN routing center, which is the local point of contact for resolving SPAN-related problems. Reference material is included for those who wish to know more about SPAN. Because of the rapid growth of SPAN, the SPAN Yellow Pages is reissued periodically.
ISS Node-1 and PMA-1 rotated in KSC's SSPF
NASA Technical Reports Server (NTRS)
1997-01-01
The International Space Station's Node 1 and Pressurized Mating Adapter-1 (PMA-1) are rotated by workers in KSC's Space Station Processing Facility. The node is rotated to provide access to different areas of the flight element for processing. Here, the node is rotated to provide access for the installation of heat pipe radiators and a flight computer. The node is scheduled to launch into space on STS-88, slated for a July 9 liftoff at 1:11 p.m. from KSC's Launch Pad 39B.
NASA Astrophysics Data System (ADS)
Reichenberger, Michael A.; Nichols, Daniel M.; Stevenson, Sarah R.; Swope, Tanner M.; Hilger, Caden W.; Roberts, Jeremy A.; Unruh, Troy C.; McGregor, Douglas S.
2018-01-01
Advancements in nuclear reactor core modeling and computational capability have encouraged further development of in-core neutron sensors. Measurement of the neutron-flux distribution within the reactor core provides a more complete understanding of the operating conditions in the reactor than typical ex-core sensors. Micro-Pocket Fission Detectors have been developed and tested previously but have been limited to single-node operation and have utilized highly specialized designs. The development of a widely deployable, multi-node Micro-Pocket Fission Detector assembly will enhance nuclear research capabilities. A modular, four-node Micro-Pocket Fission Detector array was designed, fabricated, and tested at Kansas State University. The array was constructed from materials that do not significantly perturb the neutron flux in the reactor core. All four sensor nodes were equally spaced axially in the array to span the fuel-region of the reactor core. The array was filled with neon gas, serving as an ionization medium in the small cavities of the Micro-Pocket Fission Detectors. The modular design of the instrument facilitates the testing and deployment of numerous sensor arrays. The unified design drastically improved device ruggedness and simplified construction from previous designs. Five 8-mm penetrations in the upper grid plate of the Kansas State University TRIGA Mk. II research nuclear reactor were utilized to deploy the array between fuel elements in the core. The Micro-Pocket Fission Detector array was coupled to an electronic support system which has been specially developed to support pulse-mode operation. The Micro-Pocket Fission Detector array composed of four sensors was used to monitor local neutron flux at a constant reactor power of 100 kWth at different axial locations simultaneously. The array was positioned at five different radial locations within the core to emulate the deployment of multiple arrays and develop a 2-dimensional measurement of neutron flux in the reactor core.
González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J
2017-02-01
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
ClueNet: Clustering a temporal network based on topological similarity rather than denseness.
Crawford, Joseph; Milenković, Tijana
2018-01-01
Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data-their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
A Scalable proxy cache for Grid Data Access
NASA Astrophysics Data System (ADS)
Cristian Cirstea, Traian; Just Keijser, Jan; Koeroo, Oscar Arthur; Starink, Ronald; Templon, Jeffrey Alan
2012-12-01
We describe a prototype grid proxy cache system developed at Nikhef, motivated by a desire to construct the first building block of a future https-based Content Delivery Network for grid infrastructures. Two goals drove the project: firstly to provide a “native view” of the grid for desktop-type users, and secondly to improve performance for physics-analysis type use cases, where multiple passes are made over the same set of data (residing on the grid). We further constrained the design by requiring that the system should be made of standard components wherever possible. The prototype that emerged from this exercise is a horizontally-scalable, cooperating system of web server / cache nodes, fronted by a customized webDAV server. The webDAV server is custom only in the sense that it supports http redirects (providing horizontal scaling) and that the authentication module has, as back end, a proxy delegation chain that can be used by the cache nodes to retrieve files from the grid. The prototype was deployed at Nikhef and tested at a scale of several terabytes of data and approximately one hundred fast cores of computing. Both small and large files were tested, in a number of scenarios, and with various numbers of cache nodes, in order to understand the scaling properties of the system. For properly-dimensioned cache-node hardware, the system showed speedup of several integer factors for the analysis-type use cases. These results and others are presented and discussed.
Running Interactive Jobs on Peregrine | High-Performance Computing | NREL
The qsub -I command is used to start an interactive session on one or more compute nodes. When . You will see a message such as qsub : waiting for job 12090.admin1 to start When it has, you'll see a exports your environment variables to the interactive job. Type exit when finished using the node. Like
Using the Parallel Computing Toolbox with MATLAB on the Peregrine System |
parallel pool took %g seconds.\\n', toc) % "single program multiple data" spmd fprintf('Worker %d says Hello World!\\n', labindex) end delete(gcp); % close the parallel pool exit To run the script on a compute node, create the file helloWorld.sub: #!/bin/bash #PBS -l walltime=05:00 #PBS -l nodes=1 #PBS -N
GPU accelerated implementation of NCI calculations using promolecular density.
Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric
2017-05-30
The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The code performances of three versions are examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces drastically the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Transient thermal analysis of fluid systems
NASA Technical Reports Server (NTRS)
Chandler, G. D.; Trust, R. D.
1977-01-01
Computer program performs transient thermal analysis of any 2-node to 200-node-thermal network, which transports heat by fluid flow convection. Program can be modified to add conduction along tubes and radiation.
Implementing asyncronous collective operations in a multi-node processing system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Dong; Eisley, Noel A.; Heidelberger, Philip
A method, system, and computer program product are disclosed for implementing an asynchronous collective operation in a multi-node data processing system. In one embodiment, the method comprises sending data to a plurality of nodes in the data processing system, broadcasting a remote get to the plurality of nodes, and using this remote get to implement asynchronous collective operations on the data by the plurality of nodes. In one embodiment, each of the nodes performs only one task in the asynchronous operations, and each nodes sets up a base address table with an entry for a base address of a memorymore » buffer associated with said each node. In another embodiment, each of the nodes performs a plurality of tasks in said collective operations, and each task of each node sets up a base address table with an entry for a base address of a memory buffer associated with the task.« less
Near real-time traffic routing
NASA Technical Reports Server (NTRS)
Yang, Chaowei (Inventor); Xie, Jibo (Inventor); Zhou, Bin (Inventor); Cao, Ying (Inventor)
2012-01-01
A near real-time physical transportation network routing system comprising: a traffic simulation computing grid and a dynamic traffic routing service computing grid. The traffic simulator produces traffic network travel time predictions for a physical transportation network using a traffic simulation model and common input data. The physical transportation network is divided into a multiple sections. Each section has a primary zone and a buffer zone. The traffic simulation computing grid includes multiple of traffic simulation computing nodes. The common input data includes static network characteristics, an origin-destination data table, dynamic traffic information data and historical traffic data. The dynamic traffic routing service computing grid includes multiple dynamic traffic routing computing nodes and generates traffic route(s) using the traffic network travel time predictions.
Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI
Donato, David I.
2017-01-01
In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
THEORY OF SOLAR MERIDIONAL CIRCULATION AT HIGH LATITUDES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dikpati, Mausumi; Gilman, Peter A., E-mail: dikpati@ucar.edu, E-mail: gilman@ucar.edu
2012-02-10
We build a hydrodynamic model for computing and understanding the Sun's large-scale high-latitude flows, including Coriolis forces, turbulent diffusion of momentum, and gyroscopic pumping. Side boundaries of the spherical 'polar cap', our computational domain, are located at latitudes {>=} 60 Degree-Sign . Implementing observed low-latitude flows as side boundary conditions, we solve the flow equations for a Cartesian analog of the polar cap. The key parameter that determines whether there are nodes in the high-latitude meridional flow is {epsilon} = 2{Omega}n{pi}H{sup 2}/{nu}, where {Omega} is the interior rotation rate, n is the radial wavenumber of the meridional flow, H ismore » the depth of the convection zone, and {nu} is the turbulent viscosity. The smaller the {epsilon} (larger turbulent viscosity), the fewer the number of nodes in high latitudes. For all latitudes within the polar cap, we find three nodes for {nu} = 10{sup 12} cm{sup 2} s{sup -1}, two for 10{sup 13}, and one or none for 10{sup 15} or higher. For {nu} near 10{sup 14} our model exhibits 'node merging': as the meridional flow speed is increased, two nodes cancel each other, leaving no nodes. On the other hand, for fixed flow speed at the boundary, as {nu} is increased the poleward-most node migrates to the pole and disappears, ultimately for high enough {nu} leaving no nodes. These results suggest that primary poleward surface meridional flow can extend from 60 Degree-Sign to the pole either by node merging or by node migration and disappearance.« less
Ryazanskiy and Nyberg in Node 1
2013-10-02
ISS037-E-005750 (2 Oct. 2013) --- NASA astronaut Karen Nyberg and Russian cosmonaut Sergey Ryazanskiy, both Expedition 37 flight engineers, look at a computer monitor in the Unity node of the International Space Station.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes
Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; ,Caffrey, Patrick; Fier, Jeffrey
2010-10-05
In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
NASA Astrophysics Data System (ADS)
van Heijst, Tristan C. F.; Eschbach-Zandbergen, Debora; Hoekstra, Nienke; van Asselen, Bram; Lagendijk, Jan J. W.; Verkooijen, Helena M.; Pijnappel, Ruud M.; de Waard, Stephanie N.; Witkamp, Arjen J.; van Dalen, Thijs; Desirée van den Bongard, H. J. G.; Philippens, Marielle E. P.
2017-08-01
Regional radiotherapy (RT) is increasingly used in breast cancer treatment. Conventionally, computed tomography (CT) is performed for RT planning. Lymph node (LN) target levels are delineated according to anatomical boundaries. Magnetic resonance imaging (MRI) could enable individual LN delineation. The purpose was to evaluate the applicability of MRI for LN detection in supine treatment position, before and after sentinel-node biopsy (SNB). Twenty-three female breast cancer patients (cTis-3N0M0) underwent 1.5 T MRI, before and after SNB, in addition to CT. Endurance for MRI was monitored. Axillary levels were delineated. LNs were identified and delineated on MRI from before and after SNB, and on CT, and compared by Wilcoxon signed-rank tests. LN locations and LN-based volumes were related to axillary delineations and associated volumes. Although postoperative effects were visible, LN numbers on postoperative MRI (median 26 LNs) were highly reproducible compared to preoperative MRI when adding excised sentinel nodes, and higher than on CT (median 11, p < 0.001). LN-based volumes were considerably smaller than respective axillary levels. Supine MRI of LNs is feasible and reproducible before and after SNB. This may lead to more accurate RT target definition compared to CT, with potentially lower toxicity. With the MRI techniques described here, initiation of novel MRI-guided RT strategies aiming at individual LNs could be possible.
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements from the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been performed: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Evangelista, Laura; Guttilla, Andrea; Zattoni, Fabio; Muzzio, Pier Carlo; Zattoni, Filiberto
2013-06-01
Determination of tumour involvement of regional lymph nodes in patients with prostate cancer (PCa) is of key importance for the proper planning of treatment. To provide a critical overview of published reports and to perform a meta-analysis about the diagnostic performance of 18F-choline and 11C-choline positron emission tomography (PET) or PET/computed tomography (CT) in the lymph node staging of PCa. A Medline, Web of Knowledge, and Google Scholar search was carried out to select English-language articles published before January 2012 that discussed the diagnostic performance of choline PET to individualise lymph node disease at initial staging in PCa patients. Articles were included only if absolute numbers of true-positive, true-negative, false-positive, and false-negative test results were available or derivable from the text and focused on lymph node metastases. Reviews, clinical reports, and editorial articles were excluded. All complete studies were reviewed; thus qualitative and quantitative analyses were performed. From the year 2000 to January 2012, we found 18 complete articles that critically evaluated the role of choline PET and PCa at initial staging. The meta-analysis was carried out and consisted of 10 selected studies with a total of 441 patients. The meta-analysis provided the following results: pooled sensitivity 49.2% (95% confidence interval [CI], 39.9-58.4) and pooled specificity 95% (95% CI, 92-97.1). The area under the curve was 0.9446 (p<0.05). The heterogeneity ranged between 22.7% and 78.4%. The diagnostic odds ratio was 18.999 (95% CI, 7.109-50.773). Choline PET and PET/CT provide low sensitivity in the detection of lymph node metastases prior to surgery in PCa patients. A high specificity has been reported from the overall studies. Studies carried out on a larger scale with a homogeneous patient population together with the evaluation of cost effectiveness are warranted. Copyright © 2012 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Method for nonlinear optimization for gas tagging and other systems
Chen, Ting; Gross, Kenny C.; Wegerich, Stephan
1998-01-01
A method and system for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network for computing on energy, or cost, for each of the L gene pools and using selected constraints to establish minimum energy states to identify optimal gas tag nodes with each energy compared to a convergence threshold and then upon identifying the gas tag node continuing this procedure until establishing the next gas tag node until all remaining n nodes have been established.
Method for nonlinear optimization for gas tagging and other systems
Chen, T.; Gross, K.C.; Wegerich, S.
1998-01-06
A method and system are disclosed for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network for computing on energy, or cost, for each of the L gene pools and using selected constraints to establish minimum energy states to identify optimal gas tag nodes with each energy compared to a convergence threshold and then upon identifying the gas tag node continuing this procedure until establishing the next gas tag node until all remaining n nodes have been established. 6 figs.
Decision-Tree Formulation With Order-1 Lateral Execution
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive-disjunctive paths to a flattened conjunctive form composed only of equality checks when possible. If each reduced conjunctive form contains only equality checks and all of these forms use the same variables, then the decision tree can be reduced to an order-one operation through a table lookup. The speedup to order one is accomplished by distributing each decision variable over a surface of a multidimensional object by mapping the equality constant to an index
Proactive Fault Tolerance Using Preemptive Migration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Vallee, Geoffroy R; Naughton, III, Thomas J
2009-01-01
Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail. This paper provides a foundation for proactive FT by defining its architecture and classifying implementation options. This paper further relates prior work to the presented architecture and classification, and discusses the challenges ahead for needed supporting technologies.
2013-06-03
and a C++ computational backend . The most current version of ORA (3.0.8.5) software is available on the casos website: http://casos.cs.cmu.edu...optimizing a network’s design structure. ORA uses a Java interface for ease of use, and a C++ computational backend . The most current version of ORA...Eigenvector Centrality : Node most connected to other highly connected nodes. Assists in identifying those who can mobilize others Entity Class
Method and system for dynamic probabilistic risk assessment
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta (Inventor); Xu, Hong (Inventor)
2013-01-01
The DEFT methodology, system and computer readable medium extends the applicability of the PRA (Probabilistic Risk Assessment) methodology to computer-based systems, by allowing DFT (Dynamic Fault Tree) nodes as pivot nodes in the Event Tree (ET) model. DEFT includes a mathematical model and solution algorithm, supports all common PRA analysis functions and cutsets. Additional capabilities enabled by the DFT include modularization, phased mission analysis, sequence dependencies, and imperfect coverage.
GASNet-EX Performance Improvements Due to Specialization for the Cray Aries Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hargrove, Paul H.; Bonachea, Dan
This document is a deliverable for milestone STPM17-6 of the Exascale Computing Project, delivered by WBS 2.3.1.14. It reports on the improvements in performance observed on Cray XC-series systems due to enhancements made to the GASNet-EX software. These enhancements, known as “specializations”, primarily consist of replacing network-independent implementations of several recently added features with implementations tailored to the Cray Aries network. Performance gains from specialization include (1) Negotiated-Payload Active Messages improve bandwidth of a ping-pong test by up to 14%, (2) Immediate Operations reduce running time of a synthetic benchmark by up to 93%, (3) non-bulk RMA Put bandwidth ismore » increased by up to 32%, (4) Remote Atomic performance is 70% faster than the reference on a point-to-point test and allows a hot-spot test to scale robustly, and (5) non-contiguous RMA interfaces see up to 8.6x speedups for an intra-node benchmark and 26% for inter-node. These improvements are available in the GASNet-EX 2018.3.0 release.« less
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
NASA Astrophysics Data System (ADS)
Huhn, William Paul; Lange, Björn; Yu, Victor; Blum, Volker; Lee, Seyong; Yoon, Mina
Density-functional theory has been well established as the dominant quantum-mechanical computational method in the materials community. Large accurate simulations become very challenging on small to mid-scale computers and require high-performance compute platforms to succeed. GPU acceleration is one promising approach. In this talk, we present a first implementation of all-electron density-functional theory in the FHI-aims code for massively parallel GPU-based platforms. Special attention is paid to the update of the density and to the integration of the Hamiltonian and overlap matrices, realized in a domain decomposition scheme on non-uniform grids. The initial implementation scales well across nodes on ORNL's Titan Cray XK7 supercomputer (8 to 64 nodes, 16 MPI ranks/node) and shows an overall speed up in runtime due to utilization of the K20X Tesla GPUs on each Titan node of 1.4x, with the charge density update showing a speed up of 2x. Further acceleration opportunities will be discussed. Work supported by the LDRD Program of ORNL managed by UT-Battle, LLC, for the U.S. DOE and by the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Integration of PanDA workload management system with Titan supercomputer at OLCF
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the future LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multicore worker nodes. It also gives PanDA new capability to collect, in real time, information about unused worker nodes on Titan, which allows precise definition of the size and duration of jobs submitted to Titan according to available free resources. This capability significantly reduces PanDA job wait time while improving Titan's utilization efficiency. This implementation was tested with a variety of Monte-Carlo workloads on Titan and is being tested on several other supercomputing platforms. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
LEGION: Lightweight Expandable Group of Independently Operating Nodes
NASA Technical Reports Server (NTRS)
Burl, Michael C.
2012-01-01
LEGION is a lightweight C-language software library that enables distributed asynchronous data processing with a loosely coupled set of compute nodes. Loosely coupled means that a node can offer itself in service to a larger task at any time and can withdraw itself from service at any time, provided it is not actively engaged in an assignment. The main program, i.e., the one attempting to solve the larger task, does not need to know up front which nodes will be available, how many nodes will be available, or at what times the nodes will be available, which is normally the case in a "volunteer computing" framework. The LEGION software accomplishes its goals by providing message-based, inter-process communication similar to MPI (message passing interface), but without the tight coupling requirements. The software is lightweight and easy to install as it is written in standard C with no exotic library dependencies. LEGION has been demonstrated in a challenging planetary science application in which a machine learning system is used in closed-loop fashion to efficiently explore the input parameter space of a complex numerical simulation. The machine learning system decides which jobs to run through the simulator; then, through LEGION calls, the system farms those jobs out to a collection of compute nodes, retrieves the job results as they become available, and updates a predictive model of how the simulator maps inputs to outputs. The machine learning system decides which new set of jobs would be most informative to run given the results so far; this basic loop is repeated until sufficient insight into the physical system modeled by the simulator is obtained.
Parallel computer processing and modeling: applications for the ICU
NASA Astrophysics Data System (ADS)
Baxter, Grant; Pranger, L. Alex; Draghic, Nicole; Sims, Nathaniel M.; Wiesmann, William P.
2003-07-01
Current patient monitoring procedures in hospital intensive care units (ICUs) generate vast quantities of medical data, much of which is considered extemporaneous and not evaluated. Although sophisticated monitors to analyze individual types of patient data are routinely used in the hospital setting, this equipment lacks high order signal analysis tools for detecting long-term trends and correlations between different signals within a patient data set. Without the ability to continuously analyze disjoint sets of patient data, it is difficult to detect slow-forming complications. As a result, the early onset of conditions such as pneumonia or sepsis may not be apparent until the advanced stages. We report here on the development of a distributed software architecture test bed and software medical models to analyze both asynchronous and continuous patient data in real time. Hardware and software has been developed to support a multi-node distributed computer cluster capable of amassing data from multiple patient monitors and projecting near and long-term outcomes based upon the application of physiologic models to the incoming patient data stream. One computer acts as a central coordinating node; additional computers accommodate processing needs. A simple, non-clinical model for sepsis detection was implemented on the system for demonstration purposes. This work shows exceptional promise as a highly effective means to rapidly predict and thereby mitigate the effect of nosocomial infections.
2009-10-05
ISS020-E-045314 (5 Oct. 2009) --- European Space Agency astronaut Frank De Winne, Expedition 20 flight engineer and Expedition 21 commander, uses a communication system near a computer in the Harmony node of the International Space Station.
Cell boundary fault detection system
Archer, Charles Jens [Rochester, MN; Pinnow, Kurt Walter [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian Edward [Rochester, MN
2009-05-05
A method determines a nodal fault along the boundary, or face, of a computing cell. Nodes on adjacent cell boundaries communicate with each other, and the communications are analyzed to determine if a node or connection is faulty.
2013-10-04
ISS037-E-006528 (4 Oct. 2013) --- European Space Agency astronaut Luca Parmitano, Expedition 37 flight engineer, holds a light fixture as he enters data into a computer in the Harmony node of the International Space Station.
NASA Astrophysics Data System (ADS)
Majumdar, Alok; Valenzuela, Juan; LeClair, Andre; Moder, Jeff
2016-03-01
This paper presents a numerical model of a system-level test bed-the multipurpose hydrogen test bed (MHTB) using the Generalized Fluid System Simulation Program (GFSSP). MHTB is representative in size and shape of a space transportation vehicle liquid hydrogen propellant tank, and ground-based testing was performed at NASA Marshall Space Flight Center (MSFC) to generate data for cryogenic storage. GFSSP is a finite volume-based network flow analysis software developed at MSFC and used for thermofluid analysis of propulsion systems. GFSSP has been used to model the self-pressurization and ullage pressure control by the Thermodynamic Vent System (TVS). A TVS typically includes a Joule-Thompson (J-T) expansion device, a two-phase heat exchanger (HEX), and a mixing pump and liquid injector to extract thermal energy from the tank without significant loss of liquid propellant. For the MHTB tank, the HEX and liquid injector are combined into a vertical spray bar assembly. Two GFSSP models (Self-Pressurization and TVS) were separately developed and tested and then integrated to simulate the entire system. The Self-Pressurization model consists of multiple ullage nodes, a propellant node, and solid nodes; it computes the heat transfer through multilayer insulation blankets and calculates heat and mass transfer between the ullage and liquid propellant and the ullage and tank wall. A TVS model calculates the flow through a J-T valve, HEX, and spray and vent systems. Two models are integrated by exchanging data through User Subroutines of both models. Results of the integrated models have been compared with MHTB test data at a 50% fill level. Satisfactory comparison was observed between tests and numerical predictions.
Napolitano, Jr., Leonard M.
1995-01-01
The Lambda network is a single stage, packet-switched interprocessor communication network for a distributed memory, parallel processor computer. Its design arises from the desired network characteristics of minimizing mean and maximum packet transfer time, local routing, expandability, deadlock avoidance, and fault tolerance. The network is based on fixed degree nodes and has mean and maximum packet transfer distances where n is the number of processors. The routing method is detailed, as are methods for expandability, deadlock avoidance, and fault tolerance.
Blom, R L G M; Vliegen, R F A; Schreurs, W M J; Belgers, H J; Stohr, I; Oostenbrug, L E; Sosef, M N
2012-08-01
One of the objectives of preoperative imaging in esophageal cancer patients is the detection of cervical lymph node metastases. Traditionally, external ultrasonography of the neck has been combined with computed tomography (CT) in order to improve the detection of cervical metastases. In general, integrated positron emission tomography-computed tomography (PET-CT) has been shown to be superior to CT or PET regarding staging and therefore may limit the role of external ultrasonography of the neck. The objective of this study was to determine the additional value of external ultrasonography of the neck to PET-CT. This study included all patients referred our center for treatment of esophageal carcinoma. Diagnostic staging was performed to determine treatment plan. Cervical lymph nodes were evaluated by external ultrasonography of the neck and PET-CT. In case of suspect lymph nodes on external ultrasonography or PET-CT, fine needle aspiration (FNA) was performed. Between 2008 and 2010, 170 out of 195 referred patients underwent both external ultrasonography of the neck and PET-CT. Of all patients, 84% were diagnosed with a tumor at or below the distal esophagus. In 140 of 170 patients, the cervical region was not suspect; no FNA was performed. Seven out of 170 patients had suspect nodes on both PET-CT and external ultrasonography. Five out of seven patients had cytologically confirmed malignant lymph nodes, one of seven had benign nodes, in one patient FNA was not performed; exclusion from esophagectomy was based on intra-abdominal metastases. In one out of 170 patients, PET-CT showed suspect nodes combined with a negative external ultrasonography; cytology of these nodes was benign. Twenty-two out of 170 patients had a negative PET-CT with suspect nodes on external ultrasonography. In 18 of 22 patients, cervical lymph nodes were cytologically confirmed benign; in four patients, FNA was not possible or inconclusive. At a median postoperative follow-up of 15 months, only 1% of patients developed cervical lymph node metastases. This study shows no additional value of external ultrasonography to a negative PET-CT. According to our results, it can be omitted in the primary workup. However, suspect lymph nodes on PET-CT should be confirmed by FNA to exclude false positives if it would change treatment plan. © 2011 Copyright the Authors. Journal compilation © 2011, Wiley Periodicals, Inc. and the International Society for Diseases of the Esophagus.
Approximating frustration scores in complex networks via perturbed Laplacian spectra
NASA Astrophysics Data System (ADS)
Savol, Andrej J.; Chennubhotla, Chakra S.
2015-12-01
Systems of many interacting components, as found in physics, biology, infrastructure, and the social sciences, are often modeled by simple networks of nodes and edges. The real-world systems frequently confront outside intervention or internal damage whose impact must be predicted or minimized, and such perturbations are then mimicked in the models by altering nodes or edges. This leads to the broad issue of how to best quantify changes in a model network after some type of perturbation. In the case of node removal there are many centrality metrics which associate a scalar quantity with the removed node, but it can be difficult to associate the quantities with some intuitive aspect of physical behavior in the network. This presents a serious hurdle to the application of network theory: real-world utility networks are rarely altered according to theoretic principles unless the kinetic impact on the network's users are fully appreciated beforehand. In pursuit of a kinetically interpretable centrality score, we discuss the f-score, or frustration score. Each f-score quantifies whether a selected node accelerates or inhibits global mean first passage times to a second, independently selected target node. We show that this is a natural way of revealing the dynamical importance of a node in some networks. After discussing merits of the f-score metric, we combine spectral and Laplacian matrix theory in order to quickly approximate the exact f-score values, which can otherwise be expensive to compute. Following tests on both synthetic and real medium-sized networks, we report f-score runtime improvements over exact brute force approaches in the range of 0 to 400 % with low error (<3 % ).
Vesselle, Hubert J.
2014-01-01
Purpose To evaluate the effect of adding lymph node size to three previously explored artificial neural network (ANN) input parameters (primary tumor maximum standardized uptake value or tumor uptake, tumor size, and nodal uptake at N1, N2, and N3 stations) in the structure of the ANN. The goal was to allow the resulting ANN structure to relate lymph node uptake for size to primary tumor uptake for size in the determination of the status of nodes as human readers do. Materials and Methods This prospective study was approved by the institutional review board, and informed consent was obtained from all participants. The authors developed a back-propagation ANN with one hidden layer and eight processing units. The data set used to train the network included node and tumor size and uptake from 133 patients with non–small cell lung cancer with surgically proved N status. Statistical analysis was performed with the paired t test. Results The ANN correctly predicted the N stage in 99.2% of cases, compared with 72.4% for the expert reader (P < .001). In categorization of N0 and N1 versus N2 and N3 disease, the ANN performed with 99.2% accuracy versus 92.2% for the expert reader (P < .001). Conclusion The ANN is 99.2% accurate in predicting surgical-pathologic nodal status with use of four fluorine 18 fluorodeoxyglucose (FDG) positron emission tomography (PET)/computed tomography (CT)–derived parameters. Malignant and benign inflammatory lymph nodes have overlapping appearances at FDG PET/CT but can be differentiated by ANNs when the crucial input of node size is used. © RSNA, 2013 Online supplemental material is available for this article. PMID:24056403
The navigation system of the JPL robot
NASA Technical Reports Server (NTRS)
Thompson, A. M.
1977-01-01
The control structure of the JPL research robot and the operations of the navigation subsystem are discussed. The robot functions as a network of interacting concurrent processes distributed among several computers and coordinated by a central executive. The results of scene analysis are used to create a segmented terrain model in which surface regions are classified by traversibility. The model is used by a path planning algorithm, PATH, which uses tree search methods to find the optimal path to a goal. In PATH, the search space is defined dynamically as a consequence of node testing. Maze-solving and the use of an associative data base for context dependent node generation are also discussed. Execution of a planned path is accomplished by a feedback guidance process with automatic error recovery.
An approach for the regularization of a power flow solution around the maximum loading point
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kataoka, Y.
1992-08-01
In the conventional power flow solution, the boundary conditions are directly specified by active power and reactive power at each node, so that the singular point coincided with the maximum loading point. For this reason, the computations are often disturbed by ill-condition. This paper proposes a new method for getting the wide-range regularity by giving some modifications to the conventional power flow solution method, thereby eliminating the singular point or shifting it to the region with the voltage lower than that of the maximum loading point. Then, the continuous execution of V-P curves including maximum loading point is realized. Themore » efficiency and effectiveness of the method are tested in practical 598-nodes system in comparison with the conventional method.« less
Distrubtion Tolerant Network Technology Flight Validation Report: DINET
NASA Technical Reports Server (NTRS)
Jones, Ross M.
2009-01-01
In October and November of 2008, the Jet Propulsion Laboratory installed and tested essential elements of Delay/Disruption Tolerant Networking (DTN) technology on the Deep Impact spacecraft. This experiment, called Deep Impact Network Experiment (DINET), was performed in close cooperation with the EPOXI project which has responsibility for the spacecraft. During DINET some 300 images were transmitted from the JPL nodes to the spacecraft. Then, they were automatically forwarded from the spacecraft back to the JPL nodes, exercising DTN's bundle origination, transmission, acquisition, dynamic route computation, congestion control, prioritization, custody transfer, and automatic retransmission procedures, both on the spacecraft and on the ground, over a period of 27 days. All transmitted bundles were successfully received, without corruption. The DINET experiment demonstrated DTN readiness for operational use in space missions.
Distribution Tolerant Network Technology Flight Validation Report: DINET
NASA Technical Reports Server (NTRS)
Jones, Ross M.
2009-01-01
In October and November of 2008, the Jet Propulsion Laboratory installed and tested essential elements of Delay/Disruption Tolerant Networking (DTN) technology on the Deep Impact spacecraft. This experiment, called Deep Impact Network Experiment (DINET), was performed in close cooperation with the EPOXI project which has responsibility for the spacecraft. During DINET some 300 images were transmitted from the JPL nodes to the spacecraft. Then, they were automatically forwarded from the spacecraft back to the JPL nodes, exercising DTN's bundle origination, transmission, acquisition, dynamic route computation, congestion control, prioritization, custody transfer, and automatic retransmission procedures, both on the spacecraft and on the ground, over a period of 27 days. All transmitted bundles were successfully received, without corruption. The DINET experiment demonstrated DTN readiness for operational use in space missions.
The Earth System Grid Federation (ESGF) Project
NASA Astrophysics Data System (ADS)
Carenton-Madiec, Nicolas; Denvil, Sébastien; Greenslade, Mark
2015-04-01
The Earth System Grid Federation (ESGF) Peer-to-Peer (P2P) enterprise system is a collaboration that develops, deploys and maintains software infrastructure for the management, dissemination, and analysis of model output and observational data. ESGF's primary goal is to facilitate advancements in Earth System Science. It is an interagency and international effort led by the US Department of Energy (DOE), and co-funded by National Aeronautics and Space Administration (NASA), National Oceanic and Atmospheric Administration (NOAA), National Science Foundation (NSF), Infrastructure for the European Network of Earth System Modelling (IS-ENES) and international laboratories such as the Max Planck Institute for Meteorology (MPI-M) german Climate Computing Centre (DKRZ), the Australian National University (ANU) National Computational Infrastructure (NCI), Institut Pierre-Simon Laplace (IPSL), and the British Atmospheric Data Center (BADC). Its main mission is to support current CMIP5 activities and prepare for future assesments. The ESGF architecture is based on a system of autonomous and distributed nodes, which interoperate through common acceptance of federation protocols and trust agreements. Data is stored at multiple nodes around the world, and served through local data and metadata services. Nodes exchange information about their data holdings and services, trust each other for registering users and establishing access control decisions. The net result is that a user can use a web browser, connect to any node, and seamlessly find and access data throughout the federation. This type of collaborative working organization and distributed architecture context en-lighted the need of integration and testing processes definition to ensure the quality of software releases and interoperability. This presentation will introduce the ESGF project and demonstrate the range of tools and processes that have been set up to support release management activities.
LWT Based Sensor Node Signal Processing in Vehicle Surveillance Distributed Sensor Network
NASA Astrophysics Data System (ADS)
Cha, Daehyun; Hwang, Chansik
Previous vehicle surveillance researches on distributed sensor network focused on overcoming power limitation and communication bandwidth constraints in sensor node. In spite of this constraints, vehicle surveillance sensor node must have signal compression, feature extraction, target localization, noise cancellation and collaborative signal processing with low computation and communication energy dissipation. In this paper, we introduce an algorithm for light-weight wireless sensor node signal processing based on lifting scheme wavelet analysis feature extraction in distributed sensor network.
Self-adaptive trust based ABR protocol for MANETs using Q-learning.
Kumar, Anitha Vijaya; Jeyapal, Akilandeswari
2014-01-01
Mobile ad hoc networks (MANETs) are a collection of mobile nodes with a dynamic topology. MANETs work under scalable conditions for many applications and pose different security challenges. Due to the nomadic nature of nodes, detecting misbehaviour is a complex problem. Nodes also share routing information among the neighbours in order to find the route to the destination. This requires nodes to trust each other. Thus we can state that trust is a key concept in secure routing mechanisms. A number of cryptographic protection techniques based on trust have been proposed. Q-learning is a recently used technique, to achieve adaptive trust in MANETs. In comparison to other machine learning computational intelligence techniques, Q-learning achieves optimal results. Our work focuses on computing a score using Q-learning to weigh the trust of a particular node over associativity based routing (ABR) protocol. Thus secure and stable route is calculated as a weighted average of the trust value of the nodes in the route and associativity ticks ensure the stability of the route. Simulation results show that Q-learning based trust ABR protocol improves packet delivery ratio by 27% and reduces the route selection time by 40% over ABR protocol without trust calculation.
Self-Adaptive Trust Based ABR Protocol for MANETs Using Q-Learning
Jeyapal, Akilandeswari
2014-01-01
Mobile ad hoc networks (MANETs) are a collection of mobile nodes with a dynamic topology. MANETs work under scalable conditions for many applications and pose different security challenges. Due to the nomadic nature of nodes, detecting misbehaviour is a complex problem. Nodes also share routing information among the neighbours in order to find the route to the destination. This requires nodes to trust each other. Thus we can state that trust is a key concept in secure routing mechanisms. A number of cryptographic protection techniques based on trust have been proposed. Q-learning is a recently used technique, to achieve adaptive trust in MANETs. In comparison to other machine learning computational intelligence techniques, Q-learning achieves optimal results. Our work focuses on computing a score using Q-learning to weigh the trust of a particular node over associativity based routing (ABR) protocol. Thus secure and stable route is calculated as a weighted average of the trust value of the nodes in the route and associativity ticks ensure the stability of the route. Simulation results show that Q-learning based trust ABR protocol improves packet delivery ratio by 27% and reduces the route selection time by 40% over ABR protocol without trust calculation. PMID:25254243
Analysis of complex network performance and heuristic node removal strategies
NASA Astrophysics Data System (ADS)
Jahanpour, Ehsan; Chen, Xin
2013-12-01
Removing important nodes from complex networks is a great challenge in fighting against criminal organizations and preventing disease outbreaks. Six network performance metrics, including four new metrics, are applied to quantify networks' diffusion speed, diffusion scale, homogeneity, and diameter. In order to efficiently identify nodes whose removal maximally destroys a network, i.e., minimizes network performance, ten structured heuristic node removal strategies are designed using different node centrality metrics including degree, betweenness, reciprocal closeness, complement-derived closeness, and eigenvector centrality. These strategies are applied to remove nodes from the September 11, 2001 hijackers' network, and their performance are compared to that of a random strategy, which removes randomly selected nodes, and the locally optimal solution (LOS), which removes nodes to minimize network performance at each step. The computational complexity of the 11 strategies and LOS is also analyzed. Results show that the node removal strategies using degree and betweenness centralities are more efficient than other strategies.
NASA Astrophysics Data System (ADS)
Pieńko, Michał; Błazik-Borowa, Ewa
2018-01-01
This paper presents the problem of comparing the results of computer simulations with the results of laboratory tests. The subject of the study was the insert-type joint of scaffolding loaded with a bending moment. The research was carried out on the real elements of the scaffolding. Due to the complexity of the connection different friction coefficients and depths of wedge insertion were taken into account in the analysis. The aim of conducting the series of analyses was to determine the sensitivity of the model to the mentioned characteristics. Since laboratory tests were carried out on the real samples, there were no preparations of surface involved in the load transfer. This approach caused many problems with the clear definition of the nature of work of individual node elements during the load. The analysis consist of two stages: the stage in which the connection is defined (the wedge is inserted into the rosette), and the loading stage (the node is loaded by the bending moment).
Stereological Cell Morphometry In Right Atrium Myocardium Of Primates
NASA Astrophysics Data System (ADS)
Mandarim-De-Lacerda, Carlos A...; Hureau, Jacques
1986-07-01
The mechanism by which the cardiac impulse is propagated in normal hearts from its origin in the sinus node to the atrio-ventricular node has not been agreed on fully. We studied the "internodal posterior tract" through the crista terminalis by light microscopy and stereological morphometry. The hearts of 12 Papio cynocephalus were perfused , after sacrifice,with phosphate-buffered formol saline. The regions of the crista terminalis (CT), interatrial septum (IAS), atrioventricular bundle (AVB) and interventricular septum (IVS) were cut off and embedded in paraplast and sectioned (10 4m). The multipurpose test system M 42 was superimposed over the photomicrographs (1,890 points test, ESR = 2%) to the stereological computing. The quantitative results show that the cells from CT were more closely relationed with IAS cells than others cells (IVS and AVB cells). This results are not a morphological evidence to establish the specificity of the "internodal posterior tract". The cellular arrangement and anatomical variation in CT myocardium is very important.
The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster
NASA Astrophysics Data System (ADS)
Löwe, P.; Klump, J.; Thaler, J.
2012-04-01
Compute clusters can be used as GIS workbenches, their wealth of resources allow us to take on geocomputation tasks which exceed the limitations of smaller systems. To harness these capabilities requires a Geographic Information System (GIS), able to utilize the available cluster configuration/architecture and a sufficient degree of user friendliness to allow for wide application. In this paper we report on the first successful porting of GRASS GIS, the oldest and largest Free Open Source (FOSS) GIS project, onto a compute cluster using Platform Computing's Load Sharing Facility (LSF). In 2008, GRASS6.3 was installed on the GFZ compute cluster, which at that time comprised 32 nodes. The interaction with the GIS was limited to the command line interface, which required further development to encapsulate the GRASS GIS business layer to facilitate its use by users not familiar with GRASS GIS. During the summer of 2011, multiple versions of GRASS GIS (v 6.4, 6.5 and 7.0) were installed on the upgraded GFZ compute cluster, now consisting of 234 nodes with 480 CPUs providing 3084 cores. The GFZ compute cluster currently offers 19 different processing queues with varying hardware capabilities and priorities, allowing for fine-grained scheduling and load balancing. After successful testing of core GIS functionalities, including the graphical user interface, mechanisms were developed to deploy scripted geocomputation tasks onto dedicated processing queues. The mechanisms are based on earlier work by NETELER et al. (2008). A first application of the new GIS functionality was the generation of maps of simulated tsunamis in the Mediterranean Sea for the Tsunami Atlas of the FP-7 TRIDEC Project (www.tridec-online.eu). For this, up to 500 processing nodes were used in parallel. Further trials included the processing of geometrically complex problems, requiring significant amounts of processing time. The GIS cluster successfully completed all these tasks, with processing times lasting up to full 20 CPU days. The deployment of GRASS GIS on a compute cluster allows our users to tackle GIS tasks previously out of reach of single workstations. In addition, this GRASS GIS cluster implementation will be made available to other users at GFZ in the course of 2012. It will thus become a research utility in the sense of "Software as a Service" (SaaS) and can be seen as our first step towards building a GFZ corporate cloud service.
Wireless Sensor Node for Autonomous Monitoring and Alerts in Remote Environments
NASA Technical Reports Server (NTRS)
Panangadan, Anand V. (Inventor); Monacos, Steve P. (Inventor)
2015-01-01
A method, apparatus, system, and computer program products provides personal alert and tracking capabilities using one or more nodes. Each node includes radio transceiver chips operating at different frequency ranges, a power amplifier, sensors, a display, and embedded software. The chips enable the node to operate as either a mobile sensor node or a relay base station node while providing a long distance relay link between nodes. The power amplifier enables a line-of-sight communication between the one or more nodes. The sensors provide a GPS signal, temperature, and accelerometer information (used to trigger an alert condition). The embedded software captures and processes the sensor information, provides a multi-hop packet routing protocol to relay the sensor information to and receive alert information from a command center, and to display the alert information on the display.
Cyber Contingency Analysis version 1.x
DOE Office of Scientific and Technical Information (OSTI.GOV)
Contingency analysis based approach for quantifying and examining the resiliency of a cyber system in respect to confidentiality, integrity and availability. A graph representing an organization's cyber system and related resources is used for the availability contingency analysis. The mission critical paths associated with an organization are used to determine the consequences of a potential contingency. A node (or combination of nodes) are removed from the graph to analyze a particular contingency. The value of all mission critical paths that are disrupted by that contingency are used to quantify its severity. A total severity score can be calculated based onmore » the complete list of all these contingencies. A simple n1 analysis can be done in which only one node is removed at a time for the analysis. We can also compute nk analysis, where k is the number of nodes to simultaneously remove for analysis. A contingency risk score can also be computed, which takes the probability of the contingencies into account. In addition to availability, we can also quantify confidentiality and integrity scores for the system. These treat user accounts as potential contingencies. The amount (and type) of files that an account can read to is used to compute the confidentiality score. The amount (and type) of files that an account can write to is used to compute the integrity score. As with availability analysis, we can use this information to compute total severity scores in regards to confidentiality and integrity. We can also take probability into account to compute associated risk scores.« less
FDG-PET/CT in the evaluation of anal carcinoma
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cotter, Shane E.; Medical Scientist Training Program, Washington University School of Medicine, St. Louis, MO; Grigsby, Perry W.
2006-07-01
Purpose: Surgical staging and treatment of anal carcinoma has been replaced by noninvasive staging studies and combined modality therapy. In this study, we compare computed tomography (CT) and physical examination to [{sup 18}F]-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography (FDG-PET/CT) in the staging of carcinoma of the anal canal, with special emphasis on determination of spread to inguinal lymph nodes. Methods and Materials: Between July 2003 and July 2005, 41 consecutive patients with biopsy-proved anal carcinoma underwent a complete staging evaluation including physical examination, CT, and 2-FDG-PET/CT. Patients ranged in age from 30 to 89 years. Nine men were HIV-positive. Treatment was withmore » standard Nigro regimen. Results: [{sup 18}F]-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography (FDG-PET/CT) detected 91% of nonexcised primary tumors, whereas CT visualized 59%. FDG-PET/CT detected abnormal uptake in pelvic nodes of 5 patients with normal pelvic CT scans. FDG-PET/CT detected abnormal nodes in 20% of groins that were normal by CT, and in 23% without abnormality on physical examination. Furthermore, 17% of groins negative by both CT and physical examination showed abnormal uptake on FDG-PET/CT. HIV-positive patients had an increased frequency of PET-positive lymph nodes. Conclusion: [{sup 18}F]-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography detects the primary tumor more often than CT. FDG-PET/CT detects substantially more abnormal inguinal lymph nodes than are identified by standard clinical staging with CT and physical examination.« less
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshii, K.; Iskra, K.; Naik, H.
2011-05-01
We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solelymore » for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.« less
Understanding Aprun Use Patterns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Hwa-Chun Wendy
2009-05-06
On the Cray XT, aprun is the command to launch an application to a set of compute nodes reserved through the Application Level Placement Scheduler (ALPS). At the National Energy Research Scientific Computing Center (NERSC), interactive aprun is disabled. That is, invocations of aprun have to go through the batch system. Batch scripts can and often do contain several apruns which either use subsets of the reserved nodes in parallel, or use all reserved nodes in consecutive apruns. In order to better understand how NERSC users run on the XT, it is necessary to associate aprun information with jobs. Itmore » is surprisingly more challenging than it sounds. In this paper, we describe those challenges and how we solved them to produce daily per-job reports for completed apruns. We also describe additional uses of the data, e.g. adjusting charging policy accordingly or associating node failures with jobs/users, and plans for enhancements.« less
Kim, Mi Jeong; Maeng, Sung Joon; Cho, Yong Soo
2015-01-01
In this paper, a distributed synchronization technique based on a bio-inspired algorithm is proposed for an orthogonal frequency division multiple access (OFDMA)-based wireless mesh network (WMN) with a time difference of arrival. The proposed time- and frequency-synchronization technique uses only the signals received from the neighbor nodes, by considering the effect of the propagation delay between the nodes. It achieves a fast synchronization with a relatively low computational complexity because it is operated in a distributed manner, not requiring any feedback channel for the compensation of the propagation delays. In addition, a self-organization scheme that can be effectively used to construct 1-hop neighbor nodes is proposed for an OFDMA-based WMN with a large number of nodes. The performance of the proposed technique is evaluated with regard to the convergence property and synchronization success probability using a computer simulation. PMID:26225974
Kim, Mi Jeong; Maeng, Sung Joon; Cho, Yong Soo
2015-07-28
In this paper, a distributed synchronization technique based on a bio-inspired algorithm is proposed for an orthogonal frequency division multiple access (OFDMA)-based wireless mesh network (WMN) with a time difference of arrival. The proposed time- and frequency-synchronization technique uses only the signals received from the neighbor nodes, by considering the effect of the propagation delay between the nodes. It achieves a fast synchronization with a relatively low computational complexity because it is operated in a distributed manner, not requiring any feedback channel for the compensation of the propagation delays. In addition, a self-organization scheme that can be effectively used to construct 1-hop neighbor nodes is proposed for an OFDMA-based WMN with a large number of nodes. The performance of the proposed technique is evaluated with regard to the convergence property and synchronization success probability using a computer simulation.
Multinode reconfigurable pipeline computer
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M. (Inventor); Littman, Michael G. (Inventor)
1989-01-01
A multinode parallel-processing computer is made up of a plurality of innerconnected, large capacity nodes each including a reconfigurable pipeline of functional units such as Integer Arithmetic Logic Processors, Floating Point Arithmetic Processors, Special Purpose Processors, etc. The reconfigurable pipeline of each node is connected to a multiplane memory by a Memory-ALU switch NETwork (MASNET). The reconfigurable pipeline includes three (3) basic substructures formed from functional units which have been found to be sufficient to perform the bulk of all calculations. The MASNET controls the flow of signals from the memory planes to the reconfigurable pipeline and vice versa. the nodes are connectable together by an internode data router (hyperspace router) so as to form a hypercube configuration. The capability of the nodes to conditionally configure the pipeline at each tick of the clock, without requiring a pipeline flush, permits many powerful algorithms to be implemented directly.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
A graph theoretic approach to scene matching
NASA Technical Reports Server (NTRS)
Ranganath, Heggere S.; Chipman, Laure J.
1991-01-01
The ability to match two scenes is a fundamental requirement in a variety of computer vision tasks. A graph theoretic approach to inexact scene matching is presented which is useful in dealing with problems due to imperfect image segmentation. A scene is described by a set of graphs, with nodes representing objects and arcs representing relationships between objects. Each node has a set of values representing the relations between pairs of objects, such as angle, adjacency, or distance. With this method of scene representation, the task in scene matching is to match two sets of graphs. Because of segmentation errors, variations in camera angle, illumination, and other conditions, an exact match between the sets of observed and stored graphs is usually not possible. In the developed approach, the problem is represented as an association graph, in which each node represents a possible mapping of an observed region to a stored object, and each arc represents the compatibility of two mappings. Nodes and arcs have weights indicating the merit or a region-object mapping and the degree of compatibility between two mappings. A match between the two graphs corresponds to a clique, or fully connected subgraph, in the association graph. The task is to find the clique that represents the best match. Fuzzy relaxation is used to update the node weights using the contextual information contained in the arcs and neighboring nodes. This simplifies the evaluation of cliques. A method of handling oversegmentation and undersegmentation problems is also presented. The approach is tested with a set of realistic images which exhibit many types of sementation errors.
Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan
2012-12-11
A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
Integrating Grid Services into the Cray XT4 Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
NERSC; Cholia, Shreyas; Lin, Hwa-Chun Wendy
2009-05-01
The 38640 core Cray XT4"Franklin" system at the National Energy Research Scientific Computing Center (NERSC) is a massively parallel resource available to Department of Energy researchers that also provides on-demand grid computing to the Open Science Grid. The integration of grid services on Franklin presented various challenges, including fundamental differences between the interactive and compute nodes, a stripped down compute-node operating system without dynamic library support, a shared-root environment and idiosyncratic application launching. Inour work, we describe how we resolved these challenges on a running, general-purpose production system to provide on-demand compute, storage, accounting and monitoring services through generic gridmore » interfaces that mask the underlying system-specific details for the end user.« less
Li, Jue; Inada, Shin; Schneider, Jurgen E.; Zhang, Henggui; Dobrzynski, Halina; Boyett, Mark R.
2014-01-01
The aim of the study was to develop a three-dimensional (3D) anatomically-detailed model of the rabbit right atrium containing the sinoatrial and atrioventricular nodes to study the electrophysiology of the nodes. A model was generated based on 3D images of a rabbit heart (atria and part of ventricles), obtained using high-resolution magnetic resonance imaging. Segmentation was carried out semi-manually. A 3D right atrium array model (∼3.16 million elements), including eighteen objects, was constructed. For description of cellular electrophysiology, the Rogers-modified FitzHugh-Nagumo model was further modified to allow control of the major characteristics of the action potential with relatively low computational resource requirements. Model parameters were chosen to simulate the action potentials in the sinoatrial node, atrial muscle, inferior nodal extension and penetrating bundle. The block zone was simulated as passive tissue. The sinoatrial node, crista terminalis, main branch and roof bundle were considered as anisotropic. We have simulated normal and abnormal electrophysiology of the two nodes. In accordance with experimental findings: (i) during sinus rhythm, conduction occurs down the interatrial septum and into the atrioventricular node via the fast pathway (conduction down the crista terminalis and into the atrioventricular node via the slow pathway is slower); (ii) during atrial fibrillation, the sinoatrial node is protected from overdrive by its long refractory period; and (iii) during atrial fibrillation, the atrioventricular node reduces the frequency of action potentials reaching the ventricles. The model is able to simulate ventricular echo beats. In summary, a 3D anatomical model of the right atrium containing the cardiac conduction system is able to simulate a wide range of classical nodal behaviours. PMID:25380074
Active Nodal Task Seeking for High-Performance, Ultra-Dependable Computing
1994-07-01
implementation. Figure 1 shows a hardware organization of ANTS: stand-alone computing nodes inter - connected by buses. 2.1 Run Time Partitioning The...nodes in 14 respond to changing loads [27] or system reconfiguration [26]. Existing techniques are all source-initiated or server-initiated [27]. 5.1...short-running task segments. The task segments must be short-running in order that processors will become avalable often enough to satisfy changing
Storing files in a parallel computing system based on user or application specification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faibish, Sorin; Bent, John M.; Nick, Jeffrey M.
2016-03-29
Techniques are provided for storing files in a parallel computing system based on a user-specification. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a specification from the distributed application indicating how the plurality of files should be stored; and storing one or more of the plurality of files in one or more storage nodes of a multi-tier storage system based on the specification. The plurality of files comprise a plurality of complete files and/or a plurality of sub-files. The specification can optionally be processed by a daemon executing on onemore » or more nodes in a multi-tier storage system. The specification indicates how the plurality of files should be stored, for example, identifying one or more storage nodes where the plurality of files should be stored.« less
Heavy Analysis and Light Virtualization of Water Use Data with Python
NASA Astrophysics Data System (ADS)
Kim, H.; Bijoor, N.; Famiglietti, J. S.
2014-12-01
Water utilities possess a large amount of water data that could be used to inform urban ecohydrology, management decisions, and conservation policies, but such data are rarely analyzed owing to difficulty in analyzation, visualization, and interpretion. We have developed a high performance computing resource for this purpose. We partnered with 6 water agencies in Orange County who provided 10 years of parcel-level monthly water use billing data for a pilot study. The first challenge that we overcame was to refine all human errors and unify the many different formats of data over all agencies. Second, we tested and applied experimental approaches to the data, including complex calculations, with high efficiency. Third, we developed a method to refine the data so it can be browsed along a time series index and/or geo-spatial queries with high efficiency, no matter how large the data. Python scientific libraries were the best match to handle arbitrary data sets in our environment. Further milestones include agency entry, sets of formulae, and maintaining 15M rows X 70 columns of data with high performance of cpu-bound processes. To deal with billions of rows, we performed an analysis virtualization stack by leveraging iPython parallel computing. With this architecture, one agency could be considered one computing node or virtual machine that maintains its own data sets respectively. For example, a big agency could use a large node, and a small agency could use a micro node. Under the minimum required raw data specs, more agencies could be analyzed. The program developed in this study simplifies data analysis, visualization, and interpretation of large water datasets, and can be used to analyze large data volumes from water agencies nationally or worldwide.
Deterministic quantum state transfer and remote entanglement using microwave photons.
Kurpiers, P; Magnard, P; Walter, T; Royer, B; Pechal, M; Heinsoo, J; Salathé, Y; Akin, A; Storz, S; Besse, J-C; Gasparinetti, S; Blais, A; Wallraff, A
2018-06-01
Sharing information coherently between nodes of a quantum network is fundamental to distributed quantum information processing. In this scheme, the computation is divided into subroutines and performed on several smaller quantum registers that are connected by classical and quantum channels 1 . A direct quantum channel, which connects nodes deterministically rather than probabilistically, achieves larger entanglement rates between nodes and is advantageous for distributed fault-tolerant quantum computation 2 . Here we implement deterministic state-transfer and entanglement protocols between two superconducting qubits fabricated on separate chips. Superconducting circuits 3 constitute a universal quantum node 4 that is capable of sending, receiving, storing and processing quantum information 5-8 . Our implementation is based on an all-microwave cavity-assisted Raman process 9 , which entangles or transfers the qubit state of a transmon-type artificial atom 10 with a time-symmetric itinerant single photon. We transfer qubit states by absorbing these itinerant photons at the receiving node, with a probability of 98.1 ± 0.1 per cent, achieving a transfer-process fidelity of 80.02 ± 0.07 per cent for a protocol duration of only 180 nanoseconds. We also prepare remote entanglement on demand with a fidelity as high as 78.9 ± 0.1 per cent at a rate of 50 kilohertz. Our results are in excellent agreement with numerical simulations based on a master-equation description of the system. This deterministic protocol has the potential to be used for quantum computing distributed across different nodes of a cryogenic network.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Z; Gao, M
Purpose: Monte Carlo simulation plays an important role for proton Pencil Beam Scanning (PBS) technique. However, MC simulation demands high computing power and is limited to few large proton centers that can afford a computer cluster. We study the feasibility of utilizing cloud computing in the MC simulation of PBS beams. Methods: A GATE/GEANT4 based MC simulation software was installed on a commercial cloud computing virtual machine (Linux 64-bits, Amazon EC2). Single spot Integral Depth Dose (IDD) curves and in-air transverse profiles were used to tune the source parameters to simulate an IBA machine. With the use of StarCluster softwaremore » developed at MIT, a Linux cluster with 2–100 nodes can be conveniently launched in the cloud. A proton PBS plan was then exported to the cloud where the MC simulation was run. Results: The simulated PBS plan has a field size of 10×10cm{sup 2}, 20cm range, 10cm modulation, and contains over 10,000 beam spots. EC2 instance type m1.medium was selected considering the CPU/memory requirement and 40 instances were used to form a Linux cluster. To minimize cost, master node was created with on-demand instance and worker nodes were created with spot-instance. The hourly cost for the 40-node cluster was $0.63 and the projected cost for a 100-node cluster was $1.41. Ten million events were simulated to plot PDD and profile, with each job containing 500k events. The simulation completed within 1 hour and an overall statistical uncertainty of < 2% was achieved. Good agreement between MC simulation and measurement was observed. Conclusion: Cloud computing is a cost-effective and easy to maintain platform to run proton PBS MC simulation. When proton MC packages such as GATE and TOPAS are combined with cloud computing, it will greatly facilitate the pursuing of PBS MC studies, especially for newly established proton centers or individual researchers.« less
NASA Technical Reports Server (NTRS)
Burleigh, Scott C.
2011-01-01
Contact Graph Routing (CGR) is a dynamic routing system that computes routes through a time-varying topology of scheduled communication contacts in a network based on the DTN (Delay-Tolerant Networking) architecture. It is designed to enable dynamic selection of data transmission routes in a space network based on DTN. This dynamic responsiveness in route computation should be significantly more effective and less expensive than static routing, increasing total data return while at the same time reducing mission operations cost and risk. The basic strategy of CGR is to take advantage of the fact that, since flight mission communication operations are planned in detail, the communication routes between any pair of bundle agents in a population of nodes that have all been informed of one another's plans can be inferred from those plans rather than discovered via dialogue (which is impractical over long one-way-light-time space links). Messages that convey this planning information are used to construct contact graphs (time-varying models of network connectivity) from which CGR automatically computes efficient routes for bundles. Automatic route selection increases the flexibility and resilience of the space network, simplifying cross-support and reducing mission management costs. Note that there are no routing tables in Contact Graph Routing. The best route for a bundle destined for a given node may routinely be different from the best route for a different bundle destined for the same node, depending on bundle priority, bundle expiration time, and changes in the current lengths of transmission queues for neighboring nodes; routes must be computed individually for each bundle, from the Bundle Protocol agent's current network connectivity model for the bundle s destination node (the contact graph). Clearly this places a premium on optimizing the implementation of the route computation algorithm. The scalability of CGR to very large networks remains a research topic. The information carried by CGR contact plan messages is useful not only for dynamic route computation, but also for the implementation of rate control, congestion forecasting, transmission episode initiation and termination, timeout interval computation, and retransmission timer suspension and resumption.
Picoradio: Communication/Computation Piconodes for Sensor Networks
2003-01-02
diagram of PicoNode III, or Quark node. It is made from two custom chips, Strange RF and Charm digital processor , and is complemented by a set of...the chipset comprising of Strange (analog OOK transceiver) and Charm (digital processor ) chips. 44 Figure 33: System block diagram of the Quark node...19 2.B PICONODE II - TWO-CHIP PICONODE IMPLEMENTATION ......................................... 21 2.B.1 Baseband processor (BBP
Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology-a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.
Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507
Time Series Analysis for Spatial Node Selection in Environment Monitoring Sensor Networks
Bhandari, Siddhartha; Jurdak, Raja; Kusy, Branislav
2017-01-01
Wireless sensor networks are widely used in environmental monitoring. The number of sensor nodes to be deployed will vary depending on the desired spatio-temporal resolution. Selecting an optimal number, position and sampling rate for an array of sensor nodes in environmental monitoring is a challenging question. Most of the current solutions are either theoretical or simulation-based where the problems are tackled using random field theory, computational geometry or computer simulations, limiting their specificity to a given sensor deployment. Using an empirical dataset from a mine rehabilitation monitoring sensor network, this work proposes a data-driven approach where co-integrated time series analysis is used to select the number of sensors from a short-term deployment of a larger set of potential node positions. Analyses conducted on temperature time series show 75% of sensors are co-integrated. Using only 25% of the original nodes can generate a complete dataset within a 0.5 °C average error bound. Our data-driven approach to sensor position selection is applicable for spatiotemporal monitoring of spatially correlated environmental parameters to minimize deployment cost without compromising data resolution. PMID:29271880
Announcing Supercomputer Summit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wells, Jack; Bland, Buddy; Nichols, Jeff
Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe. Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte ofmore » coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world’s most pressing challenges.« less
NASA Astrophysics Data System (ADS)
Sapra, Karan; Gupta, Saurabh; Atchley, Scott; Anantharaj, Valentine; Miller, Ross; Vazhkudai, Sudharshan
2016-04-01
Efficient resource utilization is critical for improved end-to-end computing and workflow of scientific applications. Heterogeneous node architectures, such as the GPU-enabled Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), present us with further challenges. In many HPC applications on Titan, the accelerators are the primary compute engines while the CPUs orchestrate the offloading of work onto the accelerators, and moving the output back to the main memory. On the other hand, applications that do not exploit GPUs, the CPU usage is dominant while the GPUs idle. We utilized Heterogenous Functional Partitioning (HFP) runtime framework that can optimize usage of resources on a compute node to expedite an application's end-to-end workflow. This approach is different from existing techniques for in-situ analyses in that it provides a framework for on-the-fly analysis on-node by dynamically exploiting under-utilized resources therein. We have implemented in the Community Earth System Model (CESM) a new concurrent diagnostic processing capability enabled by the HFP framework. Various single variate statistics, such as means and distributions, are computed in-situ by launching HFP tasks on the GPU via the node local HFP daemon. Since our current configuration of CESM does not use GPU resources heavily, we can move these tasks to GPU using the HFP framework. Each rank running the atmospheric model in CESM pushes the variables of of interest via HFP function calls to the HFP daemon. This node local daemon is responsible for receiving the data from main program and launching the designated analytics tasks on the GPU. We have implemented these analytics tasks in C and use OpenACC directives to enable GPU acceleration. This methodology is also advantageous while executing GPU-enabled configurations of CESM when the CPUs will be idle during portions of the runtime. In our implementation results, we demonstrate that it is more efficient to use HFP framework to offload the tasks to GPUs instead of doing it in the main application. We observe increased resource utilization and overall productivity in this approach by using HFP framework for end-to-end workflow.
Cell boundary fault detection system
Archer, Charles Jens [Rochester, MN; Pinnow, Kurt Walter [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian Edward [Rochester, MN
2011-04-19
An apparatus and program product determine a nodal fault along the boundary, or face, of a computing cell. Nodes on adjacent cell boundaries communicate with each other, and the communications are analyzed to determine if a node or connection is faulty.
Expedition 40 crew in Node 2 after German - U.S. soccer game
2014-06-26
European Space Agency astronaut Alexander Gerst, Expedition 40 flight engineer, and NASA astronaut Steve Swanson, commander, gather around a computer in the Unity node of the International Space Station after the German-USA soccer match.
Algorithm implementation on the Navier-Stokes computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krist, S.E.; Zang, T.A.
1987-03-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Algorithm implementation on the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Krist, Steven E.; Zang, Thomas A.
1987-01-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Advanced flight computer. Special study
NASA Technical Reports Server (NTRS)
Coo, Dennis
1995-01-01
This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.
Ince, Volkan; Isik, Burak; Ozdemir, Fatih; Ozgor, Dincer; Ara, Cengiz; Yilmaz, Sezai
2018-04-09
Fibrolamellar hepatocellular carcinoma is a rare primary malignant liver neoplasm. Benefits from liver transplant for patients with fibrolamellar hepatocellular carcinoma have not yet been reported. Here, we report a 19-year-old female patient who presented with abdominal pain. A computed tomography scan revealed bilobar and multiple solid lesions with the largest measuring 15 cm in diameter on the right lobe of her liver. Her blood alpha-fetoprotein level and viral hepatitis markers were normal. A fine-needle biopsy of the largest lesion detected fibrolamellar heptocellular carcinoma. Because no distant metastasis was evident and the carcinoma was unresectable, a right lobe living-donor liver transplant with hilar lymph node dissection was performed. A pathology report revealed poorly differentiated fibrolamellar hepatocellular carcinoma, and further testing indicated microvascular invasion and hilar lymph node metastasis. The largest tumor measured 12 cm. She was discharged on postoperative day 14. During postoperative month 22, multiple vertebral metastases were detected, and she died with diffuse metastasis during postoperative month 26. Our patient, with poor prognostic criteria such as hilar lymph node metastasis, microvascular invasion, and poor differentiation, had 22 months of tumor-free survival and 26 months of overall survival after having undergone living-donor liver transplant.
Definition and automatic anatomy recognition of lymph node zones in the pelvis on CT images
NASA Astrophysics Data System (ADS)
Liu, Yu; Udupa, Jayaram K.; Odhner, Dewey; Tong, Yubing; Guo, Shuxu; Attor, Rosemary; Reinicke, Danica; Torigian, Drew A.
2016-03-01
Currently, unlike IALSC-defined thoracic lymph node zones, no explicitly provided definitions for lymph nodes in other body regions are available. Yet, definitions are critical for standardizing the recognition, delineation, quantification, and reporting of lymphadenopathy in other body regions. Continuing from our previous work in the thorax, this paper proposes a standardized definition of the grouping of pelvic lymph nodes into 10 zones. We subsequently employ our earlier Automatic Anatomy Recognition (AAR) framework designed for body-wide organ modeling, recognition, and delineation to actually implement these zonal definitions where the zones are treated as anatomic objects. First, all 10 zones and key anatomic organs used as anchors are manually delineated under expert supervision for constructing fuzzy anatomy models of the assembly of organs together with the zones. Then, optimal hierarchical arrangement of these objects is constructed for the purpose of achieving the best zonal recognition. For actual localization of the objects, two strategies are used -- optimal thresholded search for organs and one-shot method for the zones where the known relationship of the zones to key organs is exploited. Based on 50 computed tomography (CT) image data sets for the pelvic body region and an equal division into training and test subsets, automatic zonal localization within 1-3 voxels is achieved.
Itazawa, Tomoko; Tamaki, Yukihisa; Komiyama, Takafumi; Nishimura, Yasumasa; Nakayama, Yuko; Ito, Hiroyuki; Ohde, Yasuhisa; Kusumoto, Masahiko; Sakai, Shuji; Suzuki, Kenji; Watanabe, Hirokazu; Asamura, Hisao
2017-01-01
The purpose of this study was to develop a consensus-based computed tomographic (CT) atlas that defines lymph node stations in radiotherapy for lung cancer based on the lymph node map of the International Association for the Study of Lung Cancer (IASLC). A project group in the Japanese Radiation Oncology Study Group (JROSG) initially prepared a draft of the atlas in which lymph node Stations 1–11 were illustrated on axial CT images. Subsequently, a joint committee of the Japan Lung Cancer Society (JLCS) and the Japanese Society for Radiation Oncology (JASTRO) was formulated to revise this draft. The committee consisted of four radiation oncologists, four thoracic surgeons and three thoracic radiologists. The draft prepared by the JROSG project group was intensively reviewed and discussed at four meetings of the committee over several months. Finally, we proposed definitions for the regional lymph node stations and the consensus-based CT atlas. This atlas was approved by the Board of Directors of JLCS and JASTRO. This resulted in the first official CT atlas for defining regional lymph node stations in radiotherapy for lung cancer authorized by the JLCS and JASTRO. In conclusion, the JLCS–JASTRO consensus-based CT atlas, which conforms to the IASLC lymph node map, was established. PMID:27609192
NASA Astrophysics Data System (ADS)
Wagner, R.; Norman, M. L.
Here we present a working example of a Basic SkyNode serving theoretical data. The data is taken from the Simulated Cluster Archive (SCA), a set of simulated X-ray clusters, where each cluster was computed using four different physics models. The LCA Theory SkyNode (LCATheory) tables contain columns of the integrated physical properties of the clusters at various redshifts. The ease of setting up a Theory SkyNode is an important result, because it represents a clear way to present theory data to the Virtual Observatory. Also, our Theory SkyNode provides a prototype for additional simulated object catalogs, which will be created from other simulations by our group, and hopefully others.
Napolitano, L.M. Jr.
1995-11-28
The Lambda network is a single stage, packet-switched interprocessor communication network for a distributed memory, parallel processor computer. Its design arises from the desired network characteristics of minimizing mean and maximum packet transfer time, local routing, expandability, deadlock avoidance, and fault tolerance. The network is based on fixed degree nodes and has mean and maximum packet transfer distances where n is the number of processors. The routing method is detailed, as are methods for expandability, deadlock avoidance, and fault tolerance. 14 figs.
Calculus domains modelled using an original bool algebra based on polygons
NASA Astrophysics Data System (ADS)
Oanta, E.; Panait, C.; Raicu, A.; Barhalescu, M.; Axinte, T.
2016-08-01
Analytical and numerical computer based models require analytical definitions of the calculus domains. The paper presents a method to model a calculus domain based on a bool algebra which uses solid and hollow polygons. The general calculus relations of the geometrical characteristics that are widely used in mechanical engineering are tested using several shapes of the calculus domain in order to draw conclusions regarding the most effective methods to discretize the domain. The paper also tests the results of several CAD commercial software applications which are able to compute the geometrical characteristics, being drawn interesting conclusions. The tests were also targeting the accuracy of the results vs. the number of nodes on the curved boundary of the cross section. The study required the development of an original software consisting of more than 1700 computer code lines. In comparison with other calculus methods, the discretization using convex polygons is a simpler approach. Moreover, this method doesn't lead to large numbers as the spline approximation did, in that case being required special software packages in order to offer multiple, arbitrary precision. The knowledge resulted from this study may be used to develop complex computer based models in engineering.
Testnodes: a Lightweight Node-Testing Infrastructure
NASA Astrophysics Data System (ADS)
Fay, R.; Bland, J.
2014-06-01
A key aspect of ensuring optimum cluster reliability and productivity lies in keeping worker nodes in a healthy state. Testnodes is a lightweight node testing solution developed at Liverpool. While Nagios has been used locally for general monitoring of hosts and services, Testnodes is optimised to answer one question: is there any reason this node should not be accepting jobs? This tight focus enables Testnodes to inspect nodes frequently with minimal impact and provide a comprehensive and easily extended check with each inspection. On the server side, Testnodes, implemented in python, interoperates with the Torque batch server to control the nodes production status. Testnodes remotely and in parallel executes client-side test scripts and processes the return codes and output, adjusting the node's online/offline status accordingly to preserve the integrity of the overall batch system. Testnodes reports via log, email and Nagios, allowing a quick overview of node status to be reviewed and specific node issues to be identified and resolved quickly. This presentation will cover testnodes design and implementation, together with the results of its use in production at Liverpool, and future development plans.
NESTOR: A Computer-Based Medical Diagnostic Aid That Integrates Causal and Probabilistic Knowledge.
1984-11-01
indiidual conditional probabilities between one cause node and its effect node, but less common to know a joint conditional probability between a...PERFOAMING ORG. REPORT NUMBER * 7. AUTI4ORs) O Gregory F. Cooper 1 CONTRACT OR GRANT NUMBERIa) ONR N00014-81-K-0004 g PERFORMING ORGANIZATION NAME AND...ADDRESS 10. PROGRAM ELEMENT, PROJECT. TASK Department of Computer Science AREA & WORK UNIT NUMBERS Stanford University Stanford, CA 94305 USA 12. REPORT
Peregrine Job Queues and Scheduling Policies | High-Performance Computing |
batch batch-h long bigmem data-transfer feature Max wall time 1 hour 4 hours 2 days 2 days 10 days 10 # nodes per job 2 8 288 576 120 46 1 # of 24 core 64 GB Haswell nodes 2 8 0 1228 0 0 0 haswell # of 24core 32 GB nodes 2 16 576 0 126 0 0 24core # of 16core 32 GB nodes 2 8 195 0 162 0 5 16core, # of 24core
NASA Astrophysics Data System (ADS)
Pescaru, A.; Oanta, E.; Axinte, T.; Dascalescu, A.-D.
2015-11-01
Computer aided engineering is based on models of the phenomena which are expressed as algorithms. The implementations of the algorithms are usually software applications which are processing a large volume of numerical data, regardless the size of the input data. In this way, the finite element method applications used to have an input data generator which was creating the entire volume of geometrical data, starting from the initial geometrical information and the parameters stored in the input data file. Moreover, there were several data processing stages, such as: renumbering of the nodes meant to minimize the size of the band length of the system of equations to be solved, computation of the equivalent nodal forces, computation of the element stiffness matrix, assemblation of system of equations, solving the system of equations, computation of the secondary variables. The modern software application use pre-processing and post-processing programs to easily handle the information. Beside this example, CAE applications use various stages of complex computation, being very interesting the accuracy of the final results. Along time, the development of CAE applications was a constant concern of the authors and the accuracy of the results was a very important target. The paper presents the various computing techniques which were imagined and implemented in the resulting applications: finite element method programs, finite difference element method programs, applied general numerical methods applications, data generators, graphical applications, experimental data reduction programs. In this context, the use of the extended precision data types was one of the solutions, the limitations being imposed by the size of the memory which may be allocated. To avoid the memory-related problems the data was stored in files. To minimize the execution time, part of the file was accessed using the dynamic memory allocation facilities. One of the most important consequences of the paper is the design of a library which includes the optimized solutions previously tested, that may be used for the easily development of original CAE cross-platform applications. Last but not least, beside the generality of the data type solutions, there is targeted the development of a software library which may be used for the easily development of node-based CAE applications, each node having several known or unknown parameters, the system of equations being automatically generated and solved.
Transient Solid Dynamics Simulations on the Sandia/Intel Teraflop Computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attaway, S.; Brown, K.; Gardner, D.
1997-12-31
Transient solid dynamics simulations are among the most widely used engineering calculations. Industrial applications include vehicle crashworthiness studies, metal forging, and powder compaction prior to sintering. These calculations are also critical to defense applications including safety studies and weapons simulations. The practical importance of these calculations and their computational intensiveness make them natural candidates for parallelization. This has proved to be difficult, and existing implementations fail to scale to more than a few dozen processors. In this paper we describe our parallelization of PRONTO, Sandia`s transient solid dynamics code, via a novel algorithmic approach that utilizes multiple decompositions for differentmore » key segments of the computations, including the material contact calculation. This latter calculation is notoriously difficult to perform well in parallel, because it involves dynamically changing geometry, global searches for elements in contact, and unstructured communications among the compute nodes. Our approach scales to at least 3600 compute nodes of the Sandia/Intel Teraflop computer (the largest set of nodes to which we have had access to date) on problems involving millions of finite elements. On this machine we can simulate models using more than ten- million elements in a few tenths of a second per timestep, and solve problems more than 3000 times faster than a single processor Cray Jedi.« less
NASA Astrophysics Data System (ADS)
Luo, Ye; Esler, Kenneth; Kent, Paul; Shulenburger, Luke
Quantum Monte Carlo (QMC) calculations of giant molecules, surface and defect properties of solids have been feasible recently due to drastically expanding computational resources. However, with the most computationally efficient basis set, B-splines, these calculations are severely restricted by the memory capacity of compute nodes. The B-spline coefficients are shared on a node but not distributed among nodes, to ensure fast evaluation. A hybrid representation which incorporates atomic orbitals near the ions and B-spline ones in the interstitial regions offers a more accurate and less memory demanding description of the orbitals because they are naturally more atomic like near ions and much smoother in between, thus allowing coarser B-spline grids. We will demonstrate the advantage of hybrid representation over pure B-spline and Gaussian basis sets and also show significant speed-up like computing the non-local pseudopotentials with our new scheme. Moreover, we discuss a new algorithm for atomic orbital initialization which used to require an extra workflow step taking a few days. With this work, the highly efficient hybrid representation paves the way to simulate large size even in-homogeneous systems using QMC. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Computational Materials Sciences Program.
Akthar, Adil S; Ferguson, Mark K; Koshy, Matthew; Vigneswaran, Wickii T; Malik, Renuka
2017-02-01
Patients receiving stereotactic body radiotherapy for stage I non-small cell lung cancer are typically staged clinically with positron emission tomography-computed tomography. Currently, limited data exist for the detection of occult hilar/peribronchial (N1) disease. We hypothesize that positron emission tomography-computed tomography underestimates spread of cancer to N1 lymph nodes and that future stereotactic body radiotherapy patients may benefit from increased pathologic evaluation of N1 nodal stations in addition to N2 nodes. A retrospective study was performed of all patients with clinical stage I (T1-2aN0) non-small cell lung cancer (American Joint Committee on Cancer, 7th edition) by positron emission tomography-computed tomography at our institution from 2003 to 2011, with subsequent surgical resection and lymph node staging. Findings on positron emission tomography-computed tomography were compared to pathologic nodal involvement to determine the negative predictive value of positron emission tomography-computed tomography for the detection of N1 nodal disease. An analysis was conducted to identify predictors of occult spread. A total of 105 patients with clinical stage I non-small cell lung cancer were included in this study, of which 8 (7.6%) patients were found to have occult N1 metastasis on pathologic review yielding a negative predictive value for N1 disease of 92.4%. No patients had occult mediastinal nodes. The negative predictive value for positron emission tomography-computed tomography in patients with clinical stage T1 versus T2 tumors was 72 (96%) of 75 versus 25 (83%) of 30, respectively ( P = .03), and for peripheral versus central tumor location was 77 (98%) of 78 versus 20 (74%) of 27, respectively ( P = .0001). The negative predictive values for peripheral T1 and T2 tumors were 98% and 100%, respectively; while for central T1 and T2 tumors, the rates were 85% and 64%, respectively. Occult lymph node involvement was not associated with primary tumor maximum standard uptake value, histology, grade, or interval between positron emission tomography-computed tomography and surgery. Our results support pathologic assessment of N1 lymph nodes in patients with stage Inon-small cell lung cancer considered for stereotactic body radiotherapy, with the greatest benefit in patients with central and T2 tumors. Diagnostic evaluation with endoscopic bronchial ultrasound should be considered in the evaluation of stereotactic body radiotherapy candidates.
Akthar, Adil S.; Ferguson, Mark K.; Koshy, Matthew; Vigneswaran, Wickii T.
2016-01-01
Purpose/Objectives: Patients receiving stereotactic body radiotherapy for stage I non-small cell lung cancer are typically staged clinically with positron emission tomography–computed tomography. Currently, limited data exist for the detection of occult hilar/peribronchial (N1) disease. We hypothesize that positron emission tomography–computed tomography underestimates spread of cancer to N1 lymph nodes and that future stereotactic body radiotherapy patients may benefit from increased pathologic evaluation of N1 nodal stations in addition to N2 nodes. Materials/Methods: A retrospective study was performed of all patients with clinical stage I (T1-2aN0) non-small cell lung cancer (American Joint Committee on Cancer, 7th edition) by positron emission tomography–computed tomography at our institution from 2003 to 2011, with subsequent surgical resection and lymph node staging. Findings on positron emission tomography–computed tomography were compared to pathologic nodal involvement to determine the negative predictive value of positron emission tomography–computed tomography for the detection of N1 nodal disease. An analysis was conducted to identify predictors of occult spread. Results: A total of 105 patients with clinical stage I non-small cell lung cancer were included in this study, of which 8 (7.6%) patients were found to have occult N1 metastasis on pathologic review yielding a negative predictive value for N1 disease of 92.4%. No patients had occult mediastinal nodes. The negative predictive value for positron emission tomography–computed tomography in patients with clinical stage T1 versus T2 tumors was 72 (96%) of 75 versus 25 (83%) of 30, respectively (P = .03), and for peripheral versus central tumor location was 77 (98%) of 78 versus 20 (74%) of 27, respectively (P = .0001). The negative predictive values for peripheral T1 and T2 tumors were 98% and 100%, respectively; while for central T1 and T2 tumors, the rates were 85% and 64%, respectively. Occult lymph node involvement was not associated with primary tumor maximum standard uptake value, histology, grade, or interval between positron emission tomography–computed tomography and surgery. Conclusion: Our results support pathologic assessment of N1 lymph nodes in patients with stage Inon-small cell lung cancer considered for stereotactic body radiotherapy, with the greatest benefit in patients with central and T2 tumors. Diagnostic evaluation with endoscopic bronchial ultrasound should be considered in the evaluation of stereotactic body radiotherapy candidates. PMID:26792491
Architectures for Cognitive Systems
2010-02-01
highly modular many- node chip was designed which addressed power efficiency to the maximum extent possible. Each node contains an Asynchronous Field...optimization to perform complex cognitive computing operations. This project focused on the design of the core and integration across a four node chip . A...follow on project will focus on creating a 3 dimensional stack of chips that is enabled by the low power usage. The chip incorporates structures to
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shadid, John Nicolas; Lin, Paul Tinphone
2009-01-01
This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a capacity cluster with 272 compute nodes based on a homogeneous multicore node architecture utilizing 16 cores. The inter-node communication backbone for this Tri-Lab Linux Capacity Cluster (TLCC) machine is comprised of an InfiniBand interconnect. The nonuniform memory access (NUMA) nodes consist of 2.2 GHz quad socket/quad core AMD Opteron processors. The performance results for this study are obtained with a FE semiconductor device simulation code (Charon) that is based on a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling andmore » multicore performance results are presented for large-scale problems of 100+ million unknowns on up to 4096 cores. A parallel scaling comparison is also presented with the Cray XT3/4 Red Storm capability platform. The results indicate that an MPI-only programming model for utilizing the multicore nodes is reasonably efficient on all 16 cores per compute node. However, the results also indicated that the multilevel preconditioner, which is critical for large-scale capability type simulations, scales better on the Red Storm machine than the TLCC machine.« less
Burnside, Elizabeth S.; Drukker, Karen; Li, Hui; Bonaccio, Ermelinda; Zuley, Margarita; Ganott, Marie; Net, Jose M.; Sutton, Elizabeth; Brandt, Kathleen R.; Whitman, Gary; Conzen, Suzanne; Lan, Li; Ji, Yuan; Zhu, Yitan; Jaffe, Carl; Huang, Erich; Freymann, John; Kirby, Justin; Morris, Elizabeth; Giger, Maryellen
2015-01-01
Background To demonstrate that computer-extracted image phenotypes (CEIPs) of biopsy-proven breast cancer on MRI can accurately predict pathologic stage. Methods We used a dataset of de-identified breast MRIs organized by the National Cancer Institute in The Cancer Imaging Archive. We analyzed 91 biopsy-proven breast cancer cases with pathologic stage (stage I = 22; stage II = 58; stage III = 11) and surgically proven nodal status (negative nodes = 46, ≥ 1 positive node = 44, no nodes examined = 1). We characterized tumors by (a) radiologist measured size, and (b) CEIP. We built models combining two CEIPs to predict tumor pathologic stage and lymph node involvement, evaluated them in leave-one-out cross-validation with area under the ROC curve (AUC) as figure of merit. Results Tumor size was the most powerful predictor of pathologic stage but CEIPs capturing biologic behavior also emerged as predictive (e.g. stage I+II vs. III demonstrated AUC = 0.83). No size measure was successful in the prediction of positive lymph nodes but adding a CEIP describing tumor “homogeneity,” significantly improved this discrimination (AUC = 0.62, p=.003) over chance. Conclusions Our results indicate that MRI phenotypes show promise for predicting breast cancer pathologic stage and lymph node status. PMID:26619259
Enhanced Contact Graph Routing (ECGR) MACHETE Simulation Model
NASA Technical Reports Server (NTRS)
Segui, John S.; Jennings, Esther H.; Clare, Loren P.
2013-01-01
Contact Graph Routing (CGR) for Delay/Disruption Tolerant Networking (DTN) space-based networks makes use of the predictable nature of node contacts to make real-time routing decisions given unpredictable traffic patterns. The contact graph will have been disseminated to all nodes before the start of route computation. CGR was designed for space-based networking environments where future contact plans are known or are independently computable (e.g., using known orbital dynamics). For each data item (known as a bundle in DTN), a node independently performs route selection by examining possible paths to the destination. Route computation could conceivably run thousands of times a second, so computational load is important. This work refers to the simulation software model of Enhanced Contact Graph Routing (ECGR) for DTN Bundle Protocol in JPL's MACHETE simulation tool. The simulation model was used for performance analysis of CGR and led to several performance enhancements. The simulation model was used to demonstrate the improvements of ECGR over CGR as well as other routing methods in space network scenarios. ECGR moved to using earliest arrival time because it is a global monotonically increasing metric that guarantees the safety properties needed for the solution's correctness since route re-computation occurs at each node to accommodate unpredicted changes (e.g., traffic pattern, link quality). Furthermore, using earliest arrival time enabled the use of the standard Dijkstra algorithm for path selection. The Dijkstra algorithm for path selection has a well-known inexpensive computational cost. These enhancements have been integrated into the open source CGR implementation. The ECGR model is also useful for route metric experimentation and comparisons with other DTN routing protocols particularly when combined with MACHETE's space networking models and Delay Tolerant Link State Routing (DTLSR) model.
Bent, John M.; Faibish, Sorin; Grider, Gary
2016-04-19
Cloud object storage is enabled for checkpoints of high performance computing applications using a middleware process. A plurality of files, such as checkpoint files, generated by a plurality of processes in a parallel computing system are stored by obtaining said plurality of files from said parallel computing system; converting said plurality of files to objects using a log structured file system middleware process; and providing said objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
A Family of Algorithms for Computing Consensus about Node State from Network Data
Brush, Eleanor R.; Krakauer, David C.; Flack, Jessica C.
2013-01-01
Biological and social networks are composed of heterogeneous nodes that contribute differentially to network structure and function. A number of algorithms have been developed to measure this variation. These algorithms have proven useful for applications that require assigning scores to individual nodes–from ranking websites to determining critical species in ecosystems–yet the mechanistic basis for why they produce good rankings remains poorly understood. We show that a unifying property of these algorithms is that they quantify consensus in the network about a node's state or capacity to perform a function. The algorithms capture consensus by either taking into account the number of a target node's direct connections, and, when the edges are weighted, the uniformity of its weighted in-degree distribution (breadth), or by measuring net flow into a target node (depth). Using data from communication, social, and biological networks we find that that how an algorithm measures consensus–through breadth or depth– impacts its ability to correctly score nodes. We also observe variation in sensitivity to source biases in interaction/adjacency matrices: errors arising from systematic error at the node level or direct manipulation of network connectivity by nodes. Our results indicate that the breadth algorithms, which are derived from information theory, correctly score nodes (assessed using independent data) and are robust to errors. However, in cases where nodes “form opinions” about other nodes using indirect information, like reputation, depth algorithms, like Eigenvector Centrality, are required. One caveat is that Eigenvector Centrality is not robust to error unless the network is transitive or assortative. In these cases the network structure allows the depth algorithms to effectively capture breadth as well as depth. Finally, we discuss the algorithms' cognitive and computational demands. This is an important consideration in systems in which individuals use the collective opinions of others to make decisions. PMID:23874167
Singh, Baljinder; Ezziddin, Samer; Palmedo, Holger; Reinhardt, Michael; Strunk, Holger; Tüting, Thomas; Biersack, Hans-Jürgen; Ahmadzadehfar, Hojjat
2008-10-01
The objective of this study was to evaluate the role of preoperative 18F-fluorodeoxyglucose-positron emission tomography/computed tomography scanning, preoperative lymphoscintigraphy (LS), and sentinel lymph node biopsy in patients with malignant melanoma. Fifty-two patients (36 men: 16 women; mean age 55.0+/-13.0 years; median age 61 years; range 17-76 years) with malignant melanoma were selected. According to the latest version of the American Joint Committee on Cancer staging system, the disease in the study patients was initially classified as either stage I or II. The other primary tumor characteristics were mean Breslow depth=2.87 mm and median=2 mm; range 1-12.0 mm and Clarks levels III-V. None of the study patients had clinical or radiological evidence of regional lymph node metastatic disease. At least one sentinel node was identified in all patients. Preoperative LS detected a total of 111 sentinel lymph nodes (average 2.13 sentinel lymph node per patient) and demonstrated a single nodal draining basin in 38 (73%) patients and multiple (2-3 draining basins) in the remaining 14 (27%) patients. Fourteen out of the 52 patients (27%) had at least one involved sentinel node. Positron emission tomography was true positive in two patients with a sentinel node greater than 1 cm and false positive in two other patients. In this study, the detection of sentinel lymph node by LS and gamma probe had a sensitivity of 100%. In contrast, 18F-FDG-PET imaging demonstrated very low sensitivity (14.3%; 95% CI, 2.5 to 44%) and positive predictive value (50%; 95% CI, 9 to 90%) for localizing the subclinical nodal metastases. The specificity, net present value, and diagnostic accuracy were 94.7, 75, and 73%, respectively. Preoperative fluorodeoxyglucose-positron emission tomography/computed tomography imaging is not able to substitute LS/sentinel lymph node biopsy in patients at stage I or II.
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut
2015-01-01
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.
Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut
2015-10-05
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
International Space Station Node 1 is moved for leak test
NASA Technical Reports Server (NTRS)
1998-01-01
Node 1, the first element for the International Space Station, and attached Pressurized Mating Adapter-1 continue with prelaunch preparation activities at KSC's Space Station Processing Facility. Node 1 is a connecting passageway to the living and working areas of the space station. The node is being removed from the element rotation stand, or test stand, where it underwent an interim weight and center of gravity determination. (The final determination is planned to be performed prior to transporting Node 1 to the launch pad.) Now the node is being moved to the Shuttle payload transportation canister, where the doors will be closed for a two-week leak check. Node 1 is scheduled to fly on STS-88.
Understanding the influence of all nodes in a network
Lawyer, Glenn
2015-01-01
Centrality measures such as the degree, k-shell, or eigenvalue centrality can identify a network's most influential nodes, but are rarely usefully accurate in quantifying the spreading power of the vast majority of nodes which are not highly influential. The spreading power of all network nodes is better explained by considering, from a continuous-time epidemiological perspective, the distribution of the force of infection each node generates. The resulting metric, the expected force, accurately quantifies node spreading power under all primary epidemiological models across a wide range of archetypical human contact networks. When node power is low, influence is a function of neighbor degree. As power increases, a node's own degree becomes more important. The strength of this relationship is modulated by network structure, being more pronounced in narrow, dense networks typical of social networking and weakening in broader, looser association networks such as the Internet. The expected force can be computed independently for individual nodes, making it applicable for networks whose adjacency matrix is dynamic, not well specified, or overwhelmingly large. PMID:25727453
Performance Evaluation of AODV with Blackhole Attack
NASA Astrophysics Data System (ADS)
Dara, Karuna
2010-11-01
A Mobile Ad Hoc Network (MANET) is a temporary network set up by a wireless mobile computers moving arbitrary in the places that have no network infrastructure. These nodes maintain connectivity in a decentralized manner. Since the nodes communicate with each other, they cooperate by forwarding data packets to other nodes in the network. Thus the nodes find a path to the destination node using routing protocols. However, due to security vulnerabilities of the routing protocols, mobile ad-hoc networks are unprotected to attacks of the malicious nodes. One of these attacks is the Black Hole Attack against network integrity absorbing all data packets in the network. Since the data packets do not reach the destination node on account of this attack, data loss will occur. In this paper, we simulated the black hole attack in various mobile ad-hoc network scenarios using AODV routing protocol of MANET and have tried to find a effect if number of nodes are increased with increase in malicious nodes.
Solving the vehicle routing problem by a hybrid meta-heuristic algorithm
NASA Astrophysics Data System (ADS)
Yousefikhoshbakht, Majid; Khorram, Esmaile
2012-08-01
The vehicle routing problem (VRP) is one of the most important combinational optimization problems that has nowadays received much attention because of its real application in industrial and service problems. The VRP involves routing a fleet of vehicles, each of them visiting a set of nodes such that every node is visited by exactly one vehicle only once. So, the objective is to minimize the total distance traveled by all the vehicles. This paper presents a hybrid two-phase algorithm called sweep algorithm (SW) + ant colony system (ACS) for the classical VRP. At the first stage, the VRP is solved by the SW, and at the second stage, the ACS and 3-opt local search are used for improving the solutions. Extensive computational tests on standard instances from the literature confirm the effectiveness of the presented approach.
Kaseda, Kaoru; Asakura, Keisuke; Kazama, Akio; Ozawa, Yukihiko
2016-12-01
Lymph nodes in patients with non-small cell lung cancer (NSCLC) are often staged using integrated 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT). However, this modality has limited ability to detect micrometastases. We aimed to define risk factors for occult lymph node metastasis in patients with clinical stage I NSCLC diagnosed by preoperative integrated FDG-PET/CT. We retrospectively reviewed the records of 246 patients diagnosed with clinical stage I NSCLC based on integrated FDG-PET/CT between April 2007 and May 2015. All patients were treated by complete surgical resection. The prevalence of occult lymph node metastasis in patients with clinical stage I NSCLC was analysed according to clinicopathological factors. Risk factors for occult lymph node metastasis were defined using univariate and multivariate analyses. Occult lymph node metastasis was detected in 31 patients (12.6 %). Univariate analysis revealed CEA (P = 0.04), SUV max of the primary tumour (P = 0.031), adenocarcinoma (P = 0.023), tumour size (P = 0.002) and pleural invasion (P = 0.046) as significant predictors of occult lymph node metastasis. Multivariate analysis selected SUV max of the primary tumour (P = 0.049), adenocarcinoma (P = 0.003) and tumour size (P = 0.019) as independent predictors of occult lymph node metastasis. The SUV max of the primary tumour, adenocarcinoma and tumour size were risk factors for occult lymph node metastasis in patients with NSCLC diagnosed as clinical stage I by preoperative integrated FDG-PET/CT. These findings would be helpful in selecting candidates for mediastinoscopy or endobronchial ultrasound-guided transbronchial needle aspiration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakajima, Naomi, E-mail: haruhi0321@gmail.com; Department of Radiology, Ehime University, Ehime; Kataoka, Masaaki
Purpose: To determine whether volume-based parameters on pretreatment {sup 18}F-fluorodeoxyglucose positron emission tomography/computed tomography in breast cancer patients treated with mastectomy without adjuvant radiation therapy are predictive of recurrence. Methods and Materials: We retrospectively analyzed 93 patients with 1 to 3 positive axillary nodes after surgery, who were studied with {sup 18}F-fluorodeoxyglucose positron emission tomography/computed tomography for initial staging. We evaluated the relationship between positron emission tomography parameters, including the maximum standardized uptake value, metabolic tumor volume (MTV), and total lesion glycolysis (TLG), and clinical outcomes. Results: The median follow-up duration was 45 months. Recurrence was observed in 11 patients.more » Metabolic tumor volume and TLG were significantly related to tumor size, number of involved nodes, nodal ratio, nuclear grade, estrogen receptor (ER) status, and triple negativity (TN) (all P values were <.05). In receiver operating characteristic curve analysis, MTV and TLG showed better predictive performance than tumor size, ER status, or TN (area under the curve: 0.85, 0.86, 0.79, 0.74, and 0.74, respectively). On multivariate analysis, MTV was an independent prognostic factor of locoregional recurrence-free survival (hazard ratio 34.42, 95% confidence interval 3.94-882.71, P=.0008) and disease-free survival (DFS) (hazard ratio 13.92, 95% confidence interval 2.65-103.78, P=.0018). The 3-year DFS rate was 93.8% for the lower MTV group (<53.1; n=85) and 25.0% for the higher MTV group (≥53.1; n=8; P<.0001, log–rank test). The 3-year DFS rate for patients with both ER-positive status and MTV <53.1 was 98.2%; and for those with ER-negative status and MTV ≥53.1 it was 25.0% (P<.0001). Conclusions: Volume-based parameters improve recurrence prediction in postmastectomy breast cancer patients with 1 to 3 positive nodes. The addition of MTV to ER status or TN has potential benefits to identify a subgroup at higher risk for recurrence.« less
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.