Sample records for multiple compute nodes

  1. Methods and apparatus using commutative error detection values for fault isolation in multiple node computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Almasi, Gheorghe; Blumrich, Matthias Augustin; Chen, Dong

    Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values, for example checksums, to identify and isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves commutative error detection values associated with each node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
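
    A minimal sketch of the comparison step described above, in Python. The per-section XOR checksum and the record layout are illustrative assumptions, not the patented apparatus; the key property is that XOR, like any commutative operation, yields the same value regardless of packet arrival order.

    # Compare per-node, per-section commutative error detection values
    # (here, XOR checksums) gathered from two runs of the same program.
    def xor_checksum(packets):
        """XOR of all injected packets; order-independent (commutative)."""
        acc = 0
        for p in packets:
            acc ^= p
        return acc

    def find_suspect_nodes(run_a, run_b):
        """run_a, run_b: {node_id: {section_id: checksum}} from two runs."""
        suspects = set()
        for node, sections in run_a.items():
            for section, value in sections.items():
                if run_b[node][section] != value:
                    suspects.add(node)  # differing value: possible faulty node
        return suspects

    run1 = {0: {"s1": 0x1A}, 1: {"s1": 0x2B}}
    run2 = {0: {"s1": 0x1A}, 1: {"s1": 0x99}}  # node 1 diverged
    print(find_suspect_nodes(run1, run2))      # {1}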

  2. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOEpatents

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction addresses being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
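
    A small Python sketch of the grouping step, under the assumption stated above that the partial data is one instruction address per node; fetch_instruction_address and fetch_full_traceback are hypothetical stand-ins for the collector's per-node queries.

    from collections import defaultdict

    def collect_tracebacks(nodes, fetch_instruction_address, fetch_full_traceback):
        # Cheap query to every node, expensive query once per group.
        groups = defaultdict(list)
        for node in nodes:
            groups[fetch_instruction_address(node)].append(node)
        result = {}
        for address, members in groups.items():
            representative = members[0]  # single chosen node per subset
            result[address] = (members, fetch_full_traceback(representative))
        return result

    # e.g. with stubbed queries: nodes 0..3 fall into two address groups
    print(collect_tracebacks(
        range(4),
        fetch_instruction_address=lambda n: 0x400 + (n % 2),
        fetch_full_traceback=lambda n: [f"frame-of-node-{n}"]))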

  3. Collectively loading programs in a multiple program multiple data environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.

  4. Multiple node remote messaging

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Valentina; Steinmacher-Burow, Burkhard; Vranas, Pavlos

    2010-08-31

    A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes provides that a first compute node (A) sends a single remote message to a remote second compute node (B) in order to control the remote second compute node (B) to send at least one remote message. The method includes controlling a DMA engine at the first compute node (A) to prepare the single remote message to include a first message descriptor and at least one remote message descriptor for controlling the remote second compute node (B) to send at least one remote message, including putting the first message descriptor into an injection FIFO at the first compute node (A) and sending the single remote message and the at least one remote message descriptor to the second compute node (B).
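
    The remote-get pattern reads roughly as follows in a Python sketch; the dataclass layout is an illustrative assumption, not the patent's wire format.

    from dataclasses import dataclass, field

    @dataclass
    class MessageDescriptor:
        destination: int
        payload: bytes

    @dataclass
    class RemoteMessage:
        first: MessageDescriptor                     # describes the message itself
        remote: list = field(default_factory=list)   # work for node B

    def node_a_send(injection_fifo_a, b_rank, work_for_b):
        msg = RemoteMessage(
            first=MessageDescriptor(destination=b_rank, payload=b"remote-get"),
            remote=[MessageDescriptor(d, p) for d, p in work_for_b])
        injection_fifo_a.append(msg.first)  # A's DMA engine drains this FIFO
        return msg                          # travels to B over the network

    def node_b_receive(injection_fifo_b, msg):
        # B's DMA engine injects the embedded descriptors, so B sends its
        # own messages without involving B's processor.
        injection_fifo_b.extend(msg.remote)

    fifo_a, fifo_b = [], []
    msg = node_a_send(fifo_a, b_rank=7, work_for_b=[(3, b"x"), (5, b"y")])
    node_b_receive(fifo_b, msg)
    print(len(fifo_b))  # 2 descriptors queued on B by a single message from A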

  5. Anatomical classification of breast sentinel lymph nodes using computed tomography-lymphography.

    PubMed

    Fujita, Tamaki; Miura, Hiroyuki; Seino, Hiroko; Ono, Shuichi; Nishi, Takashi; Nishimura, Akimasa; Hakamada, Kenichi; Aoki, Masahiko

    2018-05-03

    To evaluate the anatomical classification and location of breast sentinel lymph nodes, preoperative computed tomography-lymphography examinations were retrospectively reviewed for sentinel lymph nodes in 464 cases clinically diagnosed with node-negative breast cancer between July 2007 and June 2016. Anatomical classification was performed based on the numbers of lymphatic routes and sentinel lymph nodes, the flow direction of lymphatic routes, and the location of sentinel lymph nodes. Of the 464 cases reviewed, anatomical classification could be performed in 434 (93.5%). The largest number of cases showed single route/single sentinel lymph node (n = 296, 68.2%), followed by multiple routes/multiple sentinel lymph nodes (n = 59, 13.6%), single route/multiple sentinel lymph nodes (n = 53, 12.2%), and multiple routes/single sentinel lymph node (n = 26, 6.0%). Classification based on the flow direction of lymphatic routes showed that 429 cases (98.8%) had outward flow on the superficial fascia toward axillary lymph nodes, whereas classification based on the height of sentinel lymph nodes showed that 323 cases (74.4%) belonged to the upper pectoral group of axillary lymph nodes. There was wide variation in the number of lymphatic routes and their branching patterns and in the number, location, and direction of flow of sentinel lymph nodes. It is clinically very important to preoperatively understand the anatomical morphology of lymphatic routes and sentinel lymph nodes for optimal treatment of breast cancer, and computed tomography-lymphography is suitable for this purpose.

  6. Robust Routing Protocol For Digital Messages

    NASA Technical Reports Server (NTRS)

    Marvit, Maclen

    1994-01-01

    Refinement of digital-message-routing protocol increases fault tolerance of polled networks. AbNET-3 is latest of generic AbNET protocols for transmission of messages among computing nodes. AbNET concept described in "Multiple-Ring Digital Communication Network" (NPO-18133). Specifically aimed at increasing fault tolerance of network in broadcast mode, in which one node broadcasts message to and receives responses from all other nodes. Communication in network of computers maintained even when links fail.

  7. DMA engine for repeating communication patterns

    DOEpatents

    Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard; Vranas, Pavlos

    2010-09-21

    A parallel computer system is constructed as a network of interconnected compute nodes to operate a global message-passing application for performing communications across the network. Each of the compute nodes includes one or more individual processors with memories which run local instances of the global message-passing application operating at each compute node to carry out local processing operations independent of processing operations carried out at other compute nodes. Each compute node also includes a DMA engine constructed to interact with the application via Injection FIFO Metadata describing multiple Injection FIFOs, where each Injection FIFO may contain an arbitrary number of message descriptors, in order to process messages with a fixed processing overhead irrespective of the number of message descriptors included in the Injection FIFO.

  8. Method for simultaneous overlapped communications between neighboring processors in a multiple

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1991-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system.

  9. Near real-time traffic routing

    NASA Technical Reports Server (NTRS)

    Yang, Chaowei (Inventor); Xie, Jibo (Inventor); Zhou, Bin (Inventor); Cao, Ying (Inventor)

    2012-01-01

    A near real-time physical transportation network routing system comprising: a traffic simulation computing grid and a dynamic traffic routing service computing grid. The traffic simulator produces traffic network travel time predictions for a physical transportation network using a traffic simulation model and common input data. The physical transportation network is divided into multiple sections. Each section has a primary zone and a buffer zone. The traffic simulation computing grid includes multiple traffic simulation computing nodes. The common input data includes static network characteristics, an origin-destination data table, dynamic traffic information data and historical traffic data. The dynamic traffic routing service computing grid includes multiple dynamic traffic routing computing nodes and generates traffic route(s) using the traffic network travel time predictions.

  10. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by routing through transporter nodes

    DOEpatents

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-11-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.
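
    A sketch of the two-leg path construction in Python, using breadth-first shortest paths as a stand-in for whatever routing policy the source node actually selects:

    from collections import deque

    def route(links, src, dst):
        """Breadth-first path from src to dst; links: {node: [neighbors]}."""
        parent, frontier = {src: None}, deque([src])
        while frontier:
            n = frontier.popleft()
            if n == dst:
                path = []
                while n is not None:
                    path.append(n)
                    n = parent[n]
                return path[::-1]
            for m in links[n]:
                if m not in parent:
                    parent[m] = n
                    frontier.append(m)
        raise ValueError("destination unreachable")

    def route_via_transporter(links, src, transporter, dst):
        # Leg 1: source to transporter; leg 2: transporter to destination.
        first_leg = route(links, src, transporter)
        return first_leg + route(links, transporter, dst)[1:]  # drop dup node

    links = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
    print(route_via_transporter(links, 0, 2, 3))  # [0, 1, 2, 3]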

  11. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. In addition to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives on creating and building such images.
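
    The "poor man's parallelization" the authors mention can be as simple as launching whole programs as separate processes; a hedged Python sketch follows (the phyml command line and file names are illustrative assumptions, not BioNode's configuration):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # One hypothesis per command; nothing inside the programs changes.
    commands = [["phyml", "--input", f"aln_{i}.phy"] for i in range(8)]

    def run(cmd):
        return cmd, subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=4) as pool:  # 4 concurrent jobs
        for cmd, rc in pool.map(run, commands):
            print(" ".join(cmd), "->", rc)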

  12. Architecture and method for a burst buffer using flash technology

    DOEpatents

    Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung

    2016-03-15

    A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.

  13. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  14. Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

    PubMed

    Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania

    2015-01-01

    This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
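
    A sketch of the MND/WGN split using mpi4py (an assumption; the paper pairs MPI with CUDA kernels on the workers). Run with at least two ranks, e.g. mpiexec -n 4; the scoring placeholder stands in for the row-wise Smith-Waterman kernel.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()  # needs at least 2 ranks

    if rank == 0:  # Master Node Dispatcher: distribute work, then aggregate
        pairs = [("ACGT", "ACGA"), ("GATTACA", "GATACA"), ("TTT", "TAT")]
        for worker in range(1, size):
            comm.send(pairs[worker - 1::size - 1], dest=worker)
        print("aggregated:", [comm.recv(source=w) for w in range(1, size)])
    else:          # Worker GPU Node: align its chunk and report scores
        chunk = comm.recv(source=0)
        # placeholder for the row-wise Smith-Waterman CUDA kernel
        comm.send([min(len(a), len(b)) for a, b in chunk], dest=0)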

  15. Global interrupt and barrier networks

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.

    2008-10-28

    A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network implements logic for generating global interrupt and barrier signals that control global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; it also includes the physical interconnection of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One of the multiple independent networks is a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.

  16. Wide-area-distributed storage system for a multimedia database

    NASA Astrophysics Data System (ADS)

    Ueno, Masahiro; Kinoshita, Shigechika; Kuriki, Makato; Murata, Setsuko; Iwatsu, Shigetaro

    1998-12-01

    We have developed a wide-area-distributed storage system for multimedia databases, which minimizes the possibility of simultaneous failure of multiple disks in the event of a major disaster. It features a RAID system whose member disks are spatially distributed over a wide area. Each node has a device which includes the controller of the RAID and the controller of the member disks controlled by other nodes. The devices in the node are connected to a computer using fiber optic cables and communicate using fiber-channel technology. Any computer at a node can utilize multiple devices connected by optical fibers as a single 'virtual disk.' The advantage of this system structure is that devices and fiber optic cables are shared by the computers. In this report, we first describe our proposed system and the prototype used for testing. We then discuss its performance, i.e., how read and write throughputs are affected by data-access delay, the RAID level, and queuing.

  17. KeyWare: an open wireless distributed computing environment

    NASA Astrophysics Data System (ADS)

    Shpantzer, Isaac; Schoenfeld, Larry; Grindahl, Merv; Kelman, Vladimir

    1995-12-01

    Deployment of distributed applications in the wireless domain lacks the equivalent tools, methodologies, architectures, and network management that exist in LAN-based applications. A wireless distributed computing environment (KeyWare™) based on intelligent agents within a multiple client multiple server scheme was developed to resolve this problem. KeyWare renders concurrent application services to wireline and wireless client nodes encapsulated in multiple paradigms such as message delivery, database access, e-mail, and file transfer. These services and paradigms are optimized to cope with temporal and spatial radio coverage, high latency, limited throughput and transmission costs. A unified network management paradigm for both wireless and wireline facilitates seamless extensions of LAN-based management tools to include wireless nodes. A set of object-oriented tools and methodologies enables direct asynchronous invocation of agent-based services supplemented by tool-sets matched to supported KeyWare paradigms. The open architecture embodiment of KeyWare enables a wide selection of client node computing platforms, operating systems, transport protocols, radio modems and infrastructures while maintaining application portability.

  18. Best bang for your buck: GPU nodes for GROMACS biomolecular simulations

    PubMed Central

    Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut

    2015-01-01

    The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484

  19. Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.

    PubMed

    Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut

    2015-10-05

    The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
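
    The lifetime-cost argument is easy to reproduce with back-of-the-envelope numbers (all figures below are illustrative assumptions, not values from the paper):

    hardware_eur   = 2500.0  # purchase price of the node
    power_kw       = 0.35    # average draw under load, cooling folded in
    eur_per_kwh    = 0.25
    lifetime_years = 4
    ns_per_day     = 50.0    # trajectory production rate

    hours  = lifetime_years * 365 * 24
    energy = power_kw * hours * eur_per_kwh          # ~3066 EUR > hardware
    total  = hardware_eur + energy
    print(f"energy+cooling {energy:.0f} EUR vs hardware {hardware_eur:.0f} EUR")
    print(f"{total / (ns_per_day * 365 * lifetime_years):.3f} EUR per ns")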

  20. An Authentication Protocol for Future Sensor Networks.

    PubMed

    Bilal, Muhammad; Kang, Shin-Gak

    2017-04-28

    Authentication is one of the essential security services in Wireless Sensor Networks (WSNs) for ensuring secure data sessions. Sensor node authentication ensures the confidentiality and validity of data collected by the sensor node, whereas user authentication guarantees that only legitimate users can access the sensor data. In a mobile WSN, sensor and user nodes move across the network and exchange data with multiple nodes, thus experiencing the authentication process multiple times. The integration of WSNs with Internet of Things (IoT) brings forth a new kind of WSN architecture along with stricter security requirements; for instance, a sensor node or a user node may need to establish multiple concurrent secure data sessions. With concurrent data sessions, the frequency of the re-authentication process increases in proportion to the number of concurrent connections. Moreover, to establish multiple data sessions, it is essential that a protocol participant have the capability of running multiple instances of the protocol run, which makes the security issue even more challenging. The currently available authentication protocols were designed for the autonomous WSN and do not account for the above requirements. Hence, ensuring a lightweight and efficient authentication protocol has become more crucial. In this paper, we present a novel, lightweight and efficient key exchange and authentication protocol suite called the Secure Mobile Sensor Network (SMSN) Authentication Protocol. In the SMSN a mobile node goes through an initial authentication procedure and receives a re-authentication ticket from the base station. Later a mobile node can use this re-authentication ticket when establishing multiple data exchange sessions and/or when moving across the network. This scheme reduces the communication and computational complexity of the authentication process. We proved the strength of our protocol with rigorous security analysis (including formal analysis using the BAN-logic) and simulated the SMSN and previously proposed schemes in an automated protocol verifier tool. Finally, we compared the computational complexity and communication cost against well-known authentication protocols.
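
    A heavily simplified sketch of ticket-style re-authentication in Python; the HMAC construction and field layout are assumptions chosen for illustration, not the SMSN message formats.

    import hmac, hashlib, time

    BS_KEY = b"base-station-secret"  # held by the base station

    def issue_ticket(node_id: str, lifetime_s: int = 3600) -> bytes:
        body = f"{node_id}|{int(time.time()) + lifetime_s}".encode()
        tag = hmac.new(BS_KEY, body, hashlib.sha256).hexdigest().encode()
        return body + b"|" + tag

    def verify_ticket(ticket: bytes) -> bool:
        body, tag = ticket.rsplit(b"|", 1)
        expected = hmac.new(BS_KEY, body, hashlib.sha256).hexdigest().encode()
        if not hmac.compare_digest(tag, expected):
            return False                               # forged or corrupted
        return time.time() < int(body.split(b"|")[1])  # not yet expired

    ticket = issue_ticket("sensor-17")
    print(verify_ticket(ticket))  # True until expiry; no base-station round trip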

  1. An Authentication Protocol for Future Sensor Networks

    PubMed Central

    Bilal, Muhammad; Kang, Shin-Gak

    2017-01-01

    Authentication is one of the essential security services in Wireless Sensor Networks (WSNs) for ensuring secure data sessions. Sensor node authentication ensures the confidentiality and validity of data collected by the sensor node, whereas user authentication guarantees that only legitimate users can access the sensor data. In a mobile WSN, sensor and user nodes move across the network and exchange data with multiple nodes, thus experiencing the authentication process multiple times. The integration of WSNs with Internet of Things (IoT) brings forth a new kind of WSN architecture along with stricter security requirements; for instance, a sensor node or a user node may need to establish multiple concurrent secure data sessions. With concurrent data sessions, the frequency of the re-authentication process increases in proportion to the number of concurrent connections. Moreover, to establish multiple data sessions, it is essential that a protocol participant have the capability of running multiple instances of the protocol run, which makes the security issue even more challenging. The currently available authentication protocols were designed for the autonomous WSN and do not account for the above requirements. Hence, ensuring a lightweight and efficient authentication protocol has become more crucial. In this paper, we present a novel, lightweight and efficient key exchange and authentication protocol suite called the Secure Mobile Sensor Network (SMSN) Authentication Protocol. In the SMSN a mobile node goes through an initial authentication procedure and receives a re-authentication ticket from the base station. Later a mobile node can use this re-authentication ticket when establishing multiple data exchange sessions and/or when moving across the network. This scheme reduces the communication and computational complexity of the authentication process. We proved the strength of our protocol with rigorous security analysis (including formal analysis using the BAN-logic) and simulated the SMSN and previously proposed schemes in an automated protocol verifier tool. Finally, we compared the computational complexity and communication cost against well-known authentication protocols. PMID:28452937

  2. High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

    NASA Astrophysics Data System (ADS)

    Goto, Hiroyuki

    This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
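
    For reference, a serial Python sketch of the max-plus Kleene star on a DAG (the paper's contribution is the parallel/SIMD CBE implementation, not this algorithm itself):

    NEG = float("-inf")  # the max-plus "zero" element

    def maxplus_mul(A, B):
        n = len(A)
        return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def kleene_star(A):
        n = len(A)
        # E: max-plus identity (0 on the diagonal, -inf elsewhere)
        S = [[0 if i == j else NEG for j in range(n)] for i in range(n)]
        P = S
        for _ in range(n - 1):  # longer products add nothing on a DAG
            P = maxplus_mul(P, A)
            S = [[max(S[i][j], P[i][j]) for j in range(n)] for i in range(n)]
        return S

    A = [[NEG, 3, NEG], [NEG, NEG, 4], [NEG, NEG, NEG]]  # edges 0->1, 1->2
    print(kleene_star(A)[0][2])  # 7: longest travel time from node 0 to node 2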

  3. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by employing bandwidth shells at areas of overutilization

    DOEpatents

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-04-27

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.

  4. Evaluating Implementations of Service Oriented Architecture for Sensor Network via Simulation

    DTIC Science & Technology

    2011-04-01

    Subject: COMPUTER SCIENCE. Approved: Boleslaw Szymanski, Thesis Adviser, Rensselaer Polytechnic Institute, Troy, New York, April 2011 (For Graduation May 2011) ... The simulation supports distributed and centralized composition with a type hierarchy and multiple-service statically-located nodes in a 2-dimensional space. The second simulation ...

  5. Using the Parallel Computing Toolbox with MATLAB on the Peregrine System

    Science.gov Websites

    ...parallel pool took %g seconds.\n', toc)
    % "single program multiple data"
    spmd
      fprintf('Worker %d says Hello World!\n', labindex)
    end
    delete(gcp); % close the parallel pool
    exit

    To run the script on a compute node, create the file helloWorld.sub:

    #!/bin/bash
    #PBS -l walltime=05:00
    #PBS -l nodes=1
    #PBS -N

  6. Breast sentinel lymph node navigation with three-dimensional computed tomography-lymphography: a 12-year study.

    PubMed

    Yamamoto, Shigeru; Suga, Kazuyoshi; Maeda, Kazunari; Maeda, Noriko; Yoshimura, Kiyoshi; Oka, Masaaki

    2016-05-01

    To evaluate the utility of three-dimensional (3D) computed tomography (CT)-lymphography (LG) breast sentinel lymph node navigation in our institute, we preoperatively identified sentinel lymph nodes (SLNs) in 576 clinically node-negative patients with T1 and T2 breast cancer between 2002 and 2013, using the 3D CT-LG method. SLN biopsy (SLNB) was performed in 557 of 576 patients using both the images of 3D CT-LG for guidance and the blue dye method. Using 3D CT-LG, SLNs were visualized in 569 (99%) of 576 patients. Of 569 patients, both lymphatic draining ducts and SLNs from the peritumoral and periareolar areas were visualized in 549 (96%) patients. Only SLNs without lymphatic draining ducts were visualized in 20 patients. Drainage lymphatic pathways visualized with 3D CT-LG (549 cases) were classified into four patterns: single route/single SLN (355 cases, 65%), multiple routes/single SLN (59 cases, 11%), single route/multiple SLNs (62 cases, 11%), and multiple routes/multiple SLNs (73 cases, 13%). SLNs were detected in 556 (99.8%) of 557 patients during SLNB. CT-LG is useful for preoperative visualization of SLNs and breast lymphatic draining routes. This preoperative method should contribute greatly to the easy detection of SLNs during SLNB.

  7. Architecture for WSN Nodes Integration in Context Aware Systems Using Semantic Messages

    NASA Astrophysics Data System (ADS)

    Larizgoitia, Iker; Muguira, Leire; Vazquez, Juan Ignacio

    Wireless sensor networks (WSN) are becoming extremely popular in the development of context aware systems. Traditionally WSN have been focused on capturing data, which was later analyzed and interpreted in a server with more computational power. In this kind of scenario the problem of representing the sensor information needs to be addressed. Every node in the network might have different sensors attached; therefore their correspondent packet structures will be different. The server has to be aware of the meaning of every single structure and data in order to be able to interpret them. Multiple sensors, multiple nodes, and multiple packet structures, none following a standard format, make a system that is neither scalable nor interoperable. Context aware systems have solved this problem with the use of semantic technologies. They provide a common framework to achieve a standard definition of any domain. Nevertheless, these representations are computationally expensive, so a WSN cannot afford them. The work presented in this paper tries to bridge the gap between the sensor information and its semantic representation, by defining a simple architecture that enables the definition of this information natively in a semantic way, achieving the integration of the semantic information in the network packets. This will have several benefits, the most important being the possibility of promoting every WSN node to a real semantic information source.

  8. Parallel scalability of Hartree-Fock calculations

    NASA Astrophysics Data System (ADS)

    Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.

    2015-03-01

    Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
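
    A toy sketch of the purification idea in Python with NumPy; McWeeny's iteration is a common variant chosen here for illustration, and the paper's exact scheme may differ. It uses only matrix multiplies, which is why it is attractive when the computation is network-bandwidth bound rather than FLOP bound.

    import numpy as np

    def mcweeny(P, iters=40):
        """Drive a symmetric matrix with eigenvalues in [0, 1] toward an
        idempotent projector (eigenvalues in {0, 1}) using only matmuls."""
        for _ in range(iters):
            P2 = P @ P
            P = 3 * P2 - 2 * (P2 @ P)  # fixed points at 0 and 1
        return P

    # Toy check: eigenvalues 0.9 and 0.2 snap to 1 and 0 respectively.
    print(np.round(mcweeny(np.diag([0.9, 0.2])), 6))  # diag(1, 0)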

  9. Algorithm implementation on the Navier-Stokes computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krist, S.E.; Zang, T.A.

    1987-03-01

    The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.

  10. Algorithm implementation on the Navier-Stokes computer

    NASA Technical Reports Server (NTRS)

    Krist, Steven E.; Zang, Thomas A.

    1987-01-01

    The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.

  11. Calibrated birth-death phylogenetic time-tree priors for bayesian inference.

    PubMed

    Heled, Joseph; Drummond, Alexei J

    2015-05-01

    Here we introduce a general class of multiple calibration birth-death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of "calibrated nodes" and "uncalibrated nodes" such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth-death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  12. Linear Transceiver Design for Interference Alignment: Complexity and Computation

    DTIC Science & Technology

    2010-07-01

    restriction on the choice of beamforming vector of node b. Thus, for any fixed transmit node b in H, there are multiple restriction sets, each ... signal space can be chosen. The receive nodes in H can achieve interference alignment if and only if these restricted sets of one-dimensional signal ... total number of restriction sets is at most linear in the number of edges in H and each restriction set contains at most two one-dimensional

  13. Sentinel nodes identified by computed tomography-lymphography accurately stage the axilla in patients with breast cancer

    PubMed Central

    2013-01-01

    Background: Sentinel node biopsy often results in the identification and removal of multiple nodes as sentinel nodes, although most of these nodes could be non-sentinel nodes. This study investigated whether computed tomography-lymphography (CT-LG) can distinguish sentinel nodes from non-sentinel nodes and whether sentinel nodes identified by CT-LG can accurately stage the axilla in patients with breast cancer. Methods: This study included 184 patients with breast cancer and clinically negative nodes. Contrast agent was injected interstitially. The location of sentinel nodes was marked on the skin surface using a CT laser light navigator system. Lymph nodes located just under the marks were first removed as sentinel nodes. Then, all dyed nodes or all hot nodes were removed. Results: The mean number of sentinel nodes identified by CT-LG was significantly lower than that of dyed and/or hot nodes removed (1.1 vs 1.8, p < 0.0001). Twenty-three (12.5%) patients had ≥2 sentinel nodes identified by CT-LG removed, whereas 94 (51.1%) of patients had ≥2 dyed and/or hot nodes removed (p < 0.0001). Pathological evaluation demonstrated that 47 (25.5%) of 184 patients had metastasis to at least one node. All 47 patients demonstrated metastases to at least one of the sentinel nodes identified by CT-LG. Conclusions: CT-LG can distinguish sentinel nodes from non-sentinel nodes, and sentinel nodes identified by CT-LG can accurately stage the axilla in patients with breast cancer. Successful identification of sentinel nodes using CT-LG may facilitate image-based diagnosis of metastasis, possibly leading to the omission of sentinel node biopsy. PMID:24321242

  14. Implementation and Characterization of Three-Dimensional Particle-in-Cell Codes on Multiple-Instruction-Multiple-Data Massively Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.

    1995-01-01

    A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64^3 particles per processing node (approximately 134 million particles on 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.

  15. Multiple-Flat-Panel System Displays Multidimensional Data

    NASA Technical Reports Server (NTRS)

    Gundo, Daniel; Levit, Creon; Henze, Christopher; Sandstrom, Timothy; Ellsworth, David; Green, Bryan; Joly, Arthur

    2006-01-01

    The NASA Ames hyperwall is a display system designed to facilitate the visualization of sets of multivariate and multidimensional data like those generated in complex engineering and scientific computations. The hyperwall includes a 7×7 matrix of computer-driven flat-panel video display units, each presenting an image of 1,280 × 1,024 pixels. The term hyperwall reflects the fact that this system is a more capable successor to prior computer-driven multiple-flat-panel display systems known by names that include the generic term powerwall and the trade names PowerWall and Powerwall. Each of the 49 flat-panel displays is driven by a rack-mounted, dual-central-processing-unit, workstation-class personal computer equipped with a high-performance graphical-display circuit card and with a hard-disk drive having a storage capacity of 100 GB. Each such computer is a slave node in a master/slave computing/data-communication system (see Figure 1). The computer that acts as the master node is similar to the slave-node computers, except that it runs the master portion of the system software and is equipped with a keyboard and mouse for control by a human operator. The system utilizes commercially available master/slave software along with custom software that enables the human controller to interact simultaneously with any number of selected slave nodes. In a powerwall, a single rendering task is spread across multiple processors and then the multiple outputs are tiled into one seamless super-display. It must be noted that the hyperwall concept subsumes the powerwall concept in that a single scene could be rendered as a mosaic image on the hyperwall. However, the hyperwall offers a wider set of capabilities to serve a different purpose: The hyperwall concept is one of (1) simultaneously displaying multiple different but related images, and (2) providing means for composing and controlling such sets of images. In place of elaborate software or hardware crossbar switches, the hyperwall concept substitutes reliance on the human visual system for integration, synthesis, and discrimination of patterns in complex and high-dimensional data spaces represented by the multiple displayed images. The variety of multidimensional data sets that can be displayed on the hyperwall is practically unlimited. For example, Figure 2 shows a hyperwall display of surface pressures and streamlines from a computational simulation of airflow about an aerospacecraft at various Mach numbers and angles of attack. In this display, Mach numbers increase from left to right and angles of attack increase from bottom to top. That is, all images in the same column represent simulations at the same Mach number, while all images in the same row represent simulations at the same angle of attack. The same viewing transformations and the same mapping from surface pressure to colors were used in generating all the images.

  16. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
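
    A sketch of the aggregation step using mpi4py collectives (an assumption; the patent targets a parallel computer's native collective operations). The job leader is rank 0 here.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    local_status = 0  # 0 = this node's portion of the job succeeded

    # Reduce to one aggregate (worst status wins) and gather the detail.
    worst = comm.reduce(local_status, op=MPI.MAX, root=0)
    detail = comm.gather((comm.Get_rank(), local_status), root=0)
    if comm.Get_rank() == 0:  # the job leader compute node reports once
        print("aggregated exit status:", worst)
        print("per-node statuses:", detail)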

  17. Distributing an executable job load file to compute nodes in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gooding, Thomas M.

    Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.

  18. Quadrature rules with multiple nodes for evaluating integrals with strong singularities

    NASA Astrophysics Data System (ADS)

    Milovanovic, Gradimir V.; Spalevic, Miodrag M.

    2006-05-01

    We present a method based on the Chakalov-Popoviciu quadrature formula of Lobatto type, a rather general case of quadrature with multiple nodes, for approximating integrals defined by Cauchy principal values or by Hadamard finite parts. As a starting point we use the results obtained by L. Gori and E. Santi (cf. On the evaluation of Hilbert transforms by means of a particular class of Turan quadrature rules, Numer. Algorithms 10 (1995), 27-39; Quadrature rules based on s-orthogonal polynomials for evaluating integrals with strong singularities, Oberwolfach Proceedings: Applications and Computation of Orthogonal Polynomials, ISNM 131, Birkhauser, Basel, 1999, pp. 109-119). We generalize their results by using some of our numerical procedures for stable calculation of the quadrature formula with multiple nodes of Gaussian type and proposed methods for estimating the remainder term in such type of quadrature formulae. Numerical examples, illustrations and comparisons are also shown.
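
    For orientation, a quadrature rule with multiple nodes (of Chakalov-Popoviciu type) has the general form below, in which each node \tau_\nu carries multiplicity 2 s_\nu + 1, so derivative values of f enter the rule; taking every s_\nu = 0 recovers an ordinary Gaussian rule. This is the standard textbook form, stated here for context rather than quoted from the paper:

        \int_{\mathbb{R}} f(t)\, d\lambda(t) \approx \sum_{\nu=1}^{n} \sum_{i=0}^{2 s_\nu} A_{i,\nu}\, f^{(i)}(\tau_\nu)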

  19. An elementary quantum network using robust nuclear spin qubits in diamond

    NASA Astrophysics Data System (ADS)

    Kalb, Norbert; Reiserer, Andreas; Humphreys, Peter; Blok, Machiel; van Bemmelen, Koen; Twitchen, Daniel; Markham, Matthew; Taminiau, Tim; Hanson, Ronald

    Quantum registers containing multiple robust qubits can form the nodes of future quantum networks for computation and communication. Information storage within such nodes must be resilient to any type of local operation. Here we demonstrate multiple robust memories by employing five nuclear spins adjacent to a nitrogen-vacancy defect centre in diamond. We characterize the storage of quantum superpositions and their resilience to entangling attempts with the electron spin of the defect centre. The storage fidelity is found to be limited by the probabilistic electron spin reset after failed entangling attempts. Control over multiple memories is then utilized to encode states in decoherence protected subspaces with increased robustness. Furthermore we demonstrate memory control in two optically linked network nodes and characterize the storage capabilities of both memories in terms of the process fidelity with the identity. These results pave the way towards multi-qubit quantum algorithms in a remote network setting.

  20. GEANT4 distributed computing for compact clusters

    NASA Astrophysics Data System (ADS)

    Harrawood, Brian P.; Agasthya, Greeshma A.; Lakshmanan, Manu N.; Raterman, Gretchen; Kapadia, Anuj J.

    2014-11-01

    A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User designed 'work tickets' are distributed to clients using a client-server work flow model to specify the parameters for each individual run of the simulation. The new g4DistributedRunManager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 for large discrete data sets such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions or simply increasing the throughput of a single model.
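
    The work-ticket model reduces naturally to a process pool on a single machine; a Python sketch follows (the ticket fields and the per-run placeholder are illustrative, and the real g4DistributedRunManager speaks a socket protocol across nodes):

    from multiprocessing import Pool

    def run_simulation(ticket):
        angle, n_events = ticket["angle"], ticket["events"]
        # placeholder for one GEANT4 run configured from the ticket
        return angle, f"simulated {n_events} events at {angle} deg"

    if __name__ == "__main__":
        tickets = [{"angle": a, "events": 10000} for a in range(0, 180, 15)]
        with Pool() as pool:  # idle clients pull the next available ticket
            for angle, result in pool.imap_unordered(run_simulation, tickets):
                print(angle, result)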

  1. Distributed Synchronization Technique for OFDMA-Based Wireless Mesh Networks Using a Bio-Inspired Algorithm

    PubMed Central

    Kim, Mi Jeong; Maeng, Sung Joon; Cho, Yong Soo

    2015-01-01

    In this paper, a distributed synchronization technique based on a bio-inspired algorithm is proposed for an orthogonal frequency division multiple access (OFDMA)-based wireless mesh network (WMN) with a time difference of arrival. The proposed time- and frequency-synchronization technique uses only the signals received from the neighbor nodes, by considering the effect of the propagation delay between the nodes. It achieves a fast synchronization with a relatively low computational complexity because it is operated in a distributed manner, not requiring any feedback channel for the compensation of the propagation delays. In addition, a self-organization scheme that can be effectively used to construct 1-hop neighbor nodes is proposed for an OFDMA-based WMN with a large number of nodes. The performance of the proposed technique is evaluated with regard to the convergence property and synchronization success probability using a computer simulation. PMID:26225974

  2. Distributed Synchronization Technique for OFDMA-Based Wireless Mesh Networks Using a Bio-Inspired Algorithm.

    PubMed

    Kim, Mi Jeong; Maeng, Sung Joon; Cho, Yong Soo

    2015-07-28

    In this paper, a distributed synchronization technique based on a bio-inspired algorithm is proposed for an orthogonal frequency division multiple access (OFDMA)-based wireless mesh network (WMN) with a time difference of arrival. The proposed time- and frequency-synchronization technique uses only the signals received from the neighbor nodes, by considering the effect of the propagation delay between the nodes. It achieves a fast synchronization with a relatively low computational complexity because it is operated in a distributed manner, not requiring any feedback channel for the compensation of the propagation delays. In addition, a self-organization scheme that can be effectively used to construct 1-hop neighbor nodes is proposed for an OFDMA-based WMN with a large number of nodes. The performance of the proposed technique is evaluated with regard to the convergence property and synchronization success probability using a computer simulation.
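
    To illustrate the flavor of fully distributed synchronization, here is a minimal consensus-style sketch in Python: each node repeatedly nudges its clock offset toward the average of what it hears from its 1-hop neighbors, with no central reference. This is a deliberate simplification; the paper's bio-inspired update rule and its propagation-delay compensation are not reproduced here.

    import random

    N, STEPS, GAIN = 8, 200, 0.3
    offset = [random.uniform(-50e-6, 50e-6) for _ in range(N)]     # seconds
    neighbors = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}  # ring

    for _ in range(STEPS):
        new = offset[:]
        for i in range(N):
            heard = [offset[j] for j in neighbors[i]]
            new[i] += GAIN * (sum(heard) / len(heard) - offset[i])
        offset = new  # all nodes update from what they heard, in parallel

    print(f"residual spread: {max(offset) - min(offset):.2e} s")  # -> ~0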

  3. Portable multi-node LQCD Monte Carlo simulations using OpenACC

    NASA Astrophysics Data System (ADS)

    Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

    This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, which is used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization across multiple computing nodes, using OpenACC to manage parallelism within each node and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies for maximizing performance, then describe selected relevant details of the code, and finally measure the performance and scaling we are able to achieve. The work focuses mainly on GPUs, which offer significantly higher performance for this application, but also compares with results measured on other processors.
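
    A minimal sketch of the same two-level split, assuming an illustrative field update in place of the actual staggered-fermion kernels: MPI distributes the lattice sites among nodes and combines results, while an OpenACC pragma offloads the per-node loop. Compile with an OpenACC-aware MPI wrapper (e.g. mpic++ -acc); without -acc the pragma is ignored and the loop runs on the CPU.

        #include <cstdio>
        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int nLocal = 1 << 20;              // lattice sites on this rank
            std::vector<double> field(nLocal, rank);
            double* f = field.data();

            // Within the node: OpenACC offloads the site update (a stand-in
            // for the real kernels) to the accelerator.
            #pragma acc parallel loop copy(f[0:nLocal])
            for (int i = 0; i < nLocal; ++i)
                f[i] = 0.5 * f[i] + 1.0;

            // Among the nodes: MPI combines per-rank partial results.
            double local = 0.0, global = 0.0;
            for (int i = 0; i < nLocal; ++i) local += f[i];
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0)
                std::printf("global sum = %.1f over %d ranks\n", global, size);
            MPI_Finalize();
        }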

  4. Distributing an executable job load file to compute nodes in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gooding, Thomas M.

    Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications link over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.
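
    The participation logic can be illustrated with a small in-process tree model, assuming hypothetical Node and buildClassRoute names: a node belongs on the class route when it participates itself or when any descendant does, in which case it reports the connecting link to its parent.

        #include <cstdio>
        #include <vector>

        // Toy tree of compute nodes; a node joins the class route if it
        // participates in the job itself or if any descendant does (so it
        // must forward data downward along the route).
        struct Node {
            bool participates;
            std::vector<Node*> children;
        };

        // Returns true if this subtree belongs on the class route, printing
        // each child link that the node would report to its parent.
        bool buildClassRoute(Node* n, int id, int& nextId) {
            bool needed = n->participates;
            for (Node* c : n->children) {
                int childId = ++nextId;
                if (buildClassRoute(c, childId, nextId)) {
                    needed = true;
                    std::printf("node %d: child link to node %d is on the class route\n",
                                id, childId);
                }
            }
            return needed;
        }

        int main() {
            // A small tree: root with two leaves; only the right leaf participates.
            Node leaf1{false, {}}, leaf2{true, {}};
            Node root{false, {&leaf1, &leaf2}};
            int nextId = 0;
            buildClassRoute(&root, 0, nextId);
        }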

  5. Using Computer-Assisted Multiple Representations in Learning Geometry Proofs

    ERIC Educational Resources Information Center

    Wong, Wing-Kwong; Yin, Sheng-Kai; Yang, Hsi-Hsun; Cheng, Ying-Hao

    2011-01-01

    Geometry theorem proving involves skills that are difficult to learn. Instead of working with abstract and complicated representations, students might start with concrete, graphical representations. A proof tree is a graphical representation of a formal proof, with each node representing a proposition or given conditions. A computer-assisted…

  6. ALMA Correlator Real-Time Data Processor

    NASA Astrophysics Data System (ADS)

    Pisano, J.; Amestica, R.; Perez, J.

    2005-10-01

    The design of a real-time Linux application utilizing the Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams, each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non-real-time external computers. The designed computer system, the Correlator Data Processor (CDP), consists of a cluster of 17 SMP computers: 16 compute nodes plus a master controller node, all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1-megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections, for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them with other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals, providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intranet for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware, which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.

  7. A Massively Parallel Code for Polarization Calculations

    NASA Astrophysics Data System (ADS)

    Akiyama, Shizuka; Höflich, Peter

    2001-03-01

    We present an implementation of our Monte Carlo radiation transport method for rapidly expanding, NLTE atmospheres on massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared with a version based purely on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed improved scalability of about 40%.

  8. Scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the nodes during execution

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E

    2012-10-16

    Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions.

  9. Synchronizing compute node time bases in a parallel computer

    DOEpatents

    Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

    2015-01-27

    Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
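
    A rough MPI analogue of this sequence, assuming ping-pong latency measurement and an MPI broadcast standing in for the hardware pulse and wakeup unit, might look as follows (run with two or more ranks):

        #include <cstdio>
        #include <mpi.h>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Step 1: each non-root rank estimates its latency to the root
            // (rank 0) as half of a ping-pong round trip.
            double latency = 0.0;
            if (rank != 0) {
                double t0 = MPI_Wtime(), echo;
                MPI_Send(&t0, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                MPI_Recv(&echo, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                latency = 0.5 * (MPI_Wtime() - t0);
            } else {
                for (int r = 1; r < size; ++r) {
                    double t;
                    MPI_Recv(&t, 1, MPI_DOUBLE, r, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(&t, 1, MPI_DOUBLE, r, 0, MPI_COMM_WORLD);
                }
            }

            // Step 2: global barrier, then the root's "pulse" broadcast; each
            // rank adopts its measured latency as the time-base correction.
            MPI_Barrier(MPI_COMM_WORLD);
            int pulse = 1;
            MPI_Bcast(&pulse, 1, MPI_INT, 0, MPI_COMM_WORLD);
            double timeBase = latency;
            std::printf("rank %d: time-base correction %.9f s\n", rank, timeBase);
            MPI_Finalize();
        }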

  10. Synchronizing compute node time bases in a parallel computer

    DOEpatents

    Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip

    2014-12-30

    Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.

  11. A high performance parallel algorithm for 1-D FFT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agarwal, R.C.; Gustavson, F.G.; Zubair, M.

    1994-12-31

    In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
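
    The kernel itself is easy to state in code. Below is a minimal serial sketch, assuming a textbook radix-2 FFT rather than the authors' multi-dimensional formulation: forward transform, pointwise multiplication by a coefficient array, inverse transform, and normalization.

        #include <complex>
        #include <cstdio>
        #include <vector>

        using cd = std::complex<double>;
        const double PI = 3.14159265358979323846;

        // Textbook recursive radix-2 FFT; n must be a power of two.
        // invert=true computes the (unscaled) inverse transform.
        void fft(std::vector<cd>& a, bool invert) {
            size_t n = a.size();
            if (n == 1) return;
            std::vector<cd> even(n / 2), odd(n / 2);
            for (size_t i = 0; i < n / 2; ++i) {
                even[i] = a[2 * i];
                odd[i]  = a[2 * i + 1];
            }
            fft(even, invert);
            fft(odd, invert);
            double ang = 2 * PI / double(n) * (invert ? -1 : 1);
            for (size_t k = 0; k < n / 2; ++k) {
                cd w = std::polar(1.0, ang * double(k)) * odd[k];
                a[k] = even[k] + w;
                a[k + n / 2] = even[k] - w;
            }
        }

        int main() {
            // Forward FFT, multiply by coefficients, inverse FFT, normalize.
            std::vector<cd> x = {1, 2, 3, 4, 0, 0, 0, 0};
            std::vector<cd> coeff(x.size(), cd(0.5, 0.0));  // example array

            fft(x, false);
            for (size_t i = 0; i < x.size(); ++i) x[i] *= coeff[i];
            fft(x, true);
            for (auto& v : x) v /= double(x.size());

            for (auto& v : x) std::printf("%6.3f ", v.real());
            std::printf("\n");
        }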

  12. Providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOEpatents

    Archer, Charles J; Faraj, Ahmad A; Inglett, Todd A; Ratterman, Joseph D

    2013-04-16

    Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selected link to the adjacent compute node connected to the compute node through the selected link.

  13. Providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J.; Faraj, Daniel A.; Inglett, Todd A.

    Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selected link to the adjacent compute node connected to the compute node through the selected link.

  14. The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomial coefficients using trees

    NASA Technical Reports Server (NTRS)

    Crouch, P. E.; Grossman, Robert

    1992-01-01

    This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x_1,...,x_N), where k = R or C, F denotes a differential operator with coefficients from R, and g is a member of R, we describe data structures and algorithms for efficiently computing F(g). The basic idea is to impose a multiplicative structure on the vector space whose basis is the set of finite rooted trees with nodes labeled by the coefficients of the differential operators. Cancellation of two trees with r + 1 nodes translates into cancellation of O(N^r) expressions involving the coefficient functions and their derivatives.

  15. Performing process migration with allreduce operations

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Wallenfelt, Brian Paul

    2010-12-14

    Compute nodes perform allreduce operations that swap processes at nodes. A first allreduce operation generates a first result and uses a first process from a first compute node, a second process from a second compute node, and zeros from other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result and uses the first result from the first compute node, the second process from the second compute node, and zeros from others. The second compute node replaces the second process with the second result, which is the first process. A third allreduce operation generates a third result and uses the first result from the first compute node, the second result from the second compute node, and zeros from others. The first compute node replaces the first result with the third result, which is the second process.
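
    The three-step swap works out cleanly if the allreduce uses a bitwise-XOR reduction, which the patent text does not specify but which makes the arithmetic self-inverse: r1 = P0 xor P1, r2 = r1 xor P1 = P0, r3 = r1 xor r2 = P1. A hedged MPI sketch with one integer standing in for each process image (run with two or more ranks):

        #include <cstdio>
        #include <mpi.h>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            // Ranks 0 and 1 hold the "processes"; everyone else holds zero.
            int value = (rank == 0) ? 0xAAAA : (rank == 1) ? 0x5555 : 0;

            int r1, r2, r3;
            // 1st allreduce: r1 = P0 ^ P1; rank 0 replaces its value with r1.
            MPI_Allreduce(&value, &r1, 1, MPI_INT, MPI_BXOR, MPI_COMM_WORLD);
            if (rank == 0) value = r1;

            // 2nd allreduce: r2 = r1 ^ P1 = P0; rank 1 takes r2.
            int contrib = (rank <= 1) ? value : 0;
            MPI_Allreduce(&contrib, &r2, 1, MPI_INT, MPI_BXOR, MPI_COMM_WORLD);
            if (rank == 1) value = r2;

            // 3rd allreduce: r3 = r1 ^ r2 = P1; rank 0 takes r3. Swap done.
            contrib = (rank <= 1) ? value : 0;
            MPI_Allreduce(&contrib, &r3, 1, MPI_INT, MPI_BXOR, MPI_COMM_WORLD);
            if (rank == 0) value = r3;

            if (rank < 2) std::printf("rank %d now holds 0x%X\n", rank, value);
            MPI_Finalize();
        }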

  16. Announcing Supercomputer Summit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wells, Jack; Bland, Buddy; Nichols, Jeff

    Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe. Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world’s most pressing challenges.

  17. Fault tolerant hypercube computer system architecture

    NASA Technical Reports Server (NTRS)

    Madan, Herb S. (Inventor); Chow, Edward (Inventor)

    1989-01-01

    A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network, while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube, the first network providing a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message conducting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network providing an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first watch dog node. There is, additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube, the third network providing a path for message transfer between the second computing nodes; and a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network, the fourth network providing an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node. There is also a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks, as well as a second watch dog node operably connected to the first multiplexer whereby the second watch dog node can selectively communicate with individual ones of the computing nodes through the second and fourth networks. The branch is completed by a first load balancing node and a second multiplexer connected between the first load balancing node and the first and second watch dog nodes, allowing the first load balancing node to selectively communicate with the first and second watch dog nodes.
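
    A useful property underlying such hypercube designs is that neighbor addresses are computable locally: in a d-dimensional hypercube, the neighbors of node i are exactly the nodes whose IDs differ from i in one bit. A short illustration:

        #include <cstdio>

        // In a d-dimensional hypercube, node i's neighbors are the nodes
        // whose IDs differ from i in exactly one bit: i ^ (1 << k).
        int main() {
            const int d = 4;         // 16-node hypercube
            const int node = 5;      // binary 0101
            for (int k = 0; k < d; ++k)
                std::printf("neighbor across dimension %d: %d\n",
                            k, node ^ (1 << k));
        }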

  18. Controlling data transfers from an origin compute node to a target compute node

    DOEpatents

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2011-06-21

    Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication.

  19. Locating hardware faults in a data communications network of a parallel computer

    DOEpatents

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-01-12

    Hardware fault location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.

  20. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    DOEpatents

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-01-10

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  1. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    DOEpatents

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  2. Identifying failure in a tree network of a parallel computer

    DOEpatents

    Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

    2010-08-24

    Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.

  3. A parallel algorithm for generation and assembly of finite element stiffness and mass matrices

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.

    1991-01-01

    A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.

  4. User's manual for CNVUFAC, the general dynamics heat-transfer radiation view factor program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, R. L.

    CNVUFAC, the General Dynamics heat-transfer radiation view factor program, has been adapted for use on the LLL CDC 7600 computer system. The input and output have been modified, and node incrementing logic was included to make the code compatible with the TRUMP thermal analyzer and related codes. The program performs the multiple integration necessary to evaluate the geometric black-body radiation node-to-node view factors. Card image output that contains node number and view factor information is generated for input into the related program GRAY. Program GRAY is then used to include the effects of gray-body emissivities and multiple reflections, generating the effective gray-body view factors usable in TRUMP. CNVUFAC uses an elemental area summation scheme to evaluate the multiple integrals. The program permits shadowing and self-shadowing. The basic configuration shapes that can be considered are cylinders, cones, spheres, ellipsoids, flat plates, disks, toroids, and polynomials of revolution. Portions of these shapes can also be considered.

  5. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).

  6. Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOEpatents

    Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.

    2012-10-23

    Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.

  7. Mediastinal lymph node detection and station mapping on chest CT using spatial priors and random forest

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn

    2016-07-15

    Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs, trachea, lungs, and spine are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifier for lymph node detection. The detected candidates are then segmented by the following curve evolution process. Texture features are computed on the segmented lymph nodes and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of the mediastinal lymph node map according to the International Association for the Study of Lung Cancer are converted into a patient-specific color-coded CT image, where the lymph node station can be automatically assigned for each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, their system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.

  8. Scalable and responsive event processing in the cloud

    PubMed Central

    Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul

    2013-01-01

    Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple-class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
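
    For a single query class, the atomic-unit prediction reduces to the classic M/G/1 mean response time given by the Pollaczek-Khinchine formula, T = E[S] + lambda E[S^2] / (2(1 - rho)) with rho = lambda E[S]. A small sketch of that single-class case (the paper models several such units per node as a multiple-class system):

        #include <cstdio>

        // Mean response time of an M/G/1 queue via Pollaczek-Khinchine.
        // A prediction like this can drive provisioning: acquire another
        // cloud node when predicted T exceeds the response-time target.
        double mg1ResponseTime(double lambda, double meanS, double meanS2) {
            double rho = lambda * meanS;
            if (rho >= 1.0) return -1.0;   // unstable: arrivals exceed capacity
            return meanS + lambda * meanS2 / (2.0 * (1.0 - rho));
        }

        int main() {
            // Example: 50 events/s, 10 ms mean service, exponential service
            // times (E[S^2] = 2 * E[S]^2 for the exponential distribution).
            double meanS = 0.010;
            double t = mg1ResponseTime(50.0, meanS, 2.0 * meanS * meanS);
            std::printf("predicted mean response time: %.4f s\n", t);
        }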

  9. An MPI-IO interface to HPSS

    NASA Technical Reports Server (NTRS)

    Jones, Terry; Mark, Richard; Martin, Jeanne; May, John; Pierce, Elsie; Stanberry, Linda

    1996-01-01

    This paper describes an implementation of the proposed MPI-IO (Message Passing Interface - Input/Output) standard for parallel I/O. Our system uses third-party transfer to move data over an external network between the processors where it is used and the I/O devices where it resides. Data travels directly from source to destination, without the need for shuffling it among processors or funneling it through a central node. Our distributed server model lets multiple compute nodes share the burden of coordinating data transfers. The system is built on the High Performance Storage System (HPSS), and a prototype version runs on a Meiko CS-2 parallel computer.

  10. A method to approximate a closest loadability limit using multiple load flow solutions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yorino, Naoto; Harada, Shigemi; Cheng, Haozhong

    A new method is proposed to approximate a closest loadability limit (CLL), or closest saddle node bifurcation point, using a pair of multiple load flow solutions. More strictly, the points obtainable by the method are the stationary points, including not only the CLL but also farthest and saddle points. An operating solution and a low voltage load flow solution are used to efficiently estimate the node injections at a CLL as well as the left and right eigenvectors corresponding to the zero eigenvalue of the load flow Jacobian. They can be used in monitoring loadability margin, in identification of weak spots in a power system, and in the examination of an optimal control against voltage collapse. Most of the computation time of the proposed method is taken in calculating the load flow solution pair. The remaining computation time is less than that of an ordinary load flow.

  11. Identifying a largest logical plane from a plurality of logical planes formed of compute nodes of a subcommunicator in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, Kristan D.; Faraj, Daniel A.

    In a parallel computer, a largest logical plane from a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: identifying, by each compute node of the subcommunicator, all logical planes that include the compute node; calculating, by each compute node for each identified logical plane that includes the compute node, an area of the identified logical plane; initiating, by a root node of the subcommunicator, a gather operation; receiving, by the root node from each compute node of the subcommunicator, each node's calculated areas as contribution data to the gather operation; and identifying, by the root node in dependence upon the received calculated areas, a logical plane of the subcommunicator having the greatest area.

  12. A Decentralized Eigenvalue Computation Method for Spectrum Sensing Based on Average Consensus

    NASA Astrophysics Data System (ADS)

    Mohammadi, Jafar; Limmer, Steffen; Stańczak, Sławomir

    2016-07-01

    This paper considers eigenvalue estimation for the decentralized inference problem for spectrum sensing. We propose a decentralized eigenvalue computation algorithm based on the power method, referred to as the generalized power method (GPM); it is capable of estimating the eigenvalues of a given covariance matrix under certain conditions. Furthermore, we have developed a decentralized implementation of GPM by splitting the iterative operations into local and global computation tasks. The global tasks require data exchange to be performed among the nodes. For this task, we apply an average consensus algorithm to efficiently perform the global computations. As a special case, we consider a structured graph that is a tree with clusters of nodes at its leaves. For an accelerated distributed implementation, we propose to use computation over multiple access channel (CoMAC) as a building block of the algorithm. Numerical simulations are provided to illustrate the performance of the two algorithms.
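
    The local/global split can be seen in a toy centralized stand-in for the power method, where the matrix-vector product is the per-node local task and the squared-norm average is exactly the quantity an average-consensus round would supply (here it is computed directly; the matrix and values are purely illustrative):

        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Toy power iteration: "node" i owns row i of the covariance matrix
        // and component i of the iterate. The normalization needs a global
        // sum, which is what average consensus provides in the distributed
        // setting; here we just compute the mean of the squared components.
        int main() {
            const int N = 3;
            double C[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
            std::vector<double> x = {1, 1, 1};

            double lambda = 0.0;
            for (int iter = 0; iter < 100; ++iter) {
                std::vector<double> y(N, 0.0);
                for (int i = 0; i < N; ++i)        // local task at node i
                    for (int j = 0; j < N; ++j) y[i] += C[i][j] * x[j];
                // Global task: ||y||^2 = N * average(y_i^2), the consensus target.
                double avgSq = 0.0;
                for (int i = 0; i < N; ++i) avgSq += y[i] * y[i] / N;
                double norm = std::sqrt(N * avgSq);
                for (int i = 0; i < N; ++i) x[i] = y[i] / norm;
                lambda = norm;                     // dominant-eigenvalue estimate
            }
            std::printf("dominant eigenvalue estimate: %.6f\n", lambda);
        }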

  13. Configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks

    DOEpatents

    Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E.

    2010-03-02

    Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root.

  14. Collectively loading an application in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
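
    A minimal MPI sketch of the pattern, assuming a hypothetical image file name and rank 0 as the job leader: only the leader touches the file system, and the application image travels to the other ranks by broadcast.

        #include <cstdio>
        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            const int leader = 0;                  // the job-leader compute node

            long size = 0;
            std::vector<char> image;
            if (rank == leader) {
                // Only the leader reads the (hypothetical) application image.
                if (FILE* f = std::fopen("app.bin", "rb")) {
                    std::fseek(f, 0, SEEK_END);
                    size = std::ftell(f);
                    std::fseek(f, 0, SEEK_SET);
                    image.resize(size);
                    if (std::fread(image.data(), 1, size, f) != (size_t)size)
                        size = 0;
                    std::fclose(f);
                }
            }

            // Broadcast the size first, then the image itself.
            MPI_Bcast(&size, 1, MPI_LONG, leader, MPI_COMM_WORLD);
            if (rank != leader) image.resize(size);
            if (size > 0)
                MPI_Bcast(image.data(), (int)size, MPI_CHAR, leader,
                          MPI_COMM_WORLD);

            std::printf("rank %d received %ld bytes\n", rank, size);
            MPI_Finalize();
        }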

  15. Paging memory from random access memory to backing storage in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

    2013-05-21

    Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.

  16. Pacing a data transfer operation between compute nodes on a parallel computer

    DOEpatents

    Blocksome, Michael A [Rochester, MN

    2011-09-13

    Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (`DMA`) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine.

  17. Dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Budnik, Thomas A; Knudson, Brant L; Megerian, Mark G

    Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.

  18. Link failure detection in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

    2010-11-09

    Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
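
    On a 1-D chain, the checkerboard assignment reduces to rank parity; the sketch below sends a test message across every link from the even (first) group and checks receipt in the odd (second) group. The user notification is reduced to a print, and the messages are small enough that blocking sends are safe here.

        #include <cstdio>
        #include <mpi.h>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int token = rank;
            if (rank % 2 == 0) {                   // first group: senders
                for (int d = -1; d <= 1; d += 2) {
                    int nbr = rank + d;
                    if (nbr >= 0 && nbr < size)
                        MPI_Send(&token, 1, MPI_INT, nbr, 0, MPI_COMM_WORLD);
                }
            } else {                               // second group: checkers
                for (int d = -1; d <= 1; d += 2) {
                    int nbr = rank + d;
                    if (nbr >= 0 && nbr < size) {
                        int got;
                        MPI_Recv(&got, 1, MPI_INT, nbr, 0, MPI_COMM_WORLD,
                                 MPI_STATUS_IGNORE);
                        std::printf("rank %d: link from %d OK\n", rank, got);
                    }
                }
            }
            MPI_Finalize();
        }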

  19. Administering truncated receive functions in a parallel messaging interface

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.

  20. Broadcasting collective operation contributions throughout a parallel computer

    DOEpatents

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

  1. Transient Solid Dynamics Simulations on the Sandia/Intel Teraflop Computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Attaway, S.; Brown, K.; Gardner, D.

    1997-12-31

    Transient solid dynamics simulations are among the most widely used engineering calculations. Industrial applications include vehicle crashworthiness studies, metal forging, and powder compaction prior to sintering. These calculations are also critical to defense applications including safety studies and weapons simulations. The practical importance of these calculations and their computational intensiveness make them natural candidates for parallelization. This has proved to be difficult, and existing implementations fail to scale to more than a few dozen processors. In this paper we describe our parallelization of PRONTO, Sandia's transient solid dynamics code, via a novel algorithmic approach that utilizes multiple decompositions for different key segments of the computations, including the material contact calculation. This latter calculation is notoriously difficult to perform well in parallel, because it involves dynamically changing geometry, global searches for elements in contact, and unstructured communications among the compute nodes. Our approach scales to at least 3600 compute nodes of the Sandia/Intel Teraflop computer (the largest set of nodes to which we have had access to date) on problems involving millions of finite elements. On this machine we can simulate models using more than ten million elements in a few tenths of a second per timestep, and solve problems more than 3000 times faster than a single processor Cray Jedi.

  2. HERMIES-3: A step toward autonomous mobility, manipulation, and perception

    NASA Technical Reports Server (NTRS)

    Weisbin, C. R.; Burks, B. L.; Einstein, J. R.; Feezell, R. R.; Manges, W. W.; Thompson, D. H.

    1989-01-01

    HERMIES-III is an autonomous robot comprised of a seven degree-of-freedom (DOF) manipulator designed for human scale tasks, a laser range finder, a sonar array, an omni-directional wheel-driven chassis, multiple cameras, and a dual computer system containing a 16-node hypercube expandable to 128 nodes. The current experimental program involves performance of human-scale tasks (e.g., valve manipulation, use of tools), integration of a dexterous manipulator and platform motion in geometrically complex environments, and effective use of multiple cooperating robots (HERMIES-IIB and HERMIES-III). The environment in which the robots operate has been designed to include multiple valves, pipes, meters, obstacles on the floor, valves occluded from view, and multiple paths of differing navigation complexity. The ongoing research program supports the development of autonomous capability for HERMIES-IIB and III to perform complex navigation and manipulation under time constraints, while dealing with imprecise sensory information.

  3. A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

    NASA Astrophysics Data System (ADS)

    Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

    2017-11-01

    In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang et al., 2016), which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and moved the computation of electrostatic interactions entirely onto the GPU. The upgraded edition of CU-ENUF runs in a hybrid parallel fashion: the computation is first parallelized across multiple computer nodes, and then further parallelized on the GPU installed in each node. With this parallel strategy, the size of the simulation system is no longer restricted by the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT within this strategy, which we address through careful analysis of the underlying principles and several algorithmic techniques. Furthermore, the upgraded method is capable of computing electrostatic interactions for both atomistic molecular dynamics (MD) and dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method not only attains good precision with suitable parameters, but also provides an efficient way to compute electrostatic interactions for huge simulation systems.

    Program Files doi: http://dx.doi.org/10.17632/zncf24fhpv.1

    Licensing provisions: GNU General Public License 3 (GPL)

    Programming language: C, C++, and CUDA C

    Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, and it runs on computers equipped with NVIDIA GPUs. It has been tested on (a) a single computer node with an Intel(R) Core(TM) i7-3770 @ 3.40 GHz (CPU) and a GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations.

    Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space, and it takes up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interaction computation, the computation capability is limited to the throughput of a single GPU for super-scale simulation systems. Therefore, an effective method is needed to handle the calculation of electrostatic interactions efficiently for a simulation system of super-scale size.

    Solution method: We constructed a hybrid parallel architecture in which CPUs and GPUs are combined to accelerate the electrostatic computation effectively. First, the simulation system is divided into many subtasks via a domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation, with each computer node corresponding to a particular subtask, and each subtask is further executed efficiently in parallel on the GPU of its node. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) within the parallel strategy, which we address through careful analysis of the underlying principles and several algorithmic techniques.

    Restrictions: HP-ENUF is mainly oriented to super-scale system simulations, in which its performance superiority is shown most clearly. However, for a small simulation system containing fewer than 10^6 particles, the multiple-node mode has no apparent efficiency advantage over the single-node mode, and may even be less efficient because of network delay among the computer nodes.

    References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.

  4. Announcing Supercomputer Summit

    ScienceCinema

    Wells, Jack; Bland, Buddy; Nichols, Jeff; Hack, Jim; Foertter, Fernanda; Hagen, Gaute; Maier, Thomas; Ashfaq, Moetasim; Messer, Bronson; Parete-Koon, Suzanne

    2018-01-16

    Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe. Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world’s most pressing challenges.

  5. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    DOEpatents

    Faraj, Ahmad [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.

  6. Identifying logical planes formed of compute nodes of a subcommunicator in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, Kristan D.; Faraj, Daniel

    In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.

  7. Constructing a logical, regular axis topology from an irregular topology

    DOEpatents

    Faraj, Daniel A.

    2014-07-22

    Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.

  8. Constructing a logical, regular axis topology from an irregular topology

    DOEpatents

    Faraj, Daniel A.

    2014-07-01

    Constructing a logical regular topology from an irregular topology including, for each axial dimension and recursively, for each compute node in a subcommunicator until returning to a first node: adding to a logical line of the axial dimension a neighbor specified in a nearest neighbor list; calling the added compute node; determining, by the called node, whether any neighbor in the node's nearest neighbor list is available to add to the logical line; if a neighbor in the called compute node's nearest neighbor list is available to add to the logical line, adding, by the called compute node to the logical line, any neighbor in the called compute node's nearest neighbor list for the axial dimension not already added to the logical line; and, if no neighbor in the called compute node's nearest neighbor list is available to add to the logical line, returning to the calling compute node.

  9. Identifying messaging completion in a parallel computer by checking for change in message received and transmitted count at each node

    DOEpatents

    Archer, Charles J [Rochester, MN; Hardwick, Camesha R [Fayetteville, NC; McCarthy, Patrick J [Rochester, MN; Wallenfelt, Brian P [Eden Prairie, MN

    2009-06-23

    Methods, parallel computers, and products are provided for identifying messaging completion on a parallel computer. The parallel computer includes a plurality of compute nodes, the compute nodes coupled for data communications by at least two independent data communications networks including a binary tree data communications network optimal for collective operations that organizes the nodes as a tree and a torus data communications network optimal for point to point operations that organizes the nodes as a torus. Embodiments include reading all counters at each node of the torus data communications network; calculating at each node a current node value in dependence upon the values read from the counters at each node; and determining for all nodes whether the current node value for each node is the same as a previously calculated node value for each node. If the current node value is the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is complete; and if the current node value is not the same as the previously calculated node value for all nodes of the torus data communications network, embodiments include determining that messaging is currently incomplete.

  10. Method and system for knowledge discovery using non-linear statistical analysis and a 1st and 2nd tier computer program

    DOEpatents

    Hively, Lee M [Philadelphia, TN

    2011-07-12

    The invention relates to a method and apparatus for simultaneously processing different sources of test data into informational data and then processing different categories of informational data into knowledge-based data. The knowledge-based data can then be communicated between nodes in a system of multiple computers according to rules for a type of complex, hierarchical computer system modeled on a human brain.

  11. PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)

    NASA Astrophysics Data System (ADS)

    Vincenti, Henri

    2016-03-01

    The advent of exascale computers will enable 3D simulations of new laser-plasma interaction regimes that were previously out of reach of current Petascale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potential of these new computing architectures. Indeed, achieving exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy-consuming part of an algorithm, future computers will tend to increase memory locality at the hardware level and reduce energy consumption related to data movement by using more and more cores on each compute node (''fat nodes'') with a reduced clock speed to allow for efficient cooling. To compensate for the frequency decrease, CPU vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPUs also have a reduced clock speed per core and can process Multiple Instructions on Multiple Data (MIMD). At the software level, Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for multicore/manycore CPUs) to fully take advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high-performance skeleton PIC code PICSAR to achieve both good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the pseudo-spectral quasi-cylindrical code FBPIC on GPUs using the Numba Python compiler.

  12. Executing a gather operation on a parallel computer

    DOEpatents

    Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN

    2012-03-20

    Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer on the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node; if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data; if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data; and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.
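
    The position-by-position bitwise-OR combine maps naturally onto an MPI reduction; below is a hedged mpi4py sketch (illustrative only, not the patented combining-network hardware), run with something like "mpiexec -n 4 python gather_or.py":

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      contribution = np.int32(rank + 1)          # stand-in for this rank's data
      result = np.zeros(size, dtype=np.int32)    # the logical root's buffer
      for position in range(size):
          # only the rank matching the current position contributes its data;
          # all others contribute zero, so the bitwise OR yields that value
          send = np.array([contribution if position == rank else 0], np.int32)
          recv = np.empty(1, np.int32)
          comm.Allreduce(send, recv, op=MPI.BOR)
          result[position] = recv[0]

      if rank == 0:
          print(result)                          # [1 2 ... size]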

  13. A novel strategy for load balancing of distributed medical applications.

    PubMed

    Logeswaran, Rajasvaran; Chen, Li-Choo

    2012-04-01

    Current trends in medicine, specifically in the electronic handling of medical applications, ranging from digital imaging, paperless hospital administration and electronic medical records, telemedicine, to computer-aided diagnosis, create a burden on the network. Distributed Service Architectures, such as Intelligent Network (IN), Telecommunication Information Networking Architecture (TINA) and Open Service Access (OSA), are able to meet this new challenge. Distribution enables computational tasks to be spread among multiple processors; hence, performance is an important issue. This paper proposes a novel approach to load balancing, the Random Sender Initiated Algorithm, for distribution of tasks among several nodes sharing the same computational object (CO) instances in Distributed Service Architectures. Simulations illustrate that the proposed algorithm produces better network performance than the benchmark load balancing algorithms, the Random Node Selection Algorithm and the Shortest Queue Algorithm, especially under medium and heavily loaded conditions.
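
    A toy dispatch simulation contrasting the three policies (my own illustration under simplifying assumptions; the paper's algorithm and simulator details differ):

      import random

      def pick_node(queues, policy, probe=3):
          if policy == "random":                 # Random Node Selection
              return random.randrange(len(queues))
          if policy == "shortest":               # Shortest Queue
              return min(range(len(queues)), key=lambda i: queues[i])
          # sender-initiated flavor: probe a few random nodes and take the
          # least loaded of those (a cheap approximation of global state)
          candidates = random.sample(range(len(queues)), probe)
          return min(candidates, key=lambda i: queues[i])

      def imbalance(policy, nodes=10, jobs=10000):
          queues = [0] * nodes
          for _ in range(jobs):
              queues[pick_node(queues, policy)] += 1
          return max(queues) - min(queues)       # spread after dispatch

      for p in ("random", "shortest", "probe"):
          print(p, imbalance(p))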

  14. Low latency, high bandwidth data communications between compute nodes in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2010-11-02

    Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using a memory FIFO operation; determining, by the origin DMA engine, whether an acknowledgement of the RTS message has been received from the target DMA engine; if an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
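
    A toy model of the adaptive handoff (a sketch, not the DMA hardware; the acknowledgement poll is a hypothetical stand-in):

      # Keep sending fixed-size portions through the memory FIFO path until
      # the RTS acknowledgement arrives, then ship the remainder as a single
      # direct-put transfer.
      def send_with_rts(data, portion, ack_received):
          """ack_received() -> bool polls for the target's RTS ack (assumed)."""
          fifo_portions, offset = [], 0
          while offset < len(data) and not ack_received():
              fifo_portions.append(data[offset:offset + portion])
              offset += portion
          direct_put = data[offset:]            # sent once the ack is in hand
          return fifo_portions, direct_put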

  15. High-Throughput Bit-Serial LDPC Decoder LSI Based on Multiple-Valued Asynchronous Interleaving

    NASA Astrophysics Data System (ADS)

    Onizawa, Naoya; Hanyu, Takahiro; Gaudet, Vincent C.

    This paper presents a high-throughput bit-serial low-density parity-check (LDPC) decoder that uses an asynchronous interleaver. Since consecutive log-likelihood message values on the interleaver are similar, node computations are continuously performed by using the most recently arrived messages without significantly affecting bit-error rate (BER) performance. In the asynchronous interleaver, each message's arrival rate is based on the delay due to the wire length, so that the decoding throughput is not restricted by the worst-case latency, which results in a higher average rate of computation. Moreover, the use of a multiple-valued data representation makes it possible to multiplex control signals and data from mutual nodes, thus minimizing the number of handshaking steps in the asynchronous interleaver and eliminating the clock signal entirely. As a result, the decoding throughput becomes 1.3 times faster than that of a bit-serial synchronous decoder under a 90nm CMOS technology, at a comparable BER.

  16. Hybrid data storage system in an HPC exascale environment

    DOEpatents

    Bent, John M.; Faibish, Sorin; Gupta, Uday K.; Tzelnic, Percy; Ting, Dennis P. J.

    2015-08-18

    A computer-executable method, system, and computer program product for managing I/O requests from a compute node in communication with a data storage system, including a first burst buffer node and a second burst buffer node, the computer-executable method, system, and computer program product comprising striping data on the first burst buffer node and the second burst buffer node, wherein a first portion of the data is communicated to the first burst buffer node and a second portion of the data is communicated to the second burst buffer node, processing the first portion of the data at the first burst buffer node, and processing the second portion of the data at the second burst buffer node.
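
    A minimal sketch of the two-way striping (the burst buffer node objects are hypothetical; only the split is shown):

      def stripe(data: bytes, stripe_size: int):
          """Alternate fixed-size stripes between two burst buffer nodes."""
          portions = ([], [])
          for i in range(0, len(data), stripe_size):
              portions[(i // stripe_size) % 2].append(data[i:i + stripe_size])
          return portions

      first, second = stripe(b"abcdefghij", stripe_size=2)
      # first  -> [b'ab', b'ef', b'ij']   (to burst buffer node 1)
      # second -> [b'cd', b'gh']          (to burst buffer node 2)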

  17. The use of multiple time point dynamic positron emission tomography/computed tomography in patients with oral/head and neck cancer does not predictably identify metastatic cervical lymph nodes.

    PubMed

    Carlson, Eric R; Schaefferkoetter, Josh; Townsend, David; McCoy, J Michael; Campbell, Paul D; Long, Misty

    2013-01-01

    To determine whether the time course of 18-fluorine fluorodeoxyglucose (18F-FDG) activity in multiple consecutively obtained 18F-FDG positron emission tomography (PET)/computed tomography (CT) scans predictably identifies metastatic cervical adenopathy in patients with oral/head and neck cancer. It is hypothesized that the activity will increase significantly over time only in those lymph nodes harboring metastatic cancer. A prospective cohort study was performed whereby patients with oral/head and neck cancer underwent consecutive imaging at 9 time points with PET/CT from 60 to 115 minutes after injection with 18F-FDG. The primary predictor variable was the status of the lymph nodes based on dynamic PET/CT imaging. Metastatic lymph nodes were defined as those that showed an increase greater than or equal to 10% over the baseline standard uptake values. The primary outcome variable was the pathologic status of the lymph node. A total of 2,237 lymph nodes were evaluated histopathologically in the 83 neck dissections that were performed in 74 patients. A total of 119 lymph nodes were noted to have hypermetabolic activity on the 90-minute (static) portion of the study and were able to be assessed by time points. When we compared the PET/CT time point (dynamic) data with the histopathologic analysis of the lymph nodes, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were 60.3%, 70.5%, 66.0%, 65.2%, and 65.5%, respectively. The use of dynamic PET/CT imaging does not permit the ablative surgeon to depend only on the results of the PET/CT study to determine which patients will benefit from neck dissection. As such, we maintain that surgeons should continue to rely on clinical judgment and maintain a low threshold for executing neck dissection in patients with oral/head and neck cancer, including those patients with N0 neck designations. Copyright © 2013 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.

  18. Executing scatter operation to parallel computer nodes by repeatedly broadcasting content of send buffer partition corresponding to each node upon bitwise OR operation

    DOEpatents

    Archer, Charles J [Rochester, MN; Ratterman, Joseph D [Rochester, MN

    2009-11-06

    Executing a scatter operation on a parallel computer includes: configuring a send buffer on a logical root, the send buffer having positions, each position corresponding to a ranked node in an operational group of compute nodes and for storing contents scattered to that ranked node; and repeatedly for each position in the send buffer: broadcasting, by the logical root to each of the other compute nodes on a global combining network, the contents of the current position of the send buffer using a bitwise OR operation, determining, by each compute node, whether the current position in the send buffer corresponds with the rank of that compute node, if the current position corresponds with the rank, receiving the contents and storing the contents in a reception buffer of that compute node, and if the current position does not correspond with the rank, discarding the contents.
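
    An mpi4py analogue of the broadcast-based scatter (illustrative; the patent's global combining network is modeled here by an ordinary broadcast):

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()
      root = 0

      send_buffer = [f"item-{i}" for i in range(size)] if rank == root else None
      received = None
      for position in range(size):
          contents = comm.bcast(
              send_buffer[position] if rank == root else None, root=root)
          if position == rank:
              received = contents        # keep only our own position
          # contents for other positions are simply discarded
      print(f"rank {rank} received {received}")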

  19. Internode data communications in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

    2013-09-03

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.

  20. Internode data communications in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.

  1. Profiling an application for power consumption during execution on a compute node

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E

    2013-09-17

    Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
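
    A hedged sketch of combining the two profiles (the operation classes and wattages are invented for illustration):

      HARDWARE_PROFILE_WATTS = {"compute": 95.0, "memory": 40.0, "network": 25.0}

      def application_power_profile(op_seconds):
          """op_seconds: seconds the application spends per operation class."""
          joules = {op: HARDWARE_PROFILE_WATTS[op] * t
                    for op, t in op_seconds.items()}
          joules["total"] = sum(joules.values())
          return joules                     # the reported profile

      print(application_power_profile(
          {"compute": 120.0, "memory": 30.0, "network": 15.0}))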

  2. A Hybrid Scheme for Fine-Grained Search and Access Authorization in Fog Computing Environment

    PubMed Central

    Xiao, Min; Zhou, Jing; Liu, Xuejiao; Jiang, Mingda

    2017-01-01

    In the fog computing environment, the encrypted sensitive data may be transferred to multiple fog nodes on the edge of a network for low latency; thus, fog nodes need to implement a search over encrypted data as a cloud server. Since the fog nodes tend to provide service for IoT applications often running on resource-constrained end devices, it is necessary to design lightweight solutions. At present, there is little research on this issue. In this paper, we propose a fine-grained owner-forced data search and access authorization scheme spanning user-fog-cloud for resource-constrained end users. Compared to existing schemes only supporting either index encryption with search ability or data encryption with fine-grained access control ability, the proposed hybrid scheme supports both abilities simultaneously; index ciphertext and data ciphertext are constructed based on a single ciphertext-policy attribute-based encryption (CP-ABE) primitive and share the same key pair, so data access efficiency is significantly improved and the cost of key management is greatly reduced. Moreover, in the proposed scheme, the resource-constrained end devices are allowed to rapidly assemble ciphertexts online and securely outsource most of the decryption task to fog nodes, and a mediated encryption mechanism is also adopted to achieve instantaneous user revocation instead of re-encrypting ciphertexts with many copies in many fog nodes. The security and performance analysis shows that our scheme is suitable for a fog computing environment. PMID:28629131

  3. A Hybrid Scheme for Fine-Grained Search and Access Authorization in Fog Computing Environment.

    PubMed

    Xiao, Min; Zhou, Jing; Liu, Xuejiao; Jiang, Mingda

    2017-06-17

    In the fog computing environment, the encrypted sensitive data may be transferred to multiple fog nodes on the edge of a network for low latency; thus, fog nodes need to implement a search over encrypted data as a cloud server. Since the fog nodes tend to provide service for IoT applications often running on resource-constrained end devices, it is necessary to design lightweight solutions. At present, there is little research on this issue. In this paper, we propose a fine-grained owner-forced data search and access authorization scheme spanning user-fog-cloud for resource-constrained end users. Compared to existing schemes only supporting either index encryption with search ability or data encryption with fine-grained access control ability, the proposed hybrid scheme supports both abilities simultaneously; index ciphertext and data ciphertext are constructed based on a single ciphertext-policy attribute-based encryption (CP-ABE) primitive and share the same key pair, so data access efficiency is significantly improved and the cost of key management is greatly reduced. Moreover, in the proposed scheme, the resource-constrained end devices are allowed to rapidly assemble ciphertexts online and securely outsource most of the decryption task to fog nodes, and a mediated encryption mechanism is also adopted to achieve instantaneous user revocation instead of re-encrypting ciphertexts with many copies in many fog nodes. The security and performance analysis shows that our scheme is suitable for a fog computing environment.

  4. An implementation of a tree code on a SIMD, parallel computer

    NASA Technical Reports Server (NTRS)

    Olson, Kevin M.; Dorband, John E.

    1994-01-01

    We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k-processor MasPar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) computers can be used for these simulations. The cost/performance ratio for SIMD machines like the MasPar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
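
    The balanced construction can be sketched with recursive sorted halving (a serial NumPy stand-in for the parallel sorts described above):

      import numpy as np

      def build_tree(points, axis=0):
          """points: (N, 3) array, N a power of two; returns nested halves."""
          if len(points) <= 2:
              return points                     # leaf: one particle pair
          pts = points[np.argsort(points[:, axis])]   # sort along this axis
          half = len(pts) // 2
          nxt = (axis + 1) % 3                  # cycle x -> y -> z
          return [build_tree(pts[:half], nxt), build_tree(pts[half:], nxt)]

      tree = build_tree(np.random.rand(16, 3))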

  5. Line-plane broadcasting in a data communications network of a parallel computer

    DOEpatents

    Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.

    2010-06-08

    Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
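
    A small simulation of the staged fan-out (a sketch of the method's coverage, not of the network itself):

      def line_plane_broadcast(shape, root):
          rx, ry, rz = root
          line = [(x, ry, rz) for x in range(shape[0])]        # step 1: x-line
          plane = [(x, y, rz) for (x, _, _) in line            # step 2: y-lines
                   for y in range(shape[1])]
          volume = [(x, y, z) for (x, y, _) in plane           # step 3: z-lines
                    for z in range(shape[2])]
          # every node is reached exactly once
          return len(set(volume)) == shape[0] * shape[1] * shape[2]

      print(line_plane_broadcast((4, 4, 4), (0, 0, 0)))        # True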

  6. Line-plane broadcasting in a data communications network of a parallel computer

    DOEpatents

    Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.

    2010-11-23

    Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.

  7. Computing NLTE Opacities -- Node Level Parallel Calculation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holladay, Daniel

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities in-line, with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware, including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  8. Data driven CAN node reliability assessment for manufacturing system

    NASA Astrophysics Data System (ADS)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data-driven node bus-off time assessment method for CAN networks is proposed by directly using network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. Accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experiment results show that the MTTB of nodes predicted by the proposed method agrees well with observations in the case studies. The proposed data-driven node time-to-bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.

  9. Design of temperature monitoring system based on CAN bus

    NASA Astrophysics Data System (ADS)

    Zhang, Li

    2017-10-01

    The remote temperature monitoring system based on the Controller Area Network (CAN) bus is designed to collect multi-node remote temperatures. Using an STM32F103 as the main controller and multiple DS18B20s as temperature sensors, the system achieves master-slave node data acquisition and transmission based on the CAN bus protocol. Using serial-port communication with the host computer, the system provides remote temperature storage, historical data display, and temperature waveform display.

  10. Profiling an application for power consumption during execution on a plurality of compute nodes

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.

    2012-08-21

    Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.

  11. Managing data from multiple disciplines, scales, and sites to support synthesis and modeling

    USGS Publications Warehouse

    Olson, R. J.; Briggs, J. M.; Porter, J.H.; Mah, Grant R.; Stafford, S.G.

    1999-01-01

    The synthesis and modeling of ecological processes at multiple spatial and temporal scales involves bringing together and sharing data from numerous sources. This article describes a data and information system model that facilitates assembling, managing, and sharing diverse data from multiple disciplines, scales, and sites to support integrated ecological studies. Cross-site scientific-domain working groups coordinate the development of data associated with their particular scientific working group, including decisions about data requirements, data to be compiled, data formats, derived data products, and schedules across the sites. The Web-based data and information system consists of nodes for each working group plus a central node that provides data access, project information, data query, and other functionality. The approach incorporates scientists and computer experts in the working groups and provides incentives for individuals to submit documented data to the data and information system.

  12. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
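
    For reference, the plain serial Smith-Waterman recurrence that the paper's three-level scheme parallelizes (a textbook sketch, not the optimized kernel):

      def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
          """Return the best local alignment score between strings a and b."""
          H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                  # local alignment: scores are clamped at zero
                  H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                  best = max(best, H[i][j])
          return best

      print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))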

  13. Software system for data management and distributed processing of multichannel biomedical signals.

    PubMed

    Franaszczuk, P J; Jouny, C C

    2004-01-01

    The presented software is designed for efficient utilization of a cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, and 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. Descriptions of the data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of new signal processing procedures is possible with minimal programming overhead for the input/output processing and user interface. The number of cluster nodes used for computations and the amount of storage can be changed with no major modification to the software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.

  14. Fast correspondences search in anatomical trees

    NASA Astrophysics Data System (ADS)

    dos Santos, Thiago R.; Gergel, Ingmar; Meinzer, Hans-Peter; Maier-Hein, Lena

    2010-03-01

    Registration of multiple medical images commonly comprises the steps of feature extraction, correspondence search, and transformation computation. In this paper, we present a new method for a fast and pose-independent search of correspondences using as features anatomical trees, such as the bronchial system in the lungs or the vessel system in the liver. Our approach scores the similarities between the trees' nodes (bifurcations), taking into account both topological properties extracted from their graph representations and anatomical properties extracted from the trees themselves. The node assignment maximizes the global similarity (the sum of the scores of each pair of assigned nodes), ensuring that the matches are distributed throughout the trees. Furthermore, the proposed method is able to deal with distortions in the data, such as noise, motion and artifacts, and with problems associated with the extraction method, such as missing or false branches. According to an evaluation on swine lung data sets, the method requires less than one second on average to compute the matching and yields a high rate of correct matches compared to state-of-the-art work.
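
    Once pairwise node similarities are scored, the global-similarity assignment can be sketched with the Hungarian algorithm (SciPy shown for illustration; the paper's own assignment procedure may differ):

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      # scores[i, j]: similarity of bifurcation i in tree A to j in tree B
      scores = np.array([[0.9, 0.1, 0.2],
                         [0.2, 0.8, 0.3],
                         [0.1, 0.4, 0.7]])
      rows, cols = linear_sum_assignment(-scores)       # negate to maximize
      print(list(zip(rows.tolist(), cols.tolist())))    # [(0, 0), (1, 1), (2, 2)]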

  15. Broadcasting a message in a parallel computer

    DOEpatents

    Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.

  16. Distributed multiple path routing in complex networks

    NASA Astrophysics Data System (ADS)

    Chen, Guang; Wang, San-Xiu; Wu, Ling-Wei; Mei, Pan; Yang, Xu-Hua; Wen, Guang-Hui

    2016-12-01

    Routing in complex transmission networks is an important problem that has garnered extensive research interest in recent years. In this paper, we propose a novel routing strategy called the distributed multiple path (DMP) routing strategy. For each of the O-D node pairs in a given network, the DMP routing strategy computes and stores, in advance, multiple short-length paths that overlap little with each other. During the transmission stage, it rapidly selects an actual routing path which provides low transmission cost from the pre-computed paths for each transmission task, according to the real-time network transmission status information. Computer simulation results obtained for the lattice, ER random, and scale-free networks indicate that the strategy can significantly improve the anti-congestion ability of transmission networks, as well as provide favorable routing robustness against partial network failures.
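
    The precomputation step can be sketched with networkx (illustrative; the paper defines its own overlap and cost criteria):

      import networkx as nx

      def low_overlap_paths(G, src, dst, k=3, max_shared=1):
          """Keep up to k short paths sharing at most max_shared edges."""
          kept = []
          for path in nx.shortest_simple_paths(G, src, dst):
              edges = {frozenset(e) for e in zip(path, path[1:])}
              if all(len(edges & other) <= max_shared for _, other in kept):
                  kept.append((path, edges))
              if len(kept) == k:
                  break
          return [p for p, _ in kept]

      G = nx.grid_2d_graph(4, 4)
      for p in low_overlap_paths(G, (0, 0), (3, 3)):
          print(p)
      # at transmission time, pick the stored path that is least congested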

  17. Performing a global barrier operation in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-12-09

    Executing computing tasks on a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier.
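
    A sketch of the two-level barrier with Python threads standing in for the tasks on one compute node (the global barrier is simulated by a print):

      import threading

      NODE_TASKS = 4
      local_barrier = threading.Barrier(NODE_TASKS)

      def task(task_id):
          is_master = (task_id == 0)
          # wait() returns only once every task on the node has joined, so
          # the master reaches the global barrier last, as required
          local_barrier.wait()
          if is_master:
              print("master task joins the global barrier")   # stand-in

      threads = [threading.Thread(target=task, args=(i,)) for i in range(NODE_TASKS)]
      for t in threads: t.start()
      for t in threads: t.join()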

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shipman, Galen M.

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, Kristan D.; Faraj, Daniel A.

    In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.

  20. Message passing with a limited number of DMA byte counters

    DOEpatents

    Blocksome, Michael [Rochester, MN; Chen, Dong [Croton on Hudson, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kumar, Sameer [White Plains, NY; Parker, Jeffrey J [Rochester, MN

    2011-10-04

    A method for passing messages in a parallel computer system constructed as a plurality of compute nodes interconnected as a network, where each compute node includes a DMA engine but only a limited number of byte counters for tracking the number of bytes that are sent or received by the DMA engine, where the byte counters may be used in shared counter or exclusive counter modes of operation. The method includes, using a rendezvous protocol, a source compute node deterministically sending a request to send (RTS) message with a single RTS descriptor, using an exclusive injection counter to track both the RTS message and the message data to be sent in association with the RTS message, to a destination compute node, such that the RTS descriptor indicates to the destination compute node that the message data will be adaptively routed to the destination node. Using one DMA FIFO at the source compute node, the RTS descriptors are maintained for rendezvous messages destined for the destination compute node to ensure proper message data ordering thereat. Using a reception counter at a DMA engine, the destination compute node tracks reception of the RTS and associated message data and sends a clear to send (CTS) message to the source node, in a rendezvous protocol form of a remote get, to accept the RTS message and message data; the source compute node DMA engine processes the remote get (CTS) to provide the message data to be sent.

  1. Multiple-Ring Digital Communication Network

    NASA Technical Reports Server (NTRS)

    Kirkham, Harold

    1992-01-01

    Optical-fiber digital communication network to support data-acquisition and control functions of electric-power-distribution networks. Optical-fiber links of communication network follow power-distribution routes. Since fiber crosses open power switches, communication network includes multiple interconnected loops with occasional spurs. At each intersection, a node is needed. Nodes of communication network include power-distribution substations and power-controlling units. In addition to serving data-acquisition and control functions, each node acts as repeater, passing on messages to next node(s). Multiple-ring communication network operates on the new AbNET protocol and features fiber-optic communication.

  2. Preoperative 18F-FDG-PET/CT imaging and sentinel node biopsy in the detection of regional lymph node metastases in malignant melanoma.

    PubMed

    Singh, Baljinder; Ezziddin, Samer; Palmedo, Holger; Reinhardt, Michael; Strunk, Holger; Tüting, Thomas; Biersack, Hans-Jürgen; Ahmadzadehfar, Hojjat

    2008-10-01

    The objective of this study was to evaluate the role of preoperative 18F-fluorodeoxyglucose-positron emission tomography/computed tomography scanning, preoperative lymphoscintigraphy (LS), and sentinel lymph node biopsy in patients with malignant melanoma. Fifty-two patients (36 men, 16 women; mean age 55.0+/-13.0 years; median age 61 years; range 17-76 years) with malignant melanoma were selected. According to the latest version of the American Joint Committee on Cancer staging system, the disease in the study patients was initially classified as either stage I or II. The other primary tumor characteristics were mean Breslow depth 2.87 mm (median 2 mm; range 1-12.0 mm) and Clark levels III-V. None of the study patients had clinical or radiological evidence of regional lymph node metastatic disease. At least one sentinel node was identified in all patients. Preoperative LS detected a total of 111 sentinel lymph nodes (average 2.13 sentinel lymph nodes per patient) and demonstrated a single nodal draining basin in 38 (73%) patients and multiple (2-3) draining basins in the remaining 14 (27%) patients. Fourteen of the 52 patients (27%) had at least one involved sentinel node. Positron emission tomography was true positive in two patients with a sentinel node greater than 1 cm and false positive in two other patients. In this study, the detection of sentinel lymph nodes by LS and gamma probe had a sensitivity of 100%. In contrast, 18F-FDG-PET imaging demonstrated very low sensitivity (14.3%; 95% CI, 2.5 to 44%) and positive predictive value (50%; 95% CI, 9 to 90%) for localizing the subclinical nodal metastases. The specificity, negative predictive value, and diagnostic accuracy were 94.7%, 75%, and 73%, respectively. Preoperative fluorodeoxyglucose-positron emission tomography/computed tomography imaging is not able to substitute for LS/sentinel lymph node biopsy in patients at stage I or II.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krueger, Jens; Micikevicius, Paulius; Williams, Samuel

    Reverse Time Migration (RTM) is one of the main approaches in the seismic processing industry for imaging the subsurface structure of the Earth. While RTM provides qualitative advantages over its predecessors, it has a high computational cost warranting implementation on HPC architectures. We focus on three progressively more complex kernels extracted from RTM: for isotropic (ISO), vertical transverse isotropic (VTI) and tilted transverse isotropic (TTI) media. In this work, we examine performance optimization of forward wave modeling, which describes the computational kernels used in RTM, on emerging multi- and manycore processors and introduce a novel common subexpression elimination optimization for TTI kernels. We compare attained performance and energy efficiency in both the single-node and distributed memory environments in order to satisfy industry's demands for fidelity, performance, and energy efficiency. Moreover, we discuss the interplay between architecture (chip and system) and optimizations (both on-node computation) highlighting the importance of NUMA-aware approaches to MPI communication. Ultimately, our results show we can improve CPU energy efficiency by more than 10× on Magny Cours nodes while acceleration via multiple GPUs can surpass the energy-efficient Intel Sandy Bridge by as much as 3.6×.

  4. Determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation

    DOEpatents

    Blocksome, Michael A [Rochester, MN

    2011-12-20

    Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value.

  5. Reducing power consumption while performing collective operations on a plurality of compute nodes

    DOEpatents

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2011-10-18

    Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation.

  6. Incomplete inhibition of HIV infection results in more HIV infected lymph node cells by reducing cell death

    PubMed Central

    Cele, Sandile; Ferreira, Isabella Markham; Young, Andrew C; Karim, Farina; Madansein, Rajhmun; Dullabh, Kaylesh J; Chen, Chih-Yuan; Buckels, Noel J; Ganga, Yashica; Khan, Khadija; Boulle, Mikael; Lustig, Gila; Neher, Richard A

    2018-01-01

    HIV has been reported to be cytotoxic in vitro and in lymph node infection models. Using a computational approach, we found that partial inhibition of transmissions of multiple virions per cell could lead to increased numbers of live infected cells. If the number of viral DNA copies remains above one after inhibition, then eliminating the surplus viral copies reduces cell death. Using a cell line, we observed increased numbers of live infected cells when infection was partially inhibited with the antiretroviral efavirenz or neutralizing antibody. We then used efavirenz at concentrations reported in lymph nodes to inhibit lymph node infection by partially resistant HIV mutants. We observed more live infected lymph node cells, but with fewer HIV DNA copies per cell, relative to no drug. Hence, counterintuitively, limited attenuation of HIV transmission per cell may increase live infected cell numbers in environments where the force of infection is high. PMID:29555018

  7. I/O routing in a multidimensional torus network

    DOEpatents

    Chen, Dong; Eisley, Noel A.; Heidelberger, Philip

    2017-02-07

    A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.

  8. I/O routing in a multidimensional torus network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Dong; Eisley, Noel A.; Heidelberger, Philip

    A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.

  9. Reducing power consumption during execution of an application on a plurality of compute nodes

    DOEpatents

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-06-05

    Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: executing, by each compute node, an application, the application including power consumption directives corresponding to one or more portions of the application; identifying, by each compute node, the power consumption directives included within the application during execution of the portions of the application corresponding to those identified power consumption directives; and reducing power, by each compute node, to one or more components of that compute node according to the identified power consumption directives during execution of the portions of the application corresponding to those identified power consumption directives.

  10. Frequency of an accessory popliteal efferent lymphatic pathway in dogs.

    PubMed

    Mayer, Monique N; Sweet, Katherine A; Patsikas, Michael N; Sukut, Sally L; Waldner, Cheryl L

    2018-05-01

    Staging and therapeutic planning for dogs with malignant disease in the popliteal lymph node are based on the expected patterns of lymphatic drainage from the lymph node. The medial iliac lymph nodes are known to receive efferent lymph from the popliteal lymph node; however, an accessory popliteal efferent pathway with direct connection to the sacral lymph nodes has also been less frequently reported. The primary objective of this prospective, anatomic study was to describe the frequency of various patterns of lymphatic drainage of the popliteal lymph node. With informed client consent, 50 adult dogs with no known disease of the lymphatic system underwent computed tomographic lymphography after ultrasound-guided, percutaneous injection of 350 mg/ml iohexol into a popliteal lymph node. In all 50 dogs, the popliteal lymph node drained directly to the ipsilateral medial iliac lymph node through multiple lymphatic vessels that coursed along the medial thigh. In 26% (13/50) of dogs, efferent vessels also drained from the popliteal lymph node directly to the internal iliac and/or sacral lymph nodes, coursing laterally through the gluteal region and passing over the dorsal aspect of the pelvis. Lymphatic connections between the right and left medial iliac and right and left internal iliac lymph nodes were found. Based on our findings, the internal iliac and sacral lymph nodes should be considered when staging or planning therapy for dogs with malignant disease in the popliteal lymph node. © 2018 American College of Veterinary Radiology.

  11. PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skory, Stephen; Turk, Matthew J.; Norman, Michael L.

    2010-11-15

    Modern N-body cosmological simulations contain billions (10^9) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes the Message Passing Interface (MPI) and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for adaptive mesh refinement data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 2000^3 particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.

  12. Reducing power consumption during execution of an application on a plurality of compute nodes

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.

    2013-09-10

    Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory.

  13. Collaborative localization in wireless sensor networks via pattern recognition in radio irregularity using omnidirectional antennas.

    PubMed

    Jiang, Joe-Air; Chuang, Cheng-Long; Lin, Tzu-Shiang; Chen, Chia-Pang; Hung, Chih-Hung; Wang, Jiing-Yi; Liu, Chang-Wang; Lai, Tzu-Yun

    2010-01-01

    In recent years, various received signal strength (RSS)-based localization estimation approaches for wireless sensor networks (WSNs) have been proposed. RSS-based localization is regarded as a low-cost solution for many location-aware applications in WSNs. In previous studies, the radiation patterns of all sensor nodes are assumed to be spherical, which is an oversimplification of the radio propagation model in practical applications. In this study, we present an RSS-based cooperative localization method that estimates unknown coordinates of sensor nodes in a network. Arrangement of two external low-cost omnidirectional dipole antennas is developed by using the distance-power gradient model. A modified robust regression is also proposed to determine the relative azimuth and distance between a sensor node and a fixed reference node. In addition, a cooperative localization scheme that incorporates estimations from multiple fixed reference nodes is presented to improve the accuracy of the localization. The proposed method is tested via computer-based analysis and field test. Experimental results demonstrate that the proposed low-cost method is a useful solution for localizing sensor nodes in unknown or changing environments.
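
    The distance-power gradient step can be illustrated with the standard log-distance path-loss inversion (a common model offered as a sketch; the paper's calibration and robust regression differ):

      def rss_to_distance(rss_dbm, rss_at_d0=-40.0, d0=1.0, n=2.7):
          """n: distance-power gradient; rss_at_d0: RSS at d0 meters (assumed)."""
          return d0 * 10 ** ((rss_at_d0 - rss_dbm) / (10 * n))

      print(round(rss_to_distance(-65.0), 2))   # ~8.43 m with these parameters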

  14. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    DOEpatents

    Faraj, Ahmad

    2013-07-09

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.

  15. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    DOEpatents

    Faraj, Ahmad

    2013-02-12

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
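
    Both patents above layer per-core logical rings and local reductions on top of the classic ring allreduce. As background, a textbook single-level ring allreduce (reduce-scatter followed by allgather) can be simulated directly; the sketch below is generic, not the patented multi-core variant.

    # Simulated ring allreduce over P "ranks": P-1 reduce-scatter steps,
    # then P-1 allgather steps; every rank ends with the global sum.
    import numpy as np

    def ring_allreduce(vectors):
        P = len(vectors)
        data = [np.array_split(v.astype(float).copy(), P) for v in vectors]
        # Reduce-scatter: at step s, rank r sends chunk (r - s) % P to r+1.
        for step in range(P - 1):
            for dst in range(P):
                src = (dst - 1) % P
                c = (src - step) % P
                data[dst][c] = data[dst][c] + data[src][c]
        # Rank r now owns the fully reduced chunk (r + 1) % P; circulate it.
        for step in range(P - 1):
            for dst in range(P):
                src = (dst - 1) % P
                c = (src + 1 - step) % P
                data[dst][c] = data[src][c].copy()
        return [np.concatenate(chunks) for chunks in data]

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        vecs = [rng.integers(0, 10, 12).astype(float) for _ in range(4)]
        out = ring_allreduce(vecs)
        assert all(np.allclose(o, sum(vecs)) for o in out)
        print("all ranks hold the global sum:", out[0])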

  16. Determining collective barrier operation skew in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faraj, Daniel A.

    2015-11-24

    Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
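
    The measurement loop described above maps naturally onto a few lines of mpi4py. The sketch below is a hedged illustration (the delay length and the script name are arbitrary); run it with, e.g., mpirun -n 4 python barrier_skew.py.

    # Delay each rank in turn, measure its barrier completion time, and
    # report skew = max - min over the per-rank measurements.
    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    measurements = []
    for delayed in range(size):        # select each rank once as the delayed node
        comm.Barrier()                 # align all ranks before the trial
        if rank == delayed:
            time.sleep(0.05)           # the injected delay
        t0 = MPI.Wtime()
        comm.Barrier()                 # the measured collective barrier
        if rank == delayed:
            measurements.append(MPI.Wtime() - t0)

    times = comm.gather(measurements[0], root=0)   # one measurement per rank
    if rank == 0:
        print("barrier completion times:", times)
        print("barrier operation skew:", max(times) - min(times))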

  18. A DAG Scheduling Scheme on Heterogeneous Computing Systems Using Tuple-Based Chemical Reaction Optimization

    PubMed Central

    Jiang, Yuyi; Shao, Zhiqing; Guo, Yi

    2014-01-01

    A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching for an optimal DAG scheduling solution is considered to be NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a recently proposed metaheuristic method, chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of the tuple reaction molecular structure and the four elementary reaction operators of TMSCRO is better justified. TMSCRO also applies the concepts of constrained critical paths (CCPs), the constrained-critical-path directed acyclic graph (CCPDAG), and super molecules to accelerate convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO on a large set of randomly generated graphs and on graphs for real-world problems. PMID:25143977

  19. A DAG scheduling scheme on heterogeneous computing systems using tuple-based chemical reaction optimization.

    PubMed

    Jiang, Yuyi; Shao, Zhiqing; Guo, Yi

    2014-01-01

    A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching for an optimal DAG scheduling solution is considered to be NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a recently proposed metaheuristic method, chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of the tuple reaction molecular structure and the four elementary reaction operators of TMSCRO is better justified. TMSCRO also applies the concepts of constrained critical paths (CCPs), the constrained-critical-path directed acyclic graph (CCPDAG), and super molecules to accelerate convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO on a large set of randomly generated graphs and on graphs for real-world problems.
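
    TMSCRO itself is a metaheuristic and too involved for a short sketch; as a baseline for the same problem shape (DAG tasks on heterogeneous processors with inter-processor communication costs), a minimal earliest-finish-time list scheduler follows. All task names and costs are illustrative.

    # Greedy list scheduling: walk the DAG in topological order and place
    # each task on the processor that finishes it earliest.
    def list_schedule(succ, comp_cost, comm_cost, n_procs):
        tasks = list(comp_cost)
        indeg = {t: 0 for t in tasks}
        for u in succ:
            for v in succ[u]:
                indeg[v] += 1
        order, ready = [], [t for t in tasks if indeg[t] == 0]
        while ready:                       # Kahn's topological sort
            u = ready.pop()
            order.append(u)
            for v in succ.get(u, []):
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        pred = {t: [] for t in tasks}
        for u in succ:
            for v in succ[u]:
                pred[v].append(u)
        proc_free = [0.0] * n_procs
        finish, where = {}, {}
        for t in order:
            best = None
            for p in range(n_procs):       # earliest finish time on each p
                start = max([proc_free[p]] + [
                    finish[u] + (0 if where[u] == p else comm_cost[(u, t)])
                    for u in pred[t]])
                ft = start + comp_cost[t][p]
                if best is None or ft < best[0]:
                    best = (ft, p)
            finish[t], where[t] = best
            proc_free[best[1]] = best[0]
        return max(finish.values()), where

    if __name__ == "__main__":
        succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
        comp = {"a": [2, 3], "b": [3, 2], "c": [4, 3], "d": [2, 2]}
        comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 1}
        print(list_schedule(succ, comp, comm, n_procs=2))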

  20. Protocol for multiple node network

    NASA Technical Reports Server (NTRS)

    Kirkham, Harold (Inventor)

    1995-01-01

    The invention is a multiple interconnected network of intelligent message-repeating remote nodes which employs an antibody recognition message termination process performed by all remote nodes and a remote node polling process performed by other nodes which are master units controlling remote nodes in respective zones of the network assigned to respective master nodes. Each remote node repeats only those messages originated in the local zone, to provide isolation among the master nodes.

  1. Protocol for multiple node network

    NASA Technical Reports Server (NTRS)

    Kirkham, Harold (Inventor)

    1994-01-01

    The invention is a multiple interconnected network of intelligent message-repeating remote nodes which employs an antibody recognition message termination process performed by all remote nodes and a remote node polling process performed by other nodes which are master units controlling remote nodes in respective zones of the network assigned to respective master nodes. Each remote node repeats only those messages originated in the local zone, to provide isolation among the master nodes.
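
    The antibody-recognition idea in the two records above, where every node repeats a message exactly once and silently drops copies it has already seen, can be shown with a toy flood simulation; the topology is an illustrative assumption.

    # Flooding with duplicate suppression: the flood terminates by itself
    # once every node has seen (and repeated) the message once.
    from collections import deque

    neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}

    def flood(origin):
        seen, queue, transmissions = {origin}, deque([origin]), 0
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                transmissions += 1       # node repeats the message to nb
                if nb not in seen:       # "antibody" check: seen before?
                    seen.add(nb)
                    queue.append(nb)     # nb will repeat it in turn
        return seen, transmissions

    reached, tx = flood(0)
    print(f"nodes reached: {sorted(reached)}, transmissions: {tx}")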

  2. Performing a global barrier operation in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier.

  3. Thread selection according to power characteristics during context switching on compute nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J.; Blocksome, Michael A.; Randles, Amanda E.

    Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.

  4. Thread selection according to predefined power characteristics during context switching on compute nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None, None

    Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution.

  5. Radar Data Processing Using a Distributed Computational System

    DTIC Science & Technology

    1992-06-01

    …objects to processors must reduce Toc(N) (i.e., the time to compute on N nodes) [Ref. 28]. Time spent communicating can represent a degradation of… de Sistemas e Computação, n.d. [9] Vilhena, R., "Introdução aos Algoritmos para Processamento de Marcações e Distâncias", Escola Naval, Notas de Aula, Automação de Sistemas Navais, n.d. [10] Averbuch, A., Itzikowitz, S., and Kapon, T., "Parallel Implementation of Multiple Model Tracking…

  6. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    NASA Astrophysics Data System (ADS)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modeling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of the Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the computational resources of the cluster using larger numbers of computational cores. Simulation results indicate that if the number of cores used is not a multiple of the total number of cores per cluster node, there are allocation strategies which provide more efficient calculations.

  7. High speed polling protocol for multiple node network

    NASA Technical Reports Server (NTRS)

    Kirkham, Harold (Inventor)

    1995-01-01

    The invention is a multiple interconnected network of intelligent message-repeating remote nodes which employs a remote node polling process performed by a master node by transmitting a polling message generically addressed to all remote nodes associated with the master node. Each remote node responds upon receipt of the generically addressed polling message by transmitting a poll-answering informational message and by relaying the polling message to other adjacent remote nodes.

  8. A Temporal Credential-Based Mutual Authentication with Multiple-Password Scheme for Wireless Sensor Networks

    PubMed Central

    Liu, Xin; Zhang, Ruisheng; Liu, Qidong

    2017-01-01

    Wireless sensor networks (WSNs), which consist of a large number of sensor nodes, have become among the most important technologies in numerous fields, such as environmental monitoring, military surveillance, control systems in nuclear reactors, vehicle safety systems, and medical monitoring. The most serious obstacle to the widespread application of WSNs is the lack of security. Given the resource limitations of WSNs, traditional security schemes are unsuitable. Approaches to withstanding the relevant attacks with small overhead have thus recently been studied by many researchers. Numerous studies have focused on authentication schemes for WSNs, but most of these works cannot achieve both strong security and low overhead. Nam et al. proposed a two-factor authentication scheme with lightweight sensor computation for WSNs. In this paper, we review this scheme, highlight its drawbacks, and propose a temporal credential-based mutual authentication with a multiple-password scheme for WSNs. Our scheme uses multiple passwords to achieve three-factor security performance and to generate a session key between the user and the sensor nodes. The security analysis shows that our scheme can withstand the relevant attacks, including a lost password threat, and the comparison shows that our scheme involves a relatively small overhead. The overhead comparison indicates that more than 95% of the overhead consists of communication rather than computation. This result motivates us to pay more attention to communication overhead than to computation overhead in future research. PMID:28135288

  9. A Temporal Credential-Based Mutual Authentication with Multiple-Password Scheme for Wireless Sensor Networks.

    PubMed

    Liu, Xin; Zhang, Ruisheng; Liu, Qidong

    2017-01-01

    Wireless sensor networks (WSNs), which consist of a large number of sensor nodes, have become among the most important technologies in numerous fields, such as environmental monitoring, military surveillance, control systems in nuclear reactors, vehicle safety systems, and medical monitoring. The most serious obstacle to the widespread application of WSNs is the lack of security. Given the resource limitations of WSNs, traditional security schemes are unsuitable. Approaches to withstanding the relevant attacks with small overhead have thus recently been studied by many researchers. Numerous studies have focused on authentication schemes for WSNs, but most of these works cannot achieve both strong security and low overhead. Nam et al. proposed a two-factor authentication scheme with lightweight sensor computation for WSNs. In this paper, we review this scheme, highlight its drawbacks, and propose a temporal credential-based mutual authentication with a multiple-password scheme for WSNs. Our scheme uses multiple passwords to achieve three-factor security performance and to generate a session key between the user and the sensor nodes. The security analysis shows that our scheme can withstand the relevant attacks, including a lost password threat, and the comparison shows that our scheme involves a relatively small overhead. The overhead comparison indicates that more than 95% of the overhead consists of communication rather than computation. This result motivates us to pay more attention to communication overhead than to computation overhead in future research.

  10. STAMPS: Software Tool for Automated MRI Post-processing on a supercomputer.

    PubMed

    Bigler, Don C; Aksu, Yaman; Miller, David J; Yang, Qing X

    2009-08-01

    This paper describes a Software Tool for Automated MRI Post-processing (STAMP) of multiple types of brain MRIs on a workstation and for parallel processing on a supercomputer (STAMPS). This software tool enables the automation of nonlinear registration for a large image set and for multiple MR image types. The tool uses standard brain MRI post-processing tools (such as SPM, FSL, and HAMMER) for multiple MR image types in a pipeline fashion. It also contains novel MRI post-processing features. The STAMP image outputs can be used to perform brain analysis using Statistical Parametric Mapping (SPM) or single-/multi-image modality brain analysis using Support Vector Machines (SVMs). Since STAMPS is PBS-based, the supercomputer may be a multi-node computer cluster or one of the latest multi-core computers.

  11. An Improved Co-evolutionary Particle Swarm Optimization for Wireless Sensor Networks with Dynamic Deployment

    PubMed Central

    Wang, Xue; Wang, Sheng; Ma, Jun-Jie

    2007-01-01

    The effectiveness of wireless sensor networks (WSNs) depends on the coverage and target detection probability provided by dynamic deployment, which is usually supported by the virtual force (VF) algorithm. However, in the VF algorithm, the virtual force exerted by stationary sensor nodes hinders the movement of mobile sensor nodes. Particle swarm optimization (PSO) has been introduced as another dynamic deployment algorithm, but in this case the required computation time is the main bottleneck. This paper proposes a dynamic deployment algorithm named "virtual force directed co-evolutionary particle swarm optimization" (VFCPSO), which combines co-evolutionary particle swarm optimization (CPSO) with the VF algorithm: the CPSO uses multiple swarms to cooperatively optimize different components of the solution vectors for dynamic deployment, and the velocity of each particle is updated according to not only the historical local and global optimal solutions but also the virtual forces on the sensor nodes. Simulation results demonstrate that the proposed VFCPSO is competent for dynamic deployment in WSNs and has better performance with respect to computation time and effectiveness than the VF, PSO and VFPSO algorithms.
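
    As background, the plain global-best PSO loop that VFCPSO extends with virtual forces and cooperating sub-swarms can be sketched for the coverage objective as follows; every parameter below is an illustrative assumption.

    # Basic PSO over candidate deployments; fitness is the fraction of a
    # grid of sample points covered by at least one sensor.
    import numpy as np

    def coverage(deployment, grid, r_sense):
        sensors = deployment.reshape(-1, 2)
        d = np.linalg.norm(grid[:, None, :] - sensors[None, :, :], axis=2)
        return np.mean(d.min(axis=1) <= r_sense)

    def pso_deploy(n_sensors=10, n_particles=30, iters=200, r_sense=0.15, seed=0):
        rng = np.random.default_rng(seed)
        gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
        grid = np.column_stack([gx.ravel(), gy.ravel()])
        dim = 2 * n_sensors
        x = rng.uniform(0, 1, (n_particles, dim))     # candidate deployments
        v = np.zeros((n_particles, dim))
        pbest = x.copy()
        pbest_f = np.array([coverage(p, grid, r_sense) for p in x])
        g = pbest[np.argmax(pbest_f)].copy()
        for _ in range(iters):
            r1, r2 = rng.uniform(size=(2, n_particles, dim))
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
            x = np.clip(x + v, 0, 1)
            f = np.array([coverage(p, grid, r_sense) for p in x])
            better = f > pbest_f
            pbest[better], pbest_f[better] = x[better], f[better]
            g = pbest[np.argmax(pbest_f)].copy()
        return g.reshape(-1, 2), pbest_f.max()

    if __name__ == "__main__":
        sensors, cov = pso_deploy()
        print(f"best coverage found: {cov:.2%}")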

  12. Living-Donor Liver Transplant for Fibrolamellar Hepatocellular Carcinoma With Hilar Lymph Node Metastasis: A Case Report.

    PubMed

    Ince, Volkan; Isik, Burak; Ozdemir, Fatih; Ozgor, Dincer; Ara, Cengiz; Yilmaz, Sezai

    2018-04-09

    Fibrolamellar hepatocellular carcinoma is a rare primary malignant liver neoplasm. Benefits from liver transplant for patients with fibrolamellar hepatocellular carcinoma have not yet been reported. Here, we report a 19-year-old female patient who presented with abdominal pain. A computed tomography scan revealed bilobar and multiple solid lesions, with the largest measuring 15 cm in diameter on the right lobe of her liver. Her blood alpha-fetoprotein level and viral hepatitis markers were normal. A fine-needle biopsy of the largest lesion detected fibrolamellar hepatocellular carcinoma. Because no distant metastasis was evident and the carcinoma was unresectable, a right lobe living-donor liver transplant with hilar lymph node dissection was performed. A pathology report revealed poorly differentiated fibrolamellar hepatocellular carcinoma, and further testing indicated microvascular invasion and hilar lymph node metastasis. The largest tumor measured 12 cm. She was discharged on postoperative day 14. During postoperative month 22, multiple vertebral metastases were detected, and she died with diffuse metastasis during postoperative month 26. Our patient, with poor prognostic criteria such as hilar lymph node metastasis, microvascular invasion, and poor differentiation, had 22 months of tumor-free survival and 26 months of overall survival after having undergone living-donor liver transplant.

  13. Budget-based power consumption for application execution on a plurality of compute nodes

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E

    2013-02-05

    Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications.

  14. Budget-based power consumption for application execution on a plurality of compute nodes

    DOEpatents

    Archer, Charles J; Inglett, Todd A; Ratterman, Joseph D

    2012-10-23

    Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications.

  15. Adaptive Connectivity Restoration from Node Failure(s) in Wireless Sensor Networks

    PubMed Central

    Wang, Huaiyuan; Ding, Xu; Huang, Cheng; Wu, Xiaobei

    2016-01-01

    Recently, there has been growing interest in the applications of wireless sensor networks (WSNs). In some applications, such as battlefield reconnaissance, a set of sensor nodes is deployed to collectively survey an area of interest and/or perform specific surveillance tasks. Due to harsh deployment environments and limited energy supplies, nodes may fail, which impacts the connectivity of the whole network. Since a single node failure (cut-vertex) can destroy connectivity and divide the network into disjoint blocks, most existing studies focus on the problem of single node failure. However, the failure of multiple nodes would be a disaster for the whole network and must be repaired effectively. Only a few studies have addressed the problem of multiple cut-vertex failures, which is a special case of multiple node failures. Therefore, this paper proposes a comprehensive solution to the problems of both single and multiple node failure. The Collaborative Single Node Failure Restoration algorithm (CSFR) is presented to solve the single node failure problem using cooperative communication alone, while CSFR-M, an extension of CSFR, handles the single node failure problem more effectively with node motion. Moreover, the Collaborative Connectivity Restoration Algorithm (CCRA) is proposed, on the basis of cooperative communication and node maneuverability, to restore network connectivity after multiple nodes fail. CSFR-M and CCRA are reactive methods that initiate connectivity restoration after detecting the node failure(s). To further minimize energy dissipation, CCRA simplifies the recovery process by gridding, and the distance an individual node must travel during recovery is reduced by choosing the nearest suitable candidates. Finally, extensive simulations validate the performance of CSFR, CSFR-M and CCRA. PMID:27690030
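
    A quick offline way to see which single failures the restoration algorithms must guard against is to compute the articulation points (cut-vertices) of the connectivity graph; a minimal networkx sketch on a toy WSN topology follows.

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([(0, 1), (1, 2), (2, 0),    # a triangle of nodes
                      (2, 3),                    # node 3 bridges off node 2
                      (3, 4), (4, 5), (5, 3)])   # a second triangle
    print("failures that split the network:", sorted(nx.articulation_points(G)))
    H = G.copy()
    H.remove_node(0)                             # a non-cut-vertex failure
    print("still connected without node 0:", nx.is_connected(H))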

  16. Distributed parallel messaging for multiprocessor systems

    DOEpatents

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  17. On the Feasibility of Wireless Multimedia Sensor Networks over IEEE 802.15.5 Mesh Topologies

    PubMed Central

    Garcia-Sanchez, Antonio-Javier; Losilla, Fernando; Rodenas-Herraiz, David; Cruz-Martinez, Felipe; Garcia-Sanchez, Felipe

    2016-01-01

    Wireless Multimedia Sensor Networks (WMSNs) are a special type of Wireless Sensor Network (WSN) where large amounts of multimedia data are transmitted over networks composed of low power devices. Hierarchical routing protocols typically used in WSNs for multi-path communication tend to overload nodes located within radio communication range of the data collection unit or data sink. The battery life of these nodes is therefore reduced considerably, requiring frequent battery replacement work to extend the operational life of the WSN system. In a wireless sensor network with mesh topology, any node may act as a forwarder node, thereby enabling multiple routing paths toward any other node or collection unit. In addition, mesh topologies have proven advantages, such as data transmission reliability, network robustness against node failures, and potential reduction in energy consumption. This work studies the feasibility of implementing WMSNs in mesh topologies and their limitations by means of exhaustive computer simulation experiments. To this end, a module developed for the Synchronous Energy Saving (SES) mode of the IEEE 802.15.5 mesh standard has been integrated with multimedia tools to thoroughly test video sequences encoded using H.264 in mesh networks. PMID:27164106

  18. On the Feasibility of Wireless Multimedia Sensor Networks over IEEE 802.15.5 Mesh Topologies.

    PubMed

    Garcia-Sanchez, Antonio-Javier; Losilla, Fernando; Rodenas-Herraiz, David; Cruz-Martinez, Felipe; Garcia-Sanchez, Felipe

    2016-05-05

    Wireless Multimedia Sensor Networks (WMSNs) are a special type of Wireless Sensor Network (WSN) where large amounts of multimedia data are transmitted over networks composed of low power devices. Hierarchical routing protocols typically used in WSNs for multi-path communication tend to overload nodes located within radio communication range of the data collection unit or data sink. The battery life of these nodes is therefore reduced considerably, requiring frequent battery replacement work to extend the operational life of the WSN system. In a wireless sensor network with mesh topology, any node may act as a forwarder node, thereby enabling multiple routing paths toward any other node or collection unit. In addition, mesh topologies have proven advantages, such as data transmission reliability, network robustness against node failures, and potential reduction in energy consumption. This work studies the feasibility of implementing WMSNs in mesh topologies and their limitations by means of exhaustive computer simulation experiments. To this end, a module developed for the Synchronous Energy Saving (SES) mode of the IEEE 802.15.5 mesh standard has been integrated with multimedia tools to thoroughly test video sequences encoded using H.264 in mesh networks.

  19. Chaining direct memory access data transfer operations for compute nodes in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.

    2010-09-28

    Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, an RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.

  20. Parallel compression of data chunks of a shared data object using a log-structured file system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-10-25

    Techniques are provided for parallel compression of data chunks being written to a shared object. A client executing on a compute node or a burst buffer node in a parallel computing system stores a data chunk generated by the parallel computing system to a shared data object on a storage node by compressing the data chunk and providing the compressed data chunk to the storage node that stores the shared object. The client and storage node may employ log-structured file system techniques. The compressed data chunk can be decompressed by the client when the data chunk is read. A storage node stores a data chunk as part of a shared object by receiving a compressed version of the data chunk from a compute node and storing the compressed version of the data chunk to the shared data object on the storage node.
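
    The write and read paths just described can be condensed into a short sketch; zlib and the in-memory "shared object" below are illustrative stand-ins for the actual codec and storage node.

    # Workers compress chunks in parallel; results are appended log-style
    # to one shared object along with an (offset, lengths) index, and a
    # chunk is decompressed on the client when read back.
    import zlib
    from concurrent.futures import ProcessPoolExecutor

    def compress_chunk(chunk: bytes) -> bytes:
        return zlib.compress(chunk, level=6)

    if __name__ == "__main__":
        chunks = [bytes([i]) * 65536 for i in range(8)]   # per-client data
        with ProcessPoolExecutor() as pool:
            compressed = list(pool.map(compress_chunk, chunks))
        shared, index, off = bytearray(), [], 0
        for raw, c in zip(chunks, compressed):
            index.append((off, len(c), len(raw)))         # offset, clen, ulen
            shared += c
            off += len(c)
        o, clen, ulen = index[3]                          # read chunk 3 back
        restored = zlib.decompress(bytes(shared[o:o + clen]))
        assert restored == chunks[3] and len(restored) == ulen
        print(f"stored {sum(map(len, chunks))} bytes as {len(shared)}")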

  1. A feasibility study on porting the community land model onto accelerators using OpenACC

    DOE PAGES

    Wang, Dali; Wu, Wei; Winkler, Frank; ...

    2014-01-01

    As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC appears to be a very promising technology; therefore, we have conducted a feasibility analysis of porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, applied this kernel within the actual CLM dataflow procedure, and investigated strategies for data parallelization and the benefit of the data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of the OpenMP implementation using a single OpenMP thread, but 2.8 times slower than the OpenMP implementation using 16 threads. On multiple nodes, the MPI+OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for looking into the potential benefits of the "deep copy" capability and the "routine" feature of the OpenACC standard. In conclusion, we believe that our experience with the environmental model CLM can benefit many other scientific research programs interested in porting their large-scale scientific codes onto high-end computers empowered by hybrid computing architectures using OpenACC.

  2. Developing Subdomain Allocation Algorithms Based on Spatial and Communicational Constraints to Accelerate Dust Storm Simulation

    PubMed Central

    Gui, Zhipeng; Yu, Manzhu; Yang, Chaowei; Jiang, Yunfeng; Chen, Songqing; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Hassan, Mohammed Anowarul; Jin, Baoxuan

    2016-01-01

    Dust storms have serious, disastrous impacts on the environment, human health, and assets. The development and application of dust storm models have contributed significantly to better understanding and predicting the distribution, intensity and structure of dust storms. However, dust storm simulation is a data- and computing-intensive process. To improve computing performance, high performance computing has been widely adopted by dividing the entire study area into multiple subdomains and allocating each subdomain to different computing nodes in a parallel fashion. Inappropriate allocation may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, allocation is a key factor that may impact the efficiency of the parallel process. An allocation algorithm is expected to consider the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire simulation. This research introduces three algorithms to optimize the allocation by considering spatial and communicational constraints: 1) an Integer Linear Programming (ILP) based algorithm from a combinatorial optimization perspective; 2) a K-Means and Kernighan-Lin combined heuristic algorithm (K&K) integrating geometric and coordinate-free methods by merging local and global partitioning; 3) an automatic seeded region growing based geometric and local partitioning algorithm (ASRG). The performance and effectiveness of the three algorithms are compared based on different factors. Further, we adopt the K&K algorithm as the demonstration algorithm for an experiment in dust model simulation with the non-hydrostatic mesoscale model (NMM-dust) and compare its performance with the MPI default sequential allocation. The results demonstrate that the K&K method significantly improves simulation performance through better subdomain allocation. This method can also be adopted for other relevant atmospheric and numerical modeling. PMID:27044039
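
    The geometric half of the K&K idea, giving each computing node a spatially compact subdomain, can be sketched with a numpy-only K-Means over grid cells; the Kernighan-Lin refinement and the communication-cost terms are omitted, and all sizes are illustrative.

    import numpy as np

    def kmeans_partition(cells, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centers = cells[rng.choice(len(cells), k, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(cells[:, None, :] - centers[None, :, :], axis=2)
            label = d.argmin(axis=1)
            # Keep the old center if a cluster happens to empty out.
            new = np.array([cells[label == j].mean(axis=0)
                            if np.any(label == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return label

    if __name__ == "__main__":
        gx, gy = np.meshgrid(np.arange(60), np.arange(40))
        cells = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
        loads = np.bincount(kmeans_partition(cells, k=8), minlength=8)
        print("cells per computing node:", loads)
        print("load imbalance (max/mean):", loads.max() / loads.mean())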

  3. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, R.E.; Gustafson, J.L.; Montry, G.R.

    1999-08-10

    A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system. 15 figs.

  4. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1999-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes within the computing system.

  5. A connectionist model for diagnostic problem solving

    NASA Technical Reports Server (NTRS)

    Peng, Yun; Reggia, James A.

    1989-01-01

    A competition-based connectionist model for solving diagnostic problems is described. The problems considered are computationally difficult in that (1) multiple disorders may occur simultaneously and (2) a global optimum is sought in a solution space exponential in the total number of possible disorders. The diagnostic problem is treated as a nonlinear optimization problem, and global optimization criteria are decomposed into local criteria governing node activation updating in the connectionist model. Nodes representing disorders compete with each other to account for each individual manifestation, yet complement each other to account for all manifestations through parallel node interactions. When equilibrium is reached, the network settles into a locally optimal state. Three randomly generated examples of diagnostic problems, each of which has 1024 cases, were tested, and the decomposition plus competition plus resettling approach yielded very high accuracy.

  6. Design & implementation of distributed spatial computing node based on WPS

    NASA Astrophysics Data System (ADS)

    Liu, Liping; Li, Guoqing; Xie, Jibo

    2014-03-01

    Currently, research on SIG (Spatial Information Grid) technology mostly emphasizes spatial data sharing in grid environments, while the importance of spatial computing resources is ignored. In order to implement the sharing and cooperation of spatial computing resources in a grid environment, this paper systematically studies the key technologies for constructing a Spatial Computing Node based on the WPS (Web Processing Service) specification by the OGC (Open Geospatial Consortium). A framework for the Spatial Computing Node is designed according to the features of spatial computing resources. Finally, a prototype of the Spatial Computing Node is implemented and verified in this environment.

  7. Method and apparatus for offloading compute resources to a flash co-processing appliance

    DOEpatents

    Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung

    2015-10-13

    Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpointing of cluster memory to or from nearby interconnected solid-state storage, with asynchronous migration between the burst buffer nodes and slower, more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and uploaded very quickly to the respective compute node just before that job starts. During a job, the SSD nodes perform fast visualization and statistical analysis on the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.

  8. Lustre Distributed Name Space (DNE) Evaluation at the Oak Ridge Leadership Computing Facility (OLCF)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simmons, James S.; Leverman, Dustin B.; Hanley, Jesse A.

    This document describes the Lustre Distributed Name Space (DNE) evaluation carried out at the Oak Ridge Leadership Computing Facility (OLCF) between 2014 and 2015. DNE is a development project funded by OpenSFS to improve Lustre metadata performance and scalability. The development effort has been split into two phases: the first (DNE P1) provides support for remote directories over remote Lustre Metadata Server (MDS) nodes and Metadata Target (MDT) devices, while the second (DNE P2) addresses directories split over multiple remote MDS nodes and MDT devices. The OLCF has been actively evaluating the performance, reliability, and functionality of both DNE phases. For these tests, internal OLCF testbeds were used. Results are promising, and OLCF is planning a full DNE deployment on production systems in the mid-2016 timeframe.

  9. Evaluation of Secure Computation in a Distributed Healthcare Setting.

    PubMed

    Kimura, Eizen; Hamada, Koki; Kikuchi, Ryo; Chida, Koji; Okamoto, Kazuya; Manabe, Shirou; Kuroda, Tomohiko; Matsumura, Yasushi; Takeda, Toshihiro; Mihara, Naoki

    2016-01-01

    Issues related to ensuring patient privacy and data ownership in clinical repositories prevent the growth of translational research. Previous studies have used an aggregator agent to obscure clinical repositories from the data user, and to ensure the privacy of output using statistical disclosure control. However, there remain several issues that must be considered. One such issue is that a data breach may occur when multiple nodes conspire. Another is that the agent may eavesdrop on or leak a user's queries and their results. We have implemented a secure computing method so that the data used by each party can be kept confidential even if all of the other parties conspire to crack the data. We deployed our implementation at three geographically distributed nodes connected to a high-speed layer two network. The performance of our method, with respect to processing times, suggests suitability for practical use.
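
    The core idea that makes such schemes work can be shown with additive secret sharing, a deliberately simplified stand-in for the paper's actual protocol: a sum is computed although no single node (and, for 3-out-of-3 sharing, no proper subset of nodes) ever sees another party's input. Values and roles are illustrative.

    import secrets

    P = 2 ** 61 - 1  # a public prime modulus

    def share(x, n=3):
        """Split x into n additive shares that sum to x mod P."""
        parts = [secrets.randbelow(P) for _ in range(n - 1)]
        parts.append((x - sum(parts)) % P)
        return parts

    # Each data holder shares its private count with the three compute nodes.
    inputs = [120, 45, 300]
    shares = [share(x) for x in inputs]
    # Node j locally adds the j-th share of every input...
    node_sums = [sum(s[j] for s in shares) % P for j in range(3)]
    # ...and only the recombined value reveals the total.
    total = sum(node_sums) % P
    assert total == sum(inputs)
    print("joint total computed without exposing any single input:", total)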

  10. High speed polling protocol for multiple node network with sequential flooding of a polling message and a poll-answering message

    NASA Technical Reports Server (NTRS)

    Marvit, Maclen (Inventor); Kirkham, Harold (Inventor)

    1995-01-01

    The invention is a multiple interconnected network of intelligent message-repeating remote nodes which employs a remote node polling process performed by a master node by transmitting a polling message generically addressed to all remote nodes associated with the master node. Each remote node responds upon receipt of the generically addressed polling message by sequentially flooding the network with a poll-answering informational message and with the polling message.

  11. A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path.

    PubMed

    Xie, Zhiqiang; Shao, Xia; Xin, Yu

    2016-01-01

    To solve the problem of task scheduling in cloud computing systems, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). The algorithm applies a predecessor-task layer priority strategy to handle the constraint relations among task nodes: each task node is assigned a priority value based on its scheduling order as affected by the constraints, and the task node list is generated from these priority values. To break ties among task nodes with the same priority value, a dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduled task nodes based on the actual computation cost and communication cost of each task node during scheduling. The task node with the longest dynamic essential path is scheduled first, since the completion time of the task graph is indirectly determined by the finishing times of the task nodes on the longest dynamic essential path. Finally, we evaluate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm effectively reduces the task makespan in most cases and meets a high-quality performance objective.

  12. A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path

    PubMed Central

    Xie, Zhiqiang; Shao, Xia; Xin, Yu

    2016-01-01

    To solve the problem of task scheduling in cloud computing systems, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). The algorithm applies a predecessor-task layer priority strategy to handle the constraint relations among task nodes: each task node is assigned a priority value based on its scheduling order as affected by the constraints, and the task node list is generated from these priority values. To break ties among task nodes with the same priority value, a dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduled task nodes based on the actual computation cost and communication cost of each task node during scheduling. The task node with the longest dynamic essential path is scheduled first, since the completion time of the task graph is indirectly determined by the finishing times of the task nodes on the longest dynamic essential path. Finally, we evaluate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm effectively reduces the task makespan in most cases and meets a high-quality performance objective. PMID:27490901
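
    The "essential path" the algorithm keys on is, in its static form, the longest path through the task DAG under node computation costs and edge communication costs; a minimal sketch follows (DDEP recomputes this dynamically during scheduling, which is omitted here, and all costs are illustrative).

    # Longest (critical) path via one pass in reverse topological order.
    def critical_path(succ, comp, comm):
        order, seen = [], set()
        def visit(u):                    # depth-first topological sort
            if u in seen:
                return
            seen.add(u)
            for v in succ.get(u, []):
                visit(v)
            order.append(u)
        for u in comp:
            visit(u)
        longest = {}
        for u in order:                  # successors are always done first
            tails = [comm[(u, v)] + longest[v] for v in succ.get(u, [])]
            longest[u] = comp[u] + max(tails, default=0)
        return max(longest.values()), longest

    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    comp = {"a": 2, "b": 3, "c": 1, "d": 2}
    comm = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
    length, _ = critical_path(succ, comp, comm)
    print("critical path length:", length)   # a -> c -> d: 2+4+1+1+2 = 10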

  13. Simulation Methods for Design of Networked Power Electronics and Information Systems

    DTIC Science & Technology

    2014-07-01

    Insertion of latency in every branch and at every node permits the system model to be efficiently distributed across many separate computing cores… We demonstrated the extensibility and generality of the Virtual Test Bed (VTB) framework to support multiple solvers and their associated… Objectives: The overarching objective of this program is to develop methods for fast…

  14. Asynchronous broadcast for ordered delivery between compute nodes in a parallel computing system where packet header space is limited

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Sameer

    Disclosed is a mechanism on receiving processors in a parallel computing system for providing order to data packets received from a broadcast call and for distinguishing data packets received at nodes from several incoming asynchronous broadcast messages where header space is limited. In the present invention, processors at lower leaves of a tree do not need to obtain a broadcast message by directly accessing the data in a root processor's buffer. Instead, each subsequent intermediate node's rank id information is squeezed into the software header of packet headers. In turn, the entire broadcast message is not transferred from the root processor to each processor in a communicator but instead is replicated on several intermediate nodes, which then replicate the message to nodes in lower leaves. Hence, the intermediate compute nodes become "virtual root compute nodes" for the purpose of replicating the broadcast message to lower levels of the tree.
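
    The replication idea, intermediate nodes acting as "virtual roots" that re-send the payload to their own subtrees, reduces to a k-ary tree schedule; the sketch below shows only the schedule, not the patented packet-header mechanism, and the layout is illustrative.

    def broadcast_schedule(ranks, k=2):
        """(child, parent) hops for a k-ary tree broadcast in which every
        intermediate node re-replicates the payload to its children."""
        hops = []
        for i, node in enumerate(ranks):
            for c in range(k * i + 1, min(k * i + k + 1, len(ranks))):
                hops.append((ranks[c], node))
        return hops

    for child, parent in broadcast_schedule(list(range(10))):
        print(f"node {parent} replicates the broadcast to node {child}")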

  15. Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer

    DOEpatents

    Blocksome, Michael A

    2015-02-17

    Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, an RTS message to a target node, the RTS message specifying a message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.

  16. Adaptive multi-node multiple input and multiple output (MIMO) transmission for mobile wireless multimedia sensor networks.

    PubMed

    Cho, Sunghyun; Choi, Ji-Woong; You, Cheolwoo

    2013-10-02

    Mobile wireless multimedia sensor networks (WMSNs), which consist of mobile sink or sensor nodes and use rich sensing information, require much faster and more reliable wireless links than static wireless sensor networks (WSNs). This paper proposes an adaptive multi-node (MN) multiple input and multiple output (MIMO) transmission to improve the transmission reliability and capacity of mobile sink nodes when they experience spatial correlation. Unlike conventional single-node (SN) MIMO transmission, the proposed scheme considers the use of transmission antennas from more than two sensor nodes. To find an optimal antenna set and a MIMO transmission scheme, a MN MIMO channel model is introduced first, followed by derivation of closed-form ergodic capacity expressions with different MIMO transmission schemes, such as space-time transmit diversity coding and spatial multiplexing. The capacity varies according to the antenna correlation and the path gain from multiple sensor nodes. Based on these statistical results, we propose an adaptive MIMO mode and antenna set switching algorithm that maximizes the ergodic capacity of mobile sink nodes. The ergodic capacity of the proposed scheme is compared with conventional SN MIMO schemes, where the gain increases as the antenna correlation and path gain ratio increase.

  17. Adaptive Multi-Node Multiple Input and Multiple Output (MIMO) Transmission for Mobile Wireless Multimedia Sensor Networks

    PubMed Central

    Cho, Sunghyun; Choi, Ji-Woong; You, Cheolwoo

    2013-01-01

    Mobile wireless multimedia sensor networks (WMSNs), which consist of mobile sink or sensor nodes and use rich sensing information, require much faster and more reliable wireless links than static wireless sensor networks (WSNs). This paper proposes an adaptive multi-node (MN) multiple input and multiple output (MIMO) transmission to improve the transmission reliability and capacity of mobile sink nodes when they experience spatial correlation. Unlike conventional single-node (SN) MIMO transmission, the proposed scheme considers the use of transmission antennas from more than two sensor nodes. To find an optimal antenna set and a MIMO transmission scheme, a MN MIMO channel model is introduced first, followed by derivation of closed-form ergodic capacity expressions with different MIMO transmission schemes, such as space-time transmit diversity coding and spatial multiplexing. The capacity varies according to the antenna correlation and the path gain from multiple sensor nodes. Based on these statistical results, we propose an adaptive MIMO mode and antenna set switching algorithm that maximizes the ergodic capacity of mobile sink nodes. The ergodic capacity of the proposed scheme is compared with conventional SN MIMO schemes, where the gain increases as the antenna correlation and path gain ratio increase. PMID:24152920
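
    The capacities derived in closed form in the two records above can be checked numerically; the sketch below estimates the ergodic capacity of a spatial-multiplexing link by Monte Carlo under a Kronecker-correlated Rayleigh channel, with every model constant an illustrative assumption. Higher transmit-side correlation (as between antennas on co-located nodes) lowers the estimate, which is the effect the adaptive MN scheme exploits.

    import numpy as np

    def ergodic_capacity(nt=2, nr=2, snr_db=10.0, rho=0.0, trials=20000, seed=0):
        rng = np.random.default_rng(seed)
        snr = 10 ** (snr_db / 10)
        # Exponential correlation at the transmitter; receiver i.i.d.
        Rt = rho ** np.abs(np.subtract.outer(np.arange(nt), np.arange(nt)))
        Rt_half = np.linalg.cholesky(Rt)
        cap = 0.0
        for _ in range(trials):
            Hw = (rng.standard_normal((nr, nt))
                  + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
            H = Hw @ Rt_half.conj().T                # correlated channel draw
            M = np.eye(nr) + (snr / nt) * H @ H.conj().T
            cap += np.log2(np.linalg.det(M).real)
        return cap / trials

    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho}: {ergodic_capacity(rho=rho):.2f} bit/s/Hz")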

  18. Development of climate data storage and processing model

    NASA Astrophysics Data System (ADS)

    Okladnikov, I. G.; Gordov, E. P.; Titov, A. G.

    2016-11-01

    We present a storage and processing model for climate datasets elaborated in the framework of a virtual research environment (VRE) for climate and environmental monitoring and analysis of the impact of climate change on socio-economic processes on local and regional scales. The model is based on a "shared nothing" distributed computing architecture and assumes a computing network where each computing node is independent and self-sufficient. Each node holds dedicated software for the processing and visualization of geospatial data, providing programming interfaces to communicate with the other nodes. The nodes are interconnected by a local network or the Internet and exchange data and control instructions via SSH connections and web services. Geospatial data is represented by collections of netCDF files stored in a hierarchy of directories in the framework of a file system. To speed up data reading and processing, three approaches are proposed: precalculation of intermediate products, distribution of data across multiple storage systems (with or without redundancy), and caching and reuse of previously obtained products. For fast search and retrieval of the required data, a metadata database is developed according to the data storage and processing model. It contains descriptions of the space-time features of the datasets available for processing and their locations, as well as descriptions and run options of the software components for data analysis and visualization. Together, the model and the metadata database will provide a reliable technological basis for the development of a high-performance virtual research environment for climate and environmental monitoring.

  19. Exploring the use of I/O nodes for computation in a MIMD multiprocessor

    NASA Technical Reports Server (NTRS)

    Kotz, David; Cai, Ting

    1995-01-01

    As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.

  20. Resource Constrained Planning of Multiple Projects with Separable Activities

    NASA Astrophysics Data System (ADS)

    Fujii, Susumu; Morita, Hiroshi; Kanawa, Takuya

    In this study, we consider a resource-constrained planning problem for multiple projects with separable activities. The problem is to plan the processing of the activities subject to resource availability with time windows. We propose a solution algorithm based on the branch and bound method to obtain the optimal solution minimizing the completion time of all projects. We develop three methods to improve computational efficiency: obtaining an initial solution with a minimum slack time rule, estimating a lower bound that considers both time and resource constraints, and introducing an equivalence relation for the bounding operation. The effectiveness of the proposed methods is demonstrated by numerical examples. In particular, as the number of planned projects increases, the average computational time and the number of searched nodes are reduced.

  1. Distributed Multihoming Routing Method by Crossing Control MIPv6 with SCTP

    NASA Astrophysics Data System (ADS)

    Shi, Hongbo; Hamagami, Tomoki

    Various wireless communication technologies, such as 3G and WiFi, are widely used around the world. Recently, not only laptops but also smartphones can be equipped with multiple wireless devices. Communication terminals implemented with multiple interfaces are usually called multi-homed nodes. Meanwhile, a multi-homed node with multiple interfaces can also be regarded as multiple single-homed nodes. For example, a person using a smartphone and a laptop to connect to the Internet concurrently may be regarded as a multi-homed node in the Internet. This paper proposes a new routing method, Multi-homed Mobile Cross-layer Control, to handle multi-homed mobile nodes. The proposed method provides a distributed end-to-end routing method for handling communications among multi-homed nodes at the fundamental network layer.

  2. A Fast SVD-Hidden-nodes based Extreme Learning Machine for Large-Scale Data Analytics.

    PubMed

    Deng, Wan-Yu; Bai, Zuo; Huang, Guang-Bin; Zheng, Qing-Hua

    2016-05-01

    Big dimensional data is a growing trend that is emerging in many real-world contexts, extending from web mining, gene expression analysis, and protein-protein interaction to high-frequency financial data. Nowadays, there is a growing consensus that increasing dimensionality impedes the performance of classifiers, which is termed the "peaking phenomenon" in the field of machine intelligence. To address the issue, dimensionality reduction is commonly employed as a preprocessing step on Big dimensional data before building the classifiers. In this paper, we propose an Extreme Learning Machine (ELM) approach for large-scale data analytics. In contrast to existing approaches, we embed hidden nodes that are designed using singular value decomposition (SVD) into the classical ELM. These SVD nodes in the hidden layer are shown to capture the underlying characteristics of Big dimensional data well, exhibiting excellent generalization performance. The drawback of using SVD on the entire dataset, however, is the high computational complexity involved. To address this, a fast divide-and-conquer approximation scheme is introduced to maintain computational tractability on high-volume data. The resulting algorithm is labeled Fast Singular Value Decomposition-Hidden-nodes based Extreme Learning Machine, or FSVD-H-ELM in short. In FSVD-H-ELM, instead of identifying the SVD hidden nodes directly from the entire dataset, SVD hidden nodes are derived from multiple random subsets of data sampled from the original dataset. Comprehensive experiments and comparisons are conducted to assess FSVD-H-ELM against other state-of-the-art algorithms. The results demonstrate the superior generalization performance and efficiency of FSVD-H-ELM.
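
    A rough numpy sketch of the divide-and-conquer idea as described (the subset size, activation, and all parameters are hypothetical, and it assumes the input dimension is at least n_hidden // n_subsets): hidden-node weights are taken from the top right-singular vectors of random subsets rather than of the full dataset, and the ELM output weights are then solved by least squares.

        import numpy as np

        def fsvd_elm_train(X, Y, n_hidden=64, n_subsets=4, subset_size=256, seed=0):
            rng = np.random.default_rng(seed)
            n, d = X.shape                      # assumes d >= n_hidden // n_subsets
            per = n_hidden // n_subsets
            W = []
            for _ in range(n_subsets):
                idx = rng.choice(n, size=min(n, subset_size), replace=False)
                # Top right-singular vectors of the subset become hidden-node weights.
                _, _, Vt = np.linalg.svd(X[idx], full_matrices=False)
                W.append(Vt[:per])
            W = np.vstack(W)                    # (n_hidden, d)
            H = np.tanh(X @ W.T)                # hidden-layer activations (bias omitted)
            beta, *_ = np.linalg.lstsq(H, Y, rcond=None)  # ELM output weights
            return W, beta

        def fsvd_elm_predict(X, W, beta):
            return np.tanh(X @ W.T) @ beta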

  3. Computer hardware fault administration

    DOEpatents

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-09-14

    Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.

  4. Designing allostery-inspired response in mechanical networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rocks, Jason W.; Pashine, Nidhi; Bischofberger, Irmgard

    Recent advances in designing metamaterials have demonstrated that global mechanical properties of disordered spring networks can be tuned by selectively modifying only a small subset of bonds. Here, using a computationally efficient approach, we extend this idea to tune more general properties of networks. With nearly complete success, we are then able to produce a strain between any two target nodes in a network in response to an applied source strain on any other pair of nodes by removing only ~1% of the bonds. We are also able to control multiple pairs of target nodes, each with a different individual response, from a single source, and to tune multiple independent source/target responses simultaneously into a network. We have fabricated physical networks in macroscopic 2D and 3D systems that exhibit these responses. This work is inspired by the long-range coupled conformational changes that constitute allosteric function in proteins. The fact that allostery is a common means for regulation in biological molecules suggests that it is a relatively easy property to develop through evolution. In analogy, our results show that long-range coupled mechanical responses are similarly easy to achieve in disordered networks.

  5. Designing allostery-inspired response in mechanical networks

    DOE PAGES

    Rocks, Jason W.; Pashine, Nidhi; Bischofberger, Irmgard; ...

    2017-02-21

    Recent advances in designing metamaterials have demonstrated that global mechanical properties of disordered spring networks can be tuned by selectively modifying only a small subset of bonds. Here, using a computationally efficient approach, we extend this idea to tune more general properties of networks. With nearly complete success, we are then able to produce a strain between any two target nodes in a network in response to an applied source strain on any other pair of nodes by removing only ~1% of the bonds. We are also able to control multiple pairs of target nodes, each with a different individual response, from a single source, and to tune multiple independent source/target responses simultaneously into a network. We have fabricated physical networks in macroscopic 2D and 3D systems that exhibit these responses. This work is inspired by the long-range coupled conformational changes that constitute allosteric function in proteins. The fact that allostery is a common means for regulation in biological molecules suggests that it is a relatively easy property to develop through evolution. In analogy, our results show that long-range coupled mechanical responses are similarly easy to achieve in disordered networks.

  6. Designing allostery-inspired response in mechanical networks

    PubMed Central

    Rocks, Jason W.; Pashine, Nidhi; Bischofberger, Irmgard; Goodrich, Carl P.; Liu, Andrea J.; Nagel, Sidney R.

    2017-01-01

    Recent advances in designing metamaterials have demonstrated that global mechanical properties of disordered spring networks can be tuned by selectively modifying only a small subset of bonds. Here, using a computationally efficient approach, we extend this idea to tune more general properties of networks. With nearly complete success, we are able to produce a strain between any two target nodes in a network in response to an applied source strain on any other pair of nodes by removing only ∼1% of the bonds. We are also able to control multiple pairs of target nodes, each with a different individual response, from a single source, and to tune multiple independent source/target responses simultaneously into a network. We have fabricated physical networks in macroscopic 2D and 3D systems that exhibit these responses. This work is inspired by the long-range coupled conformational changes that constitute allosteric function in proteins. The fact that allostery is a common means for regulation in biological molecules suggests that it is a relatively easy property to develop through evolution. In analogy, our results show that long-range coupled mechanical responses are similarly easy to achieve in disordered networks. PMID:28223534

  7. Designing allostery-inspired response in mechanical networks.

    PubMed

    Rocks, Jason W; Pashine, Nidhi; Bischofberger, Irmgard; Goodrich, Carl P; Liu, Andrea J; Nagel, Sidney R

    2017-03-07

    Recent advances in designing metamaterials have demonstrated that global mechanical properties of disordered spring networks can be tuned by selectively modifying only a small subset of bonds. Here, using a computationally efficient approach, we extend this idea to tune more general properties of networks. With nearly complete success, we are able to produce a strain between any two target nodes in a network in response to an applied source strain on any other pair of nodes by removing only ∼1% of the bonds. We are also able to control multiple pairs of target nodes, each with a different individual response, from a single source, and to tune multiple independent source/target responses simultaneously into a network. We have fabricated physical networks in macroscopic 2D and 3D systems that exhibit these responses. This work is inspired by the long-range coupled conformational changes that constitute allosteric function in proteins. The fact that allostery is a common means for regulation in biological molecules suggests that it is a relatively easy property to develop through evolution. In analogy, our results show that long-range coupled mechanical responses are similarly easy to achieve in disordered networks.

  8. Identifying Node Role in Social Network Based on Multiple Indicators

    PubMed Central

    Huang, Shaobin; Lv, Tianyang; Zhang, Xizhe; Yang, Yange; Zheng, Weimin; Wen, Chao

    2014-01-01

    It is a classic topic of social network analysis to evaluate the importance of nodes and to identify the node that takes on the role of core or bridge in a network. Because a single indicator is not sufficient to analyze the multiple characteristics of a node, it is a natural solution to apply multiple indicators, which should be selected carefully. An intuitive idea is to select indicators with weak correlations so as to efficiently assess different characteristics of a node. However, this paper shows that it is much better to select indicators with strong correlations. Because indicator correlation is based on the statistical analysis of a large number of nodes, the particularity of an important node is outlined when its indicator relationship does not comply with the statistical correlation. Therefore, the paper selects multiple indicators, namely degree, ego-betweenness centrality, and eigenvector centrality, to evaluate the importance and the role of a node. The importance of a node is equal to the normalized sum of its three indicators. A candidate for core or bridge is selected from the high-degree nodes or the nodes with high ego-betweenness centrality, respectively. Then, the role of a candidate is determined according to the deviation of its indicator relationship from the statistical correlation of the overall network. Based on 18 real networks and 3 kinds of model networks, the experimental results show that the proposed methods perform quite well in evaluating the importance of nodes and in identifying the node role. PMID:25089823
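
    The three indicators and the importance score are easy to reproduce with networkx; a minimal sketch (the min-max normalization choice is ours, not necessarily the paper's):

        import networkx as nx

        def indicators(G):
            # Degree, ego-betweenness (betweenness of v inside its ego graph),
            # and eigenvector centrality, as in the paper.
            deg = dict(G.degree())
            ego = {v: nx.betweenness_centrality(nx.ego_graph(G, v))[v] for v in G}
            eig = nx.eigenvector_centrality(G, max_iter=1000)
            return deg, ego, eig

        def importance(G):
            def norm(d):
                lo, hi = min(d.values()), max(d.values())
                return {k: (v - lo) / ((hi - lo) or 1) for k, v in d.items()}
            deg, ego, eig = (norm(d) for d in indicators(G))
            # Importance = normalized sum of the three indicators.
            return {v: deg[v] + ego[v] + eig[v] for v in G}

        G = nx.karate_club_graph()
        scores = importance(G)
        print(max(scores, key=scores.get))  # one of the classic hub nodes ranks first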

  9. Research on elastic resource management for multi-queue under cloud computing environment

    NASA Astrophysics Data System (ADS)

    CHENG, Zhenjing; LI, Haibo; HUANG, Qiulan; Cheng, Yaodong; CHEN, Gang

    2017-10-01

    As a new approach to managing computing resources, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on OpenStack was built at IHEP, using HTCondor as the job queue management system. In a traditional static cluster, a fixed number of virtual machines are pre-allocated to the job queues of different experiments. However, this method adapts poorly to volatile computing resource requirements. To solve this problem, an elastic computing resource management system for the cloud computing environment has been designed. The system performs unified management of virtual computing nodes on the basis of the HTCondor job queues, using dual resource thresholds as well as a quota service. A two-stage pool is designed to improve the efficiency of resource pool expansion. This paper presents several use cases of the elastic resource management system in IHEPCloud. In practical runs, the virtual computing resources dynamically expand or shrink as computing requirements change. Additionally, the CPU utilization ratio of the computing resources increased significantly compared with traditional resource management. The system also performs well when there are multiple Condor schedulers and multiple job queues.
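
    A minimal sketch of dual-threshold elastic scaling (the thresholds, step size, and quota handling are hypothetical, not IHEP's actual policy): the pool grows when queue pressure crosses an upper threshold, shrinks below a lower one, and never exceeds the experiment's quota.

        def rescale(idle_jobs, running_vms, quota, upper=0.8, lower=0.2, step=5):
            # Pressure: share of total demand not yet served by running VMs.
            demand = idle_jobs + running_vms
            pressure = idle_jobs / demand if demand else 0.0
            if pressure > upper and running_vms < quota:
                return min(quota, running_vms + step)   # expand the pool
            if pressure < lower and running_vms > step:
                return running_vms - step               # shrink the pool
            return running_vms

        print(rescale(idle_jobs=45, running_vms=10, quota=30))  # -> 15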

  10. Implementation of Multiple Host Nodes in Wireless Sensing Node Network System for Landslide Monitoring

    NASA Astrophysics Data System (ADS)

    Abas, Faizulsalihin bin; Takayama, Shigeru

    2015-02-01

    This paper proposes multiple host nodes in a Wireless Sensing Node Network System (WSNNS) for landslide monitoring. As landslide disasters can easily damage a monitoring system, one major demand in landslide monitoring is the flexibility and robustness of the system in evaluating the current situation in the monitored area. WSNNS can make an important contribution to that aim. In this system, acceleration sensors and GPS are deployed in the sensing nodes. GPS location information enables the system to estimate the network topology and to determine node locations in an emergency by monitoring node modes. The deployed acceleration sensors enable the system to detect the slow mass movement that can lead to a landslide. Once deployed, the sensing nodes self-organize into an autonomous wireless ad hoc network. Measurement data from the sensing nodes are transmitted to the Host System via a host node and a "Cloud" System. The implementation of multiple host nodes in the Local Sensing Node Network System (LSNNS) improves the risk management of the WSNNS for real-time monitoring of landslide disasters.

  11. SU-E-T-222: Computational Optimization of Monte Carlo Simulation On 4D Treatment Planning Using the Cloud Computing Technology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, J

    Purpose: This study evaluated the efficiency of 4D lung radiation treatment planning using Monte Carlo simulation on the cloud. The EGSnrc Monte Carlo code was used in dose calculation on the 4D-CT image set. Methods: The 4D lung radiation treatment plan was created by the DOSCTP linked to the cloud, based on the Amazon elastic compute cloud platform. Dose calculation was carried out by Monte Carlo simulation on the 4D-CT image set on the cloud, and results were sent to the FFD4D image deformation program for dose reconstruction. The dependence of the treatment-plan computing time on the number of compute nodes was optimized while varying the number of CT image sets in the breathing cycle and the dose reconstruction time of the FFD4D. Results: It is found that the dependence of computing time on the number of compute nodes is affected by the diminishing return on the number of nodes used in the Monte Carlo simulation. Moreover, the performance of the 4D treatment planning could be optimized by using fewer than 10 compute nodes on the cloud. The effects of the number of image sets and of the dose reconstruction time on the dependence of computing time on the number of nodes were not significant when more than 15 compute nodes were used in the Monte Carlo simulations. Conclusion: The issue of long computing times in 4D treatment planning, which requires Monte Carlo dose calculations on all CT image sets in the breathing cycle, can be solved using cloud computing technology. It is concluded that the optimal number of compute nodes selected for simulation should be between 5 and 15, as the dependence of computing time on the number of nodes is significant in that range.
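
    The diminishing return can be illustrated with a toy timing model (the coefficients are hypothetical, not fitted to the study's data): a fixed serial part, such as dose reconstruction, plus a Monte Carlo part that divides across nodes.

        def plan_time(nodes, mc_time=600.0, serial_time=120.0):
            # Total wall time: serial reconstruction plus Monte Carlo work split over nodes.
            return serial_time + mc_time / nodes

        for n in (1, 5, 10, 15, 20):
            print(n, round(plan_time(n), 1))
        # 1 720.0, 5 240.0, 10 180.0, 15 160.0, 20 150.0:
        # beyond ~10 nodes the serial part dominates and extra nodes barely help.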

  12. Trust index based fault tolerant multiple event localization algorithm for WSNs.

    PubMed

    Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue

    2011-01-01

    This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events continually emit signals whose strength attenuates in inverse proportion to the distance from the source. In this context, faults occur for various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of the information that the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on event localization by decreasing their trust index, in order to improve the accuracy of event localization and the fault tolerance of multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events that occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarm status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and the fault tolerance of multiple event source localization. The experimental results show that even when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of events and achieves better localization accuracy than other algorithms.
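
    A toy sketch of the trust-index bookkeeping (the update rule and step size are hypothetical, not TISNAP's exact formulas): after each reporting cycle a node's trust rises if its binary report matched the consensus of its neighborhood and falls otherwise, and event estimation weights reports by trust.

        def update_trust(trust, reports, consensus, step=0.1):
            # trust, reports, consensus: dicts keyed by node id; reports are 0/1.
            for node, bit in reports.items():
                if bit == consensus[node]:       # report agreed with the neighborhood
                    trust[node] = min(1.0, trust[node] + step)
                else:                            # likely faulty report
                    trust[node] = max(0.0, trust[node] - step)
            return trust

        def weighted_alarm(reports, trust):
            # Trust-weighted fraction of alarmed nodes, used when estimating events.
            total = sum(trust.values()) or 1.0
            return sum(trust[n] * reports[n] for n in reports) / total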

  13. Trust Index Based Fault Tolerant Multiple Event Localization Algorithm for WSNs

    PubMed Central

    Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue

    2011-01-01

    This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events continually emit signals whose strength attenuates in inverse proportion to the distance from the source. In this context, faults occur for various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of the information that the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on event localization by decreasing their trust index, in order to improve the accuracy of event localization and the fault tolerance of multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events that occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarm status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and the fault tolerance of multiple event source localization. The experimental results show that even when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of events and achieves better localization accuracy than other algorithms. PMID:22163972

  14. Efficient hybrid metrology for focus, CD, and overlay

    NASA Astrophysics Data System (ADS)

    Tel, W. T.; Segers, B.; Anunciado, R.; Zhang, Y.; Wong, P.; Hasan, T.; Prentice, C.

    2017-03-01

    With the advent of multiple patterning techniques in the semiconductor industry, metrology has progressively become a burden. With multiple patterning techniques such as Litho-Etch-Litho-Etch and Sidewall Assisted Double Patterning, the number of processing steps has increased significantly, and so has the number of metrology steps needed both for control and for yield monitoring. The amount of metrology needed increases with each node, as more layers require multiple patterning and more patterning steps are required per layer. In addition, there is the need for guided defect inspection, which in itself requires substantially denser focus, overlay, and CD metrology than before. Metrology efficiency will therefore be crucial to the next semiconductor nodes. ASML's emulated wafer concept offers a highly efficient method for hybrid metrology for focus, CD, and overlay. In this concept, metrology is combined with the scanner's sensor data in order to predict on-product performance. The principle underlying the method is to isolate and estimate individual root causes, which are then combined to compute the on-product performance. The goal is to use all the information available and so avoid ever-increasing amounts of metrology.

  15. Iterative pass optimization of sequence data

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. Wheeler in 1996 proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendant information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure--the iterative improvement method--to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation and memory intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed.
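
    A heavily simplified sketch of the iterative improvement loop (equal-length sequences, Hamming cost, and position-wise majority as the three-sequence median; real sequence data needs indel-aware medians): each internal node is repeatedly reassigned the median of its three neighbors until no assignment changes.

        from collections import Counter

        def median_seq(a, b, c):
            # Position-wise majority minimizes the total Hamming distance to a, b, c.
            return "".join(Counter(col).most_common(1)[0][0] for col in zip(a, b, c))

        def iterative_pass(tree, seqs, max_passes=20):
            # tree: {internal node: its three neighbors}; seqs: node -> sequence.
            for _ in range(max_passes):
                changed = False
                for node, (x, y, z) in tree.items():
                    m = median_seq(seqs[x], seqs[y], seqs[z])
                    if m != seqs[node]:
                        seqs[node], changed = m, True
                if not changed:
                    break
            return seqs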

  16. Effective correlator for RadioAstron project

    NASA Astrophysics Data System (ADS)

    Sergeev, Sergey

    This paper presents the implementation of a software FX correlator for Very Long Baseline Interferometry, adapted for the "RadioAstron" project. The software correlator is implemented for heterogeneous computing systems using graphics accelerators. It is shown that graphics hardware is highly efficient for the interferometry task. The host processor of the heterogeneous computing system forms the data flow for the graphics accelerators, whose number corresponds to the number of frequency channels; for the RadioAstron project there are seven such channels. Each accelerator computes the correlation matrix over all baselines for a single frequency channel. The initial data are converted to floating-point format and corrected with the corresponding delay function, and the entire correlation matrix is computed simultaneously. The calculation of the correlation matrix is performed using a sliding Fourier transform. Thanks to the match between this problem and the graphics-accelerator architecture, a single Kepler-platform accelerator achieves the performance of a four-node Intel computing cluster on this task. The task scales successfully not only to a large number of graphics accelerators, but also to a large number of nodes with multiple accelerators.
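
    A minimal single-baseline FX sketch in numpy (not the project code; the segment length and scaling are arbitrary): each segment of the two data streams is Fourier-transformed first (F), then cross-multiplied and accumulated (X), so a delay between the streams shows up as a phase slope across frequency.

        import numpy as np

        def fx_correlate(x, y, nfft=256):
            # FX correlator: FFT each segment, multiply by the conjugate, average.
            segs = min(len(x), len(y)) // nfft
            acc = np.zeros(nfft, dtype=complex)
            for s in range(segs):
                X = np.fft.fft(x[s * nfft:(s + 1) * nfft])
                Y = np.fft.fft(y[s * nfft:(s + 1) * nfft])
                acc += X * np.conj(Y)        # cross-spectrum of this segment
            return acc / segs                # averaged visibility spectrum

        rng = np.random.default_rng(1)
        sig = rng.standard_normal(4096)
        spec = fx_correlate(sig, np.roll(sig, 3))  # delayed copy -> linear phase slope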

  17. Implementation of Grid Tier 2 and Tier 3 facilities on a Distributed OpenStack Cloud

    NASA Astrophysics Data System (ADS)

    Limosani, Antonio; Boland, Lucien; Coddington, Paul; Crosby, Sean; Huang, Joanna; Sevior, Martin; Wilson, Ross; Zhang, Shunde

    2014-06-01

    The Australian Government is making an AUD 100 million investment in Compute and Storage for the academic community. The Compute facilities are provided in the form of 30,000 CPU cores located at 8 nodes around Australia in a distributed virtualized Infrastructure as a Service facility based on OpenStack. The storage will eventually consist of over 100 petabytes located at 6 nodes. All will be linked via a 100 Gb/s network. This paper describes the development of a fully connected WLCG Tier-2 grid site as well as a general-purpose Tier-3 computing cluster based on this architecture. The facility employs an extension to Torque to enable dynamic allocations of virtual machine instances. A base Scientific Linux virtual machine (VM) image is deployed in the OpenStack cloud and automatically configured as required using Puppet. Custom scripts are used to launch multiple VMs, integrate them into the dynamic Torque cluster and to mount remote file systems. We report on our experience in developing this nation-wide ATLAS and Belle II Tier 2 and Tier 3 computing infrastructure using the national Research Cloud and storage facilities.

  18. Computed tomography detection of extracapsular spread of squamous cell carcinoma of the head and neck in metastatic cervical lymph nodes.

    PubMed

    Carlton, Joshua A; Maxwell, Adam W; Bauer, Lyndsey B; McElroy, Sara M; Layfield, Lester J; Ahsan, Humera; Agarwal, Ajay

    2017-06-01

    Background and purpose: In patients with squamous cell carcinoma of the head and neck (HNSCC), extracapsular spread (ECS) of metastases in cervical lymph nodes affects prognosis and therapy. We assessed the accuracy of intravenous contrast-enhanced computed tomography (CT) and the utility of imaging criteria for preoperative detection of ECS in metastatic cervical lymph nodes in patients with HNSCC. Materials and methods: Preoperative intravenous contrast-enhanced neck CT images of 93 patients with histopathologically confirmed metastatic HNSCC nodes were retrospectively assessed by two neuroradiologists for ECS status and ECS imaging criteria. Radiological assessments were compared with histopathological assessments of neck dissection specimens, and interobserver agreement on ECS status and ECS imaging criteria was measured. Results: Sensitivity, specificity, positive predictive value, and accuracy for overall ECS assessment were 57%, 81%, 82% and 67% for observer 1, and 66%, 76%, 80% and 70% for observer 2, respectively. Requiring three or more ECS imaging criteria for histopathological ECS increased specificity and positive predictive value but decreased sensitivity and accuracy. Interobserver agreement for overall ECS assessment demonstrated a kappa of 0.59; central necrosis had the highest kappa, 0.74. Conclusion: CT has moderate specificity for ECS assessment in HNSCC metastatic cervical nodes. Identifying three or more ECS imaging criteria raises specificity and positive predictive value; therefore, preoperative identification of multiple criteria may be clinically useful. Interobserver agreement is moderate for overall ECS assessment and substantial for central necrosis. Other ECS CT criteria had moderate agreement at best and therefore should not be used individually as criteria for detecting ECS by CT.

  19. Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2016-10-06

    Comparative analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved functional network modules across different species. Such modules typically consist of orthologous proteins with conserved interactions, which can be exploited to computationally predict the modules through network comparison. In this work, we propose a novel probabilistic framework for comparing PPI networks and effectively predicting the correspondence between proteins, represented as network nodes, that belong to conserved functional modules across the given PPI networks. The basic idea is to estimate the steady-state network flow between nodes that belong to different PPI networks based on a Markov random walk model. The random walker is designed to make random moves to adjacent nodes within a PPI network as well as cross-network moves between potential orthologous nodes with high sequence similarity. Based on this Markov random walk model, we estimate the steady-state network flow - or the long-term relative frequency of the transitions that the random walker makes - between nodes in different PPI networks, which can be used as a probabilistic score measuring their potential correspondence. Subsequently, the estimated scores can be used for detecting orthologous proteins in conserved functional modules through network alignment. Through evaluations based on multiple real PPI networks, we demonstrate that the proposed scheme leads to improved alignment results that are biologically more meaningful at reduced computational cost, outperforming the current state-of-the-art algorithms. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/CUFID .
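
    A compact numpy sketch of the steady-state flow idea (the jump weighting and iteration count are hypothetical, and it assumes every node has at least one intra-network edge or cross-network similarity link): a combined transition matrix mixes within-network walks with cross-network jumps, its stationary distribution is found by power iteration, and the long-run flow over cross-network transitions scores candidate node correspondences.

        import numpy as np

        def steady_state_flow(A1, A2, S, jump=0.5):
            # A1, A2: adjacency matrices of the two PPI networks.
            # S: cross-network sequence-similarity scores, shape (n1, n2).
            n1 = A1.shape[0]
            C = np.block([[A1, jump * S], [jump * S.T, A2]]).astype(float)
            P = C / C.sum(axis=1, keepdims=True)   # row-stochastic random walk
            pi = np.full(len(P), 1.0 / len(P))
            for _ in range(200):                   # power iteration to stationarity
                pi = pi @ P
            flow = pi[:, None] * P                 # long-run transition frequencies
            return flow[:n1, n1:]                  # cross-network block: match scores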

  20. Research on a Banknote Printing Wastewater Monitoring System based on Wireless Sensor Network

    NASA Astrophysics Data System (ADS)

    Li, B. B.; Yuan, Z. F.

    2006-10-01

    In this paper, a banknote printing wastewater monitoring system based on a WSN is presented, in line with the system demands and the actual worksite conditions of a banknote printing factory. At the physical layer, the network node is an nRF9e5-centric embedded instrument, which realizes functions such as data collection, status monitoring, and wireless data transmission. Limited by computing capability, memory capacity, communication energy, and other factors, a node cannot obtain detailed information about the entire network, so the WSN communication protocol cannot be very complicated. A contention-based MACA (Multiple Access with Collision Avoidance) protocol is introduced at the MAC layer; it decides the communication process and working mode of the nodes, and avoids collisions in data transmission as well as the hidden- and exposed-station problems of nodes. At the network layer, the routing protocol is in charge of the transmission paths of the data, and the network topology is arranged based on address assignment. With some redundant nodes, the network is stable and expandable. The wastewater monitoring system is a tentative practice of WSN theory in engineering. The system has now passed testing and proved efficient.

  1. Energy-aware scheduling of surveillance in wireless multimedia sensor networks.

    PubMed

    Wang, Xue; Wang, Sheng; Ma, Junjie; Sun, Xinyao

    2010-01-01

    Wireless sensor networks involve a large number of sensor nodes with a limited energy supply, which impacts the behavior of their applications. In wireless multimedia sensor networks, sensor nodes are equipped with audio and visual information collection modules, and multimedia content is ubiquitously retrieved in surveillance applications. To solve the energy problems during target surveillance with wireless multimedia sensor networks, an energy-aware sensor scheduling method is proposed in this paper. Sensor nodes which acquire acoustic signals are deployed randomly in the sensing fields. Target localization is based on the signal energy feature provided by multiple sensor nodes, employing particle swarm optimization (PSO). During the target surveillance procedure, sensor nodes are adaptively grouped in a totally distributed manner. Specifically, the target motion information is extracted by a forecasting algorithm based on the hidden Markov model (HMM). The forecasting results are utilized to awaken sensor nodes in the vicinity of the future target position. According to two properties, the signal energy feature and the residual energy, each sensor node separately decides whether to participate in target detection, using a fuzzy control approach. Meanwhile, the local routing scheme for data transmission towards the observer is discussed. Experimental results demonstrate the efficiency of energy-aware scheduling of surveillance in wireless multimedia sensor networks, where significant energy savings are achieved by the sensor awakening approach and data transmission paths are calculated with low computational complexity.

  2. Broadcasting a message in a parallel computer

    DOEpatents

    Archer, Charles J; Faraj, Ahmad A

    2013-04-16

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the child nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.
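
    The forwarding rules read almost like code. A small simulation sketch of them (the tree, the choice of logical root, and all names are hypothetical) checks that the message reaches every node:

        from collections import deque

        # Hypothetical physical tree: node 0 is the physical root.
        children = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
        parent = {c: p for p, cs in children.items() for c in cs}
        logical_root = 4

        def neighbors(n):
            # Nodes directly connected to n in the tree.
            return children[n] + ([parent[n]] if n in parent else [])

        received = {logical_root}
        queue = deque((nbr, logical_root) for nbr in neighbors(logical_root))
        while queue:
            node, sender = queue.popleft()
            if node in received:
                continue
            received.add(node)
            if node == 0:                         # physical root
                out = [c for c in children[node] if c != sender]
            elif sender == parent.get(node):      # message came from the parent
                out = children[node]              # empty if this node is a leaf
            else:                                 # message came from a child
                out = [c for c in children[node] if c != sender] + [parent[node]]
            queue.extend((nbr, node) for nbr in out)

        assert received == set(children)          # every node got the message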

  3. Broadcasting a message in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer that includes: transmitting, by the logical root to all of the nodes directly connected to the logical root, a message; and for each node except the logical root: receiving the message; if that node is the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received; if that node received the message from a parent node and if that node is not a leaf node, then transmitting the message to all of the child nodes; and if that node received the message from a child node and if that node is not the physical root, then transmitting the message to all of the child nodes except the child node from which the message was received and transmitting the message to the parent node.

  4. Cloud computing method for dynamically scaling a process across physical machine boundaries

    DOEpatents

    Gillen, Robert E.; Patton, Robert M.; Potok, Thomas E.; Rojas, Carlos C.

    2014-09-02

    A cloud computing platform includes a first device having a graph or tree structure with a node which receives data. The data is processed by the node or communicated to a child node for processing. A first node in the graph or tree structure determines the reconfiguration of a portion of the graph or tree structure on a second device. The reconfiguration may include moving a second node and some or all of its descendant nodes. The second node and its descendant nodes may be copied to the second device.

  5. Fault tolerant features and experiments of ANTS distributed real-time system

    NASA Astrophysics Data System (ADS)

    Dominic-Savio, Patrick; Lo, Jien-Chung; Tufts, Donald W.

    1995-01-01

    The ANTS project at the University of Rhode Island introduces the concept of Active Nodal Task Seeking (ANTS) as a way to efficiently design and implement dependable, high-performance distributed computing. This paper presents the fault-tolerant design features that have been incorporated in the ANTS experimental system implementation. The results of performance evaluations and fault injection experiments are reported. The fault-tolerant version of ANTS categorizes all computing nodes into three groups: the up-and-running green group, the self-diagnosing yellow group, and the failed red group. Each available computing node is placed in the yellow group periodically for a routine diagnosis. In addition, for long-life missions, ANTS uses a monitoring scheme to identify faulty computing nodes, in which the communication pattern of each computing node is monitored by two other nodes.

  6. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through the systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop workstation; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, new software called mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
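
    mpiWrapper itself is C++; here is a minimal Python analogue of the same manager/worker pattern using mpi4py (the task list and command are hypothetical), in which workers request subtasks one at a time and run an ordinary non-parallel program for each.

        from mpi4py import MPI
        import subprocess

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        TASKS = [f"./analyze input_{i}.dat" for i in range(100)]  # hypothetical subtasks

        if rank == 0:                            # manager: hand out tasks on demand
            status, next_task, workers = MPI.Status(), 0, comm.Get_size() - 1
            while workers:
                comm.recv(source=MPI.ANY_SOURCE, status=status)
                if next_task < len(TASKS):
                    comm.send(TASKS[next_task], dest=status.Get_source())
                    next_task += 1
                else:
                    comm.send(None, dest=status.Get_source())  # no work left
                    workers -= 1
        else:                                    # worker: request, run, repeat
            while True:
                comm.send(rank, dest=0)          # signal readiness
                cmd = comm.recv(source=0)
                if cmd is None:
                    break
                subprocess.run(cmd, shell=True)  # launch the non-parallel program
        # run with, e.g.: mpiexec -n 16 python wrapper.py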

  7. Low latency, high bandwidth data communications between compute nodes in a parallel computer

    DOEpatents

    Blocksome, Michael A

    2014-04-01

    Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.

  8. Low latency, high bandwidth data communications between compute nodes in a parallel computer

    DOEpatents

    Blocksome, Michael A

    2014-04-22

    Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.

  9. Low latency, high bandwidth data communications between compute nodes in a parallel computer

    DOEpatents

    Blocksome, Michael A

    2013-07-02

    Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.

  10. A multi-sensor RSS spatial sensing-based robust stochastic optimization algorithm for enhanced wireless tethering.

    PubMed

    Parasuraman, Ramviyas; Fabry, Thomas; Molinari, Luca; Kershaw, Keith; Di Castro, Mario; Masi, Alessandro; Ferre, Manuel

    2014-12-12

    The reliability of wireless communication in a network of mobile wireless robot nodes depends on the received radio signal strength (RSS). When the robot nodes are deployed in hostile environments with ionizing radiation (such as in some scientific facilities), there is a possibility that some electronic components may fail randomly (due to radiation effects), which causes problems in wireless connectivity. The objective of this paper is to maximize robot mission capabilities by maximizing the wireless network capacity and to reduce the risk of communication failure. Thus, in this paper, we consider a multi-node wireless tethering structure called the "server-relay-client" framework that uses (multiple) relay nodes in between a server and a client node. We propose a robust stochastic optimization (RSO) algorithm using a multi-sensor-based RSS sampling method at the relay nodes to efficiently improve and balance the RSS between the source and client nodes to improve the network capacity and to provide redundant networking abilities. We use pre-processing techniques, such as exponential moving averaging and spatial averaging filters on the RSS data for smoothing. We apply a receiver spatial diversity concept and employ a position controller on the relay node using a stochastic gradient ascent method for self-positioning the relay node to achieve the RSS balancing task. The effectiveness of the proposed solution is validated by extensive simulations and field experiments in CERN facilities. For the field trials, we used a youBot mobile robot platform as the relay node, and two stand-alone Raspberry Pi computers as the client and server nodes. The algorithm has been proven to be robust to noise in the radio signals and to work effectively even under non-line-of-sight conditions.
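
    A toy sketch of two of the ingredients (the RSS field, gains, and probe scheme are hypothetical): exponential moving averaging smooths the raw RSS samples, and the relay repositions itself by stochastic gradient ascent using finite-difference probes of the field.

        import numpy as np

        def ema(samples, alpha=0.3):
            # Exponential moving average to smooth noisy RSS readings.
            out = samples[0]
            for s in samples[1:]:
                out = alpha * s + (1 - alpha) * out
            return out

        def reposition(pos, rss_at, step=0.5, probe=0.2, iters=50, seed=0):
            # Stochastic gradient ascent on the RSS field via random-direction probes.
            rng = np.random.default_rng(seed)
            for _ in range(iters):
                d = rng.standard_normal(2)
                d /= np.linalg.norm(d)
                grad = (rss_at(pos + probe * d) - rss_at(pos - probe * d)) / (2 * probe)
                pos = pos + step * grad * d      # move along the probed direction
            return pos

        # Toy field: RSS is balanced midway between server (0,0) and client (10,0).
        rss = lambda p: -np.linalg.norm(p - np.array([5.0, 0.0]))
        print(reposition(np.array([1.0, 1.0]), rss))  # ends up near (5, 0)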

  11. A Multi-Sensor RSS Spatial Sensing-Based Robust Stochastic Optimization Algorithm for Enhanced Wireless Tethering

    PubMed Central

    Parasuraman, Ramviyas; Fabry, Thomas; Molinari, Luca; Kershaw, Keith; Di Castro, Mario; Masi, Alessandro; Ferre, Manuel

    2014-01-01

    The reliability of wireless communication in a network of mobile wireless robot nodes depends on the received radio signal strength (RSS). When the robot nodes are deployed in hostile environments with ionizing radiation (such as in some scientific facilities), there is a possibility that some electronic components may fail randomly (due to radiation effects), which causes problems in wireless connectivity. The objective of this paper is to maximize robot mission capabilities by maximizing the wireless network capacity and to reduce the risk of communication failure. Thus, in this paper, we consider a multi-node wireless tethering structure called the "server-relay-client" framework that uses (multiple) relay nodes in between a server and a client node. We propose a robust stochastic optimization (RSO) algorithm using a multi-sensor-based RSS sampling method at the relay nodes to efficiently improve and balance the RSS between the source and client nodes to improve the network capacity and to provide redundant networking abilities. We use pre-processing techniques, such as exponential moving averaging and spatial averaging filters on the RSS data for smoothing. We apply a receiver spatial diversity concept and employ a position controller on the relay node using a stochastic gradient ascent method for self-positioning the relay node to achieve the RSS balancing task. The effectiveness of the proposed solution is validated by extensive simulations and field experiments in CERN facilities. For the field trials, we used a youBot mobile robot platform as the relay node, and two stand-alone Raspberry Pi computers as the client and server nodes. The algorithm has been proven to be robust to noise in the radio signals and to work effectively even under non-line-of-sight conditions. PMID:25615734

  12. Checkpointing for a hybrid computing node

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cher, Chen-Yong

    2016-03-08

    According to an aspect, a method for checkpointing in a hybrid computing node includes executing a task in a processing accelerator of the hybrid computing node. A checkpoint is created in a local memory of the processing accelerator. The checkpoint includes state data to restart execution of the task in the processing accelerator upon a restart operation. Execution of the task is resumed in the processing accelerator after creating the checkpoint. The state data of the checkpoint are transferred from the processing accelerator to a main processor of the hybrid computing node while the processing accelerator is executing the task.

  13. Multiple Time Series Node Synchronization Utilizing Ambient Reference

    DTIC Science & Technology

    2014-12-31

    Among the severe requirements that Special... processing targeted to performance assessment is the need for fine-scale synchronization among communicating nodes and across multiple domains. The... research community and it is well documented and characterized. The datasets considered from this project (listed below) were used to derive the...

  14. Managing internode data communications for an uninitialized process in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  15. Managing internode data communications for an uninitialized process in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  16. Hyperswitch Network For Hypercube Computer

    NASA Technical Reports Server (NTRS)

    Chow, Edward; Madan, Herbert; Peterson, John

    1989-01-01

    Data-driven dynamic switching enables high-speed data transfer. Proposed hyperswitch network based on mixed static and dynamic topologies. Routing header modified in response to congestion or faults encountered as path established. Static topology meets requirement if nodes have switching elements that perform necessary routing header revisions dynamically. Hypercube topology now being implemented with switching element in each computer node aimed at designing very richly interconnected multicomputer system. Interconnection network connects great number of small computer nodes, using fixed hypercube topology, characterized by point-to-point links between nodes.

  17. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE PAGES

    Yim, Won Cheol; Cushman, John C.

    2017-07-22

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
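
    A condensed sketch of the query-distribution idea (file names, chunk count, and the database name are hypothetical, and this is not the DCBLAST source): the query FASTA is split into chunks and one blastn job is launched per chunk, with local processes standing in for cluster nodes.

        import subprocess
        from pathlib import Path

        def split_fasta(path, n_chunks):
            # Split a FASTA file into n_chunks round-robin lists of records.
            records = Path(path).read_text().split(">")[1:]
            paths = []
            for i in range(n_chunks):
                p = Path(f"chunk_{i}.fasta")
                p.write_text("".join(">" + r for r in records[i::n_chunks]))
                paths.append(p)
            return paths

        jobs = [
            subprocess.Popen(
                ["blastn", "-query", str(p), "-db", "local_nt",   # hypothetical DB
                 "-out", f"{p.stem}.out", "-outfmt", "6"])
            for p in split_fasta("queries.fasta", n_chunks=8)
        ]
        for j in jobs:
            j.wait()   # then concatenate the chunk_*.out files into one report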

  18. Overview of the LINCS architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fletcher, J.G.; Watson, R.W.

    1982-01-13

    Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years into a computer-network-based resource-sharing environment. The increasing use of low-cost, high-performance micro, mini, and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large-scale computer systems on which much of the LLNL scientific computing depends are evolving into multiprocessor systems. It is our belief that the most cost-effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over different nodes of the environment. A node is defined as one or more processors with a local (shared) high-speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogeneous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is: how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate the development of cost-effective, reliable, and human-engineered applications. We believe the answer lies in developing a layered, communication-oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication-oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.

  19. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yim, Won Cheol; Cushman, John C.

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.

  20. Fuzzy Neural Network-Based Interacting Multiple Model for Multi-Node Target Tracking Algorithm

    PubMed Central

    Sun, Baoliang; Jiang, Chunlan; Li, Ming

    2016-01-01

    An interacting multiple model for multi-node target tracking algorithm was proposed based on a fuzzy neural network (FNN) to solve the multi-node target tracking problem of wireless sensor networks (WSNs). The measured error variance was adaptively adjusted during the multiple-model interacting output stage using the difference between the theoretical and estimated values of the measured error covariance matrix. The FNN fusion system was established during multi-node fusion to integrate the target state estimates from different nodes and thereby obtain a network-level target state estimate. The feasibility of the algorithm was verified on a network of nine detection nodes. Experimental results indicated that the proposed algorithm could track a maneuvering target effectively under sensor failure and unknown system measurement errors. The proposed algorithm exhibits great practicality for multi-node target tracking in WSNs. PMID:27809271

  1. Ultrascalable petaflop parallel supercomputer

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing, including a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. A DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  2. Running Batch Jobs on Peregrine | High-Performance Computing | NREL

    Science.gov Websites

    Peregrine has several types of compute nodes; a resource feature can be specified in a job request to select a compatible node type and get the job running. More information about requesting different node types on Peregrine is available. To meet the needs of different types of jobs, nodes on Peregrine are made available through several queues.

  3. Balancing computation and communication power in power constrained clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piga, Leonardo; Paul, Indrani; Huang, Wei

    Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.

  4. Message communications of particular message types between compute nodes using DMA shadow buffers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blocksome, Michael A.; Parker, Jeffrey J.

    Message communications of particular message types between compute nodes using DMA shadow buffers includes: receiving a buffer identifier specifying an application buffer having a message of a particular type for transmission to a target compute node through a network; selecting one of a plurality of shadow buffers for a DMA engine on the compute node for storing the message, each shadow buffer corresponding to a slot of an injection FIFO buffer maintained by the DMA engine; storing the message in the selected shadow buffer; creating a data descriptor for the message stored in the selected shadow buffer; injecting the data descriptor into the slot of the injection FIFO buffer corresponding to the selected shadow buffer; selecting the data descriptor from the injection FIFO buffer; and transmitting the message specified by the selected data descriptor through the data communications network to the target compute node.

  5. Parallel file system with metadata distributed across partitioned key-value store c

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

    2017-09-19

    Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
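    MDHIM itself is not shown in this record; as a rough single-process illustration of the partitioned-data-store idea (all names hypothetical), the sketch below hash-partitions keys across a set of per-partition dictionaries. In the actual system each partition would live on a different compute node, and get/put would travel over the message passing interface rather than touch a local dict:

      class PartitionedKVStore:
          """Toy hash-partitioned key-value store (single process).
          In PLFS/MDHIM each partition would reside on a different compute
          node and get/put would be MPI messages, not local dict calls."""
          def __init__(self, n_partitions):
              self.parts = [dict() for _ in range(n_partitions)]

          def _part(self, key):
              # The hash of the key selects which partition owns it.
              return self.parts[hash(key) % len(self.parts)]

          def put(self, key, value):
              self._part(key)[key] = value

          def get(self, key):
              return self._part(key).get(key)

      # Metadata for one sub-file of a shared file, keyed by (file, sub-file index).
      store = PartitionedKVStore(4)
      store.put(("shared_file", 0), {"offset": 0, "length": 4096})
      print(store.get(("shared_file", 0)))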

  6. Error recovery to enable error-free message transfer between nodes of a computer network

    DOEpatents

    Blumrich, Matthias A.; Coteus, Paul W.; Chen, Dong; Gara, Alan; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd; Steinmacher-Burow, Burkhard; Vranas, Pavlos M.

    2016-01-26

    An error-recovery method to enable error-free message transfer between nodes of a computer network. A first node of the network sends a packet to a second node of the network over a link between the nodes, and the first node keeps a copy of the packet on a sending end of the link until the first node receives acknowledgment from the second node that the packet was received without error. The second node tests the packet to determine if the packet is error free. If the packet is not error free, the second node sets a flag to mark the packet as corrupt. The second node returns acknowledgement to the first node specifying whether the packet was received with or without error. When the packet is received with error, the link is returned to a known state and the packet is sent again to the second node.
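    The mechanism is implemented in link-level hardware; purely as an illustration of the keep-a-copy-until-acknowledged discipline (all names hypothetical), a software analogue looks like this:

      import random
      import zlib

      def send_packet(payload, transmit, max_retries=10):
          """Keep a copy of the packet until a positive acknowledgment arrives."""
          packet = (payload, zlib.crc32(payload))
          for _ in range(max_retries):
              ok = transmit(packet)        # receiver's verdict plays the role of the ACK
              if ok:
                  return True              # now safe to drop the sender-side copy
          raise RuntimeError("link error persists; packet not delivered")

      def flaky_receiver(packet):
          """Receiver side: corrupt ~30% of packets, test the checksum, ACK/NAK."""
          payload, crc = packet
          if random.random() < 0.3:
              payload = payload[:-1] + b"?"      # simulated line noise
          return zlib.crc32(payload) == crc      # flags the packet as corrupt or clean

      print(send_packet(b"hello over a noisy link", flaky_receiver))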

  7. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified, using GPROF, to account for over 97% of the total computational time. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects a cache miss rate of over 90%. With this loop rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or on multiple compute nodes of a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.
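    The HGC5 sources are not part of this record, so the sketch below is only a shape-of-the-code illustration, using the mpi4py bindings and a toy model function (both assumptions, not the paper's code): each MPI rank computes the finite-difference Jacobian columns for its share of the parameters, and the columns are then gathered on all ranks, mirroring the parallelized Jacobian calculation described above.

      # Hypothetical mpi4py sketch of a distributed finite-difference Jacobian.
      # model(), x0 and eps are placeholders for the groundwater forward solve.
      import numpy as np
      from mpi4py import MPI

      def model(x):
          """Stand-in for one forward solve of the groundwater model."""
          return np.array([x[0] ** 2 + x[1], np.sin(x[2]) + x[0] * x[3]])

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      x0 = np.array([1.0, 2.0, 0.5, -1.0])
      f0 = model(x0)
      eps = 1e-6

      # Each rank perturbs its own subset of the parameters (one forward
      # solve per column), which is where the expensive work parallelizes.
      cols = {}
      for j in range(rank, len(x0), size):
          xp = x0.copy()
          xp[j] += eps
          cols[j] = (model(xp) - f0) / eps

      # Gather all columns on every rank and assemble the full Jacobian.
      all_cols = {}
      for d in comm.allgather(cols):
          all_cols.update(d)
      J = np.column_stack([all_cols[j] for j in sorted(all_cols)])
      if rank == 0:
          print(J)

      # run with, e.g.: mpirun -n 4 python jacobian_sketch.py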

  8. Multi-source feature extraction and target recognition in wireless sensor networks based on adaptive distributed wavelet compression algorithms

    NASA Astrophysics Data System (ADS)

    Hortos, William S.

    2008-04-01

    Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. The objective of feature extraction is to maximize the probabilities of correct target classification based on multi-source sensor measurements, while minimizing the resource expenditures at participating nodes. Therefore, the feature-extraction method based on the Haar DWT is presented that employs a maximum-entropy measure to determine significant wavelet coefficients. Features are formed by calculating the energy of coefficients grouped around the competing clusters. A DWT-based feature extraction algorithm used for vehicle classification in WSNs can be enhanced by an added rule for selecting the optimal number of resolution levels to improve the correct classification rate and reduce energy consumption expended in local algorithm computations. Published field trial data for vehicular ground targets, measured with multiple sensor types, are used to evaluate the wavelet-assisted algorithms. Extracted features are used in established target recognition routines, e.g., the Bayesian minimum-error-rate classifier, to compare the effects on the classification performance of the wavelet compression. Simulations of feature sets and recognition routines at different resolution levels in target scenarios indicate the impact on classification rates, while formulas are provided to estimate reduction in resource use due to distributed compression.
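    The distributed, lifting-based transform is beyond a short sketch, but the feature-extraction step described above reduces, on a single node, to computing band energies of Haar DWT coefficients. A minimal NumPy version follows (the signal and level count are illustrative assumptions):

      import numpy as np

      def haar_level(signal):
          """One level of the Haar DWT: averaged (approximation) and
          differenced (detail) coefficients, each half the input length."""
          s = np.asarray(signal, dtype=float)
          even, odd = s[0::2], s[1::2]
          approx = (even + odd) / np.sqrt(2.0)
          detail = (even - odd) / np.sqrt(2.0)
          return approx, detail

      def haar_energy_features(signal, levels=3):
          """Energy of the detail coefficients per level: a compact feature vector."""
          feats = []
          approx = np.asarray(signal, dtype=float)
          for _ in range(levels):
              approx, detail = haar_level(approx)
              feats.append(float(np.sum(detail ** 2)))
          feats.append(float(np.sum(approx ** 2)))  # residual approximation energy
          return feats

      t = np.linspace(0, 1, 256)
      noisy_tone = np.sin(40 * np.pi * t) + 0.1 * np.random.randn(256)
      print(haar_energy_features(noisy_tone))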

  9. Energy-Efficient Implementation of ECDH Key Exchange for Wireless Sensor Networks

    NASA Astrophysics Data System (ADS)

    Lederer, Christian; Mader, Roland; Koschuch, Manuel; Großschädl, Johann; Szekely, Alexander; Tillich, Stefan

    Wireless Sensor Networks (WSNs) are playing a vital role in an ever-growing number of applications ranging from environmental surveillance over medical monitoring to home automation. Since WSNs are often deployed in unattended or even hostile environments, they can be subject to various malicious attacks, including the manipulation and capture of nodes. The establishment of a shared secret key between two or more individual nodes is one of the most important security services needed to guarantee the proper functioning of a sensor network. Despite some recent advances in this field, the efficient implementation of cryptographic key establishment for WSNs remains a challenge due to the resource constraints of small sensor nodes such as the MICAz mote. In this paper we present a lightweight implementation of the elliptic curve Diffie-Hellman (ECDH) key exchange for ZigBee-compliant sensor nodes equipped with an ATmega128 processor running the TinyOS operating system. Our implementation uses a 192-bit prime field specified by the NIST as underlying algebraic structure and requires only 5.20 × 10^6 clock cycles to compute a scalar multiplication if the base point is fixed and known a priori. A scalar multiplication using a random base point takes about 12.33 × 10^6 cycles. Our results show that a full ECDH key exchange between two MICAz motes consumes an energy of 57.33 mJ (including radio communication), which is significantly better than most previously reported ECDH implementations on comparable platforms.
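    The paper's implementation targets NIST P-192 with carefully optimized arithmetic; rather than transcribe curve constants here, the sketch below shows only the core operation being counted, double-and-add scalar multiplication, on a deliberately tiny toy curve (parameters invented for illustration and insecure by construction):

      # Double-and-add scalar multiplication on a toy curve
      # y^2 = x^3 + Ax + B (mod P). NOT NIST P-192; illustration only.
      P, A, B = 97, 2, 3
      G = (3, 6)  # on the curve: 6^2 = 36 and 3^3 + 2*3 + 3 = 36 (mod 97)

      def ec_add(p1, p2):
          """Affine point addition/doubling; None is the point at infinity."""
          if p1 is None: return p2
          if p2 is None: return p1
          (x1, y1), (x2, y2) = p1, p2
          if x1 == x2 and (y1 + y2) % P == 0:
              return None
          if p1 == p2:
              lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
          else:
              lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
          x3 = (lam * lam - x1 - x2) % P
          return (x3, (lam * (x1 - x3) - y1) % P)

      def scalar_mult(k, point):
          """Left-to-right double-and-add: the operation whose cycle count
          the paper reports for the MICAz mote."""
          result = None
          for bit in bin(k)[2:]:
              result = ec_add(result, result)       # double
              if bit == "1":
                  result = ec_add(result, point)    # add
          return result

      # ECDH: each side multiplies the other's public point by its own secret;
      # both arrive at the same shared point.
      a_priv, b_priv = 13, 29
      shared_a = scalar_mult(a_priv, scalar_mult(b_priv, G))
      shared_b = scalar_mult(b_priv, scalar_mult(a_priv, G))
      assert shared_a == shared_b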

  10. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.

    PubMed

    Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik

    2008-03-01

    Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors, from algorithmic optimizations and hand-coded routines, and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems, but it also provides high simulation performance on quite modest numbers of standard cluster nodes.

  11. On Deployment of Multiple Base Stations for Energy-Efficient Communication in Wireless Sensor Networks

    DOE PAGES

    Lin, Yunyue; Wu, Qishi; Cai, Xiaoshan; ...

    2010-01-01

    Data transmission from sensor nodes to a base station or a sink node often incurs significant energy consumption, which critically affects network lifetime. We generalize and solve the problem of deploying multiple base stations to maximize network lifetime in terms of two different metrics under one-hop and multihop communication models. In the one-hop communication model, the sensors far away from base stations always deplete their energy much faster than others. We propose an optimal solution and a heuristic approach based on the minimal enclosing circle algorithm to deploy a base station at the geometric center of each cluster. In the multihop communication model, both the base station location and the data routing mechanism need to be considered in maximizing network lifetime. We propose an iterative algorithm based on rigorous mathematical derivations and use linear programming to compute the optimal routing paths for data transmission. Simulation results demonstrate the strong performance of the proposed deployment algorithms in maximizing network lifetime.

  12. Variant hairy cell leukemia following papillary urothelial neoplasm of bladder.

    PubMed

    Beyan, Cengiz; Kaptan, Kürsat

    2014-03-01

    A 65-year-old man was admitted with multiple lymphadenopathies, weight loss, night sweats and fatigue of 2 months' duration. He had been treated for bladder cancer 2 years earlier. The leukocyte count was 37.9 × 10^9/l. The peripheral blood smear showed 91% lymphocytes. The lymphocytes had large nuclei with prominent nucleoli, a heterogeneous appearance, and large cytoplasm with hairy projections. Flow cytometric immunophenotyping revealed CD20, CD22, CD24, CD45 and HLA-DR positivity. The atypical lymphocytes stained with tartrate-resistant acid phosphatase. Increased metabolic activity was detected in multiple lymph nodes, the bone marrow and an extremely enlarged spleen on positron emission tomography-computed tomography. Excisional biopsy of the left axillary lymph node revealed infiltration with diffuse B-cell leukemia/lymphoma. Immunohistochemistry showed CD20-positive atypical cells with weak expression of CD11c. The patient was diagnosed with variant hairy cell leukemia and cladribine was administered. A probable second primary malignancy should be kept in mind in cases with a defined malignancy in the presence of unusual symptoms.

  13. Improved nine-node shell element MITC9i with reduced distortion sensitivity

    NASA Astrophysics Data System (ADS)

    Wisniewski, K.; Turska, E.

    2017-11-01

    The 9-node quadrilateral shell element MITC9i is developed for the Reissner-Mindlin shell kinematics, the extended potential energy and Green strain. The following features of its formulation ensure improved behavior: 1. The MITC technique is used to avoid locking, and we propose improved transformations for bending and transverse shear strains, which ensure that all patch tests are passed for the regular mesh, i.e. with straight element sides and middle positions of midside nodes and a central node. 2. To reduce shape-distortion effects, the so-called corrected shape functions of Celia and Gray (Int J Numer Meth Eng 20:1447-1459, 1984) are extended to shells and used instead of the standard ones. As a result, all patch tests are additionally passed for shifts of the midside nodes along straight element sides and for arbitrary shifts of the central node. 3. Several extensions of the corrected shape functions are proposed to enable computations of non-flat shells. In particular, a criterion is put forward to determine the shift parameters associated with the central node for non-flat elements. Additionally, a method is presented to construct a parabolic side for a shifted midside node, which improves accuracy for symmetric curved edges. Drilling rotations are included by using the drilling Rotation Constraint equation, in a way consistent with the additive/multiplicative rotation update scheme for large rotations. We show that the corrected shape functions reduce the sensitivity of the solution to the regularization parameter γ of the penalty method for this constraint. The MITC9i shell element is subjected to a range of linear and non-linear tests to show passing of the patch tests, the absence of locking, very good accuracy and insensitivity to node shifts. It compares favorably to several other tested 9-node elements.

  14. Remote direct memory access

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.

    2012-12-11

    Methods, parallel computers, and computer program products are disclosed for remote direct memory access. Embodiments include transmitting, from an origin DMA engine on an origin compute node to a plurality of target DMA engines on target compute nodes, a request to send message, the request to send message specifying data to be transferred from the origin DMA engine to data storage on each target compute node; receiving, by each target DMA engine on each target compute node, the request to send message; preparing, by each target DMA engine, to store data according to the data storage reference and the data length, including assigning a base storage address for the data storage reference; sending, by one or more of the target DMA engines, an acknowledgment message acknowledging that all the target DMA engines are prepared to receive a data transmission from the origin DMA engine; receiving, by the origin DMA engine, the acknowledgement message from the one or more of the target DMA engines; and transferring, by the origin DMA engine, data to data storage on each of the target compute nodes according to the data storage reference using a single direct put operation.

  15. Peregrine System Configuration | High-Performance Computing | NREL

    Science.gov Websites

    Compute nodes and storage are connected by a high-speed InfiniBand network. Compute nodes are diskless; home directories are mounted on all nodes, along with a file system dedicated to shared projects. Nodes provide processors with 64 GB of memory.

  16. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    DOEpatents

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
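    The decomposition is easy to check on a single node with NumPy: a 2-D FFT equals one pass of 1-D FFTs along each dimension in turn, and on the parallel machine the "all-to-all" step sits between the two passes to re-distribute the array so each node owns complete lines of the second dimension.

      import numpy as np

      rng = np.random.default_rng(0)
      a = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

      # Pass 1: 1-D FFTs along the first dimension.
      step1 = np.fft.fft(a, axis=0)

      # On the parallel machine, the "all-to-all" re-distribution happens here,
      # so that each node then holds complete lines of the second dimension.

      # Pass 2: 1-D FFTs along the second dimension.
      step2 = np.fft.fft(step1, axis=1)

      # The two passes together equal the full 2-D FFT.
      assert np.allclose(step2, np.fft.fft2(a))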

  17. Efficient implementation of a multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    DOEpatents

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.

  18. Impact of fault models on probabilistic seismic hazard assessment: the example of the West Corinth rift.

    NASA Astrophysics Data System (ADS)

    Chartier, Thomas; Scotti, Oona; Boiselet, Aurelien; Lyon-Caen, Hélène

    2016-04-01

    Including faults in probabilistic seismic hazard assessment tends to increase the degree of uncertainty in the results due to the intrinsically uncertain nature of fault data. This is especially the case in the low-to-moderate-seismicity regions of Europe, where slow-slipping faults are difficult to characterize. In order to better understand the key parameters that control the uncertainty in fault-related hazard computations, we propose to build an analytic tool that provides a clear link between the different components of the fault-related hazard computations and their impact on the results. This will allow identifying the important parameters that need to be better constrained in order to reduce the resulting uncertainty in hazard, and will also provide a more hazard-oriented strategy for collecting relevant fault parameters in the field. The tool is illustrated through the example of the West Corinth rift fault models. Recent work performed in the gulf has shown the complexity of the normal faulting system that is accommodating the extensional deformation of the rift. A logic-tree approach is proposed to account for this complexity and for the multiplicity of scientifically defensible interpretations. At the nodes of the logic tree, different options that could be considered at each step of the fault-related seismic hazard computation are explored. The first nodes represent the uncertainty in the geometries of the faults and their slip rates, which can derive from different data and methodologies. The subsequent node explores, for a given geometry/slip rate of faults, different earthquake rupture scenarios that may occur in the complex network of faults. The idea is to allow the possibility of several fault segments breaking together in a single rupture scenario. To build these multiple-fault-segment scenarios, two approaches are considered: one based on simple rules (i.e., minimum distance between faults) and a second that relies on physically based simulations. The following nodes represent, for each rupture scenario, different rupture forecast models (i.e., characteristic or Gutenberg-Richter) and, for a given rupture forecast, two probability models commonly used in seismic hazard assessment: Poissonian or time-dependent. The final node represents an exhaustive set of ground motion prediction equations chosen to be compatible with the region. Finally, the expected probability of exceeding a given ground motion level is computed at each site. Results will be discussed for a few specific localities of the West Corinth Gulf.

  19. A New Privacy-Preserving Handover Authentication Scheme for Wireless Networks

    PubMed Central

    Wang, Changji; Yuan, Yuan; Wu, Jiayuan

    2017-01-01

    Handover authentication is a critical issue in wireless networks; it is used to ensure that mobile nodes can roam across multiple access points securely and seamlessly. A variety of handover authentication schemes for wireless networks have been proposed in the literature. Unfortunately, existing handover authentication schemes are vulnerable to a few security attacks, or incur high communication and computation costs. Recently, He et al. proposed a handover authentication scheme, PairHand, and claimed it can resist various attacks, without providing rigorous security proofs. In this paper, we show that PairHand does not meet forward secrecy and strong anonymity. More seriously, it is vulnerable to a key compromise attack, in which an adversary can recover the private key of any mobile node. We then propose a new efficient and provably secure handover authentication scheme for wireless networks based on elliptic curve cryptography. Compared with existing schemes, our proposed scheme resists the key compromise attack and achieves forward secrecy and strong anonymity. Moreover, it is more efficient in terms of computation and communication. PMID:28632171

  20. A New Privacy-Preserving Handover Authentication Scheme for Wireless Networks.

    PubMed

    Wang, Changji; Yuan, Yuan; Wu, Jiayuan

    2017-06-20

    Handover authentication is a critical issue in wireless networks; it is used to ensure that mobile nodes can roam across multiple access points securely and seamlessly. A variety of handover authentication schemes for wireless networks have been proposed in the literature. Unfortunately, existing handover authentication schemes are vulnerable to a few security attacks, or incur high communication and computation costs. Recently, He et al. proposed a handover authentication scheme, PairHand, and claimed it can resist various attacks, without providing rigorous security proofs. In this paper, we show that PairHand does not meet forward secrecy and strong anonymity. More seriously, it is vulnerable to a key compromise attack, in which an adversary can recover the private key of any mobile node. We then propose a new efficient and provably secure handover authentication scheme for wireless networks based on elliptic curve cryptography. Compared with existing schemes, our proposed scheme resists the key compromise attack and achieves forward secrecy and strong anonymity. Moreover, it is more efficient in terms of computation and communication.

  1. Disseminated cat-scratch disease: case report and review of the literature.

    PubMed

    Chang, Chih-Chen; Lee, Chia-Jie; Ou, Liang-Shiou; Wang, Chao-Jan; Huang, Yhu-Chering

    2016-08-01

    Cat scratch disease (CSD) can present as a systemic disease in 5-10% of cases and lead to various disease entities. A previously healthy 16-month-old boy presented with fever for 7 days without other obvious symptoms. An abdominal computed tomography scan demonstrated enlarged right inguinal lymph nodes and multiple small round hypodensities in the spleen. Despite antibiotic treatment for 1 week, the fever persisted and the intrasplenic lesions progressed. Inguinal lymph node biopsy confirmed CSD by immunohistochemistry staining. The diagnosis of CSD was also supported by a history of contact, imaging and serological findings. The patient recovered after treatment with azithromycin for a total of 5 weeks, and on serial follow-up the hepatosplenic micro-abscesses resolved after 4 months.

  2. Disseminated cat-scratch disease: case report and review of the literature.

    PubMed

    Chang, Chih-Chen; Lee, Chia-Jie; Ou, Liang-Shiou; Wang, Chao-Jan; Huang, Yhu-Chering

    2016-01-12

    Cat scratch disease (CSD) can present as a systemic disease in 5-10% of cases and lead to various disease entities. A previously healthy 16-month-old boy presented with fever for 7 days without other obvious symptoms. An abdominal computed tomography scan demonstrated enlarged right inguinal lymph nodes and multiple small round hypodensities in the spleen. Despite antibiotic treatment for 1 week, the fever persisted and the intrasplenic lesions progressed. Inguinal lymph node biopsy confirmed CSD by immunohistochemistry staining. The diagnosis of CSD was also supported by a history of contact, imaging and serological findings. The patient recovered after treatment with azithromycin for a total of 5 weeks, and on serial follow-up the hepatosplenic micro-abscesses resolved after 4 months.

  3. Design of sensor node platform for wireless biomedical sensor networks.

    PubMed

    Xijun, Chen; -H Meng, Max; Hongliang, Ren

    2005-01-01

    The design of a low-cost, miniature, lightweight, ultra-low-power, flexible sensor platform capable of customization and seamless integration into a wireless biomedical sensor network (WBSN) for health monitoring applications presents one of the most challenging tasks. In this paper, we propose a WBSN node platform featuring an ultra-low-power microcontroller, an IEEE 802.15.4-compatible transceiver, and a flexible expansion connector. The proposed solution promises a cost-effective, flexible platform that allows easy customization and energy-efficient computation and communication. The development of a common platform for multiple physical sensors will increase reuse and alleviate the costs of transition to a new generation of sensors. As a case study, we present an implementation of an ECG (electrocardiogram) sensor.

  4. Send-side matching of data communications messages

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-06-17

    Send-side matching of data communications messages in a distributed computing system comprising a plurality of compute nodes, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message.
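    As a loose software analogue of this protocol (threads standing in for source nodes, a lock standing in for the hardware-level mutual exclusion device; all names hypothetical), several candidate senders race for the lock and only the winner delivers its matched message:

      import threading

      recv_buffer = []               # the receiving node's receive buffer
      mutex = threading.Lock()       # stands in for the mutual exclusion device
      winner = None

      def source_node(node_id, message):
          """Each source with a matching send message tries to claim the send."""
          global winner
          if mutex.acquire(blocking=False):    # exactly one source succeeds
              winner = node_id
              recv_buffer.append(message)      # the single matched message is sent
          # losers keep their message pending for a later receive

      threads = [threading.Thread(target=source_node, args=(i, f"msg-from-{i}"))
                 for i in range(4)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      print("receiver got", recv_buffer, "from source", winner)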

  5. Locating hardware faults in a parallel computer

    DOEpatents

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  6. Dynamic resource allocation scheme for distributed heterogeneous computer systems

    NASA Technical Reports Server (NTRS)

    Liu, Howard T. (Inventor); Silvester, John A. (Inventor)

    1991-01-01

    This invention relates to resource allocation in computer systems, and more particularly to a method and associated apparatus for shortening response time and improving efficiency of a heterogeneous distributed networked computer system by reallocating the jobs queued up for busy nodes to idle, or less busy, nodes. In accordance with the algorithm (SIDA for short), load-sharing is initiated by the server device in a manner such that extra overhead is not imposed on the system during heavily loaded conditions. The algorithm employed in the present invention uses a dual-mode, server-initiated approach. Jobs are transferred from heavily burdened nodes (i.e., over a high threshold limit) to lightly burdened nodes at the initiation of the receiving node when: (1) a job finishes at a node which is burdened below a pre-established threshold level, or (2) a node is idle for a period of time as established by a wakeup timer at the node. The invention uses a combination of the local queue length and the local service rate ratio at each node as the workload indicator.

  7. Intranode data communications in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2014-01-07

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  8. Intranode data communications in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

    2013-07-23

    Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

  9. Metabolic PathFinding: inferring relevant pathways in biochemical networks.

    PubMed

    Croes, Didier; Couche, Fabian; Wodak, Shoshana J; van Helden, Jacques

    2005-07-01

    Our knowledge of metabolism can be represented as a network comprising several thousand nodes (compounds and reactions). Several groups have applied graph theory to analyse the topological properties of this network and to infer metabolic pathways by path finding. This is, however, not straightforward: a major problem is the traversal of irrelevant shortcuts through highly connected nodes, which correspond to pool metabolites and co-factors (e.g. H2O, NADP and H+). In this study, we present a web server implementing two simple approaches that circumvent this problem, thereby improving the relevance of the inferred pathways. In the simplest approach, the shortest path is computed while filtering out the selection of highly connected compounds. In the second approach, the shortest path is computed on the weighted metabolic graph, where each compound is assigned a weight equal to its connectivity in the network. This approach significantly increases the accuracy of the inferred pathways, enabling the correct inference of relatively long pathways (e.g. with as many as eight intermediate reactions). Available options include the calculation of the k-shortest paths between two specified seed nodes (either compounds or reactions). Multiple requests can be submitted in a queue. Results are returned by email, in textual as well as graphical formats (available at http://www.scmbb.ulb.ac.be/pathfinding/).
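    The second approach can be reproduced in a few lines: run Dijkstra over the metabolic graph with the cost of entering a compound equal to its degree, so that hub metabolites such as H2O become expensive shortcuts. The toy graph below is illustrative, not taken from the server:

      import heapq

      def weighted_path(graph, start, goal):
          """Dijkstra where entering a node costs its degree (connectivity),
          so highly connected pool metabolites are avoided."""
          dist, prev = {start: 0}, {}
          heap = [(0, start)]
          while heap:
              d, u = heapq.heappop(heap)
              if u == goal:
                  path = [u]
                  while u in prev:
                      u = prev[u]
                      path.append(u)
                  return list(reversed(path))
              if d > dist.get(u, float("inf")):
                  continue
              for v in graph[u]:
                  nd = d + len(graph[v])          # node weight = its degree
                  if nd < dist.get(v, float("inf")):
                      dist[v], prev[v] = nd, u
                      heapq.heappush(heap, (nd, v))
          return None

      # Toy network: "H2O" touches many compounds, so the fewer-hop path
      # through it (A -> H2O -> D) loses to the longer chain A -> B -> C -> D.
      g = {
          "A": ["B", "H2O"],
          "B": ["A", "C"],
          "C": ["B", "D"],
          "D": ["C", "H2O"],
          "H2O": ["A", "D", "E", "F", "G"],
          "E": ["H2O"], "F": ["H2O"], "G": ["H2O"],
      }
      print(weighted_path(g, "A", "D"))   # ['A', 'B', 'C', 'D']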

  10. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.

  11. Performing an allreduce operation using shared memory

    DOEpatents

    Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

    2014-06-10

    Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.

  12. Distributed computation of graphics primitives on a transputer network

    NASA Technical Reports Server (NTRS)

    Ellis, Graham K.

    1988-01-01

    A method is developed for distributing the computation of graphics primitives on a parallel processing network. Off-the-shelf transputer boards are used to perform the graphics transformations and scan-conversion tasks that would normally be assigned to a single transputer based display processor. Each node in the network performs a single graphics primitive computation. Frequently requested tasks can be duplicated on several nodes. The results indicate that the current distribution of commands on the graphics network shows a performance degradation when compared to the graphics display board alone. A change to more computation per node for every communication (perform more complex tasks on each node) may cause the desired increase in throughput.

  13. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamically adjusting local routing strategies

    DOEpatents

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-03-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.

  14. Deterministic delivery of remote entanglement on a quantum network.

    PubMed

    Humphreys, Peter C; Kalb, Norbert; Morits, Jaco P J; Schouten, Raymond N; Vermeulen, Raymond F L; Twitchen, Daniel J; Markham, Matthew; Hanson, Ronald

    2018-06-01

    Large-scale quantum networks promise to enable secure communication, distributed quantum computing, enhanced sensing and fundamental tests of quantum mechanics through the distribution of entanglement across nodes [1-7]. Moving beyond current two-node networks [8-13] requires the rate of entanglement generation between nodes to exceed the decoherence (loss) rate of the entanglement. If this criterion is met, intrinsically probabilistic entangling protocols can be used to provide deterministic remote entanglement at pre-specified times. Here we demonstrate this using diamond spin qubit nodes separated by two metres. We realize a fully heralded single-photon entanglement protocol that achieves entangling rates of up to 39 hertz, three orders of magnitude higher than previously demonstrated two-photon protocols on this platform [14]. At the same time, we suppress the decoherence rate of remote-entangled states to five hertz through dynamical decoupling. By combining these results with efficient charge-state control and mitigation of spectral diffusion, we deterministically deliver a fresh remote state with an average entanglement fidelity of more than 0.5 at every clock cycle of about 100 milliseconds without any pre- or post-selection. These results demonstrate a key building block for extended quantum networks and open the door to entanglement distribution across multiple remote nodes.

  15. Sink-oriented Dynamic Location Service Protocol for Mobile Sinks with an Energy Efficient Grid-Based Approach.

    PubMed

    Jeon, Hyeonjae; Park, Kwangjin; Hwang, Dae-Joon; Choo, Hyunseung

    2009-01-01

    Sensor nodes transmit sensed information to the sink through wireless sensor networks (WSNs). They have limited power, computational capacity and memory. Portable wireless devices are increasing in popularity, so mechanisms that allow information to be obtained efficiently through mobile WSNs are of significant interest. However, a mobile sink introduces many challenges to data dissemination in large WSNs. For example, it is important to efficiently identify the locations of mobile sinks and to disseminate information from multi-source nodes to multiple mobile sinks. In particular, a stationary dissemination path may no longer be effective in mobile sink applications due to sink mobility. In this paper, we propose a Sink-oriented Dynamic Location Service (SDLS) approach to handle sink mobility. In SDLS, we propose an Eight-Direction Anchor (EDA) system that acts as a location service server. EDA prevents intensive energy consumption at the border sensor nodes and thus provides energy balancing to all the sensor nodes. We then propose a Location-based Shortest Relay (LSR) scheme that efficiently forwards (or relays) data from a source node to a sink along a minimal-delay path. Our results demonstrate that SDLS not only provides an efficient and scalable location service, but also reduces the average data communication overhead in scenarios with multiple and moving sinks and sources.

  16. Embedding global barrier and collective in torus network with each node combining input from receivers according to class map for output to senders

    DOEpatents

    Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Heidelberger, Philip; Senger, Robert M; Salapura, Valentina; Steinmacher-Burow, Burkhard; Sugawara, Yutaka; Takken, Todd E

    2013-08-27

    Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.

  17. A Loader for Executing Multi-Binary Applications on the Thinking Machines CM-5: It's Not Just for SPMD Anymore

    NASA Technical Reports Server (NTRS)

    Becker, Jeffrey C.

    1995-01-01

    The Thinking Machines CM-5 platform was designed to run single program, multiple data (SPMD) applications, i.e., to run a single binary across all nodes of a partition, with each node possibly operating on different data. Certain classes of applications, such as multi-disciplinary computational fluid dynamics codes, are facilitated by the ability to have subsets of the partition nodes running different binaries. In order to extend the CM-5 system software to permit such applications, a multi-program loader was developed. This system is based on the dld loader, which was originally developed for workstations. This paper provides a high-level description of dld and describes how it was ported to the CM-5 to provide support for multi-binary applications. Finally, it explains how the loader has been used to implement the CM-5 version of MPIRUN, a portable facility for running multi-disciplinary/multi-zonal MPI (Message-Passing Interface Standard) codes.

  18. Three layers multi-granularity OCDM switching system based on learning-stateful PCE

    NASA Astrophysics Data System (ADS)

    Wang, Yubao; Liu, Yanfei; Sun, Hao

    2017-10-01

    In the existing three-layer multi-granularity OCDM switching system (TLMG-OCDMSS), F-LSPs, L-LSPs and OC-LSPs can be bundled as switching granularities. In a CPU-intensive network, a node must both compute paths and bundle switching granularities, so the load on a single node is heavy. A node can become paralyzed when its traffic load is too great, which seriously degrades the performance of the whole network. The introduction of a stateful PCE (S-PCE) effectively solves these problems. The PCE is composed of two parts, namely the path computation element and its databases (TED and LSPDB), and it returns the result of a path computation to a PCC (path computation client) after the PCC sends it a path computation request. In this way, the pressure of distributed path computation in each node is reduced. In this paper, we propose the concept of a Learning PCE (L-PCE), which uses the existing LSPDB as the data source for the PCE's learning. By this means, we can simplify path computation and reduce network delay, thereby improving the performance of the network.

  19. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying, in dependence upon the call site statistics, a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  20. Experimental and computational analysis of a large protein network that controls fat storage reveals the design principles of a signaling network.

    PubMed

    Al-Anzi, Bader; Arpp, Patrick; Gerges, Sherif; Ormerod, Christopher; Olsman, Noah; Zinn, Kai

    2015-05-01

    An approach combining genetic, proteomic, computational, and physiological analysis was used to define a protein network that regulates fat storage in budding yeast (Saccharomyces cerevisiae). A computational analysis of this network shows that it is not scale-free, and is best approximated by the Watts-Strogatz model, which generates "small-world" networks with high clustering and short path lengths. The network is also modular, containing energy level sensing proteins that connect to four output processes: autophagy, fatty acid synthesis, mRNA processing, and MAP kinase signaling. The importance of each protein to network function is dependent on its Katz centrality score, which is related both to the protein's position within a module and to the module's relationship to the network as a whole. The network is also divisible into subnetworks that span modular boundaries and regulate different aspects of fat metabolism. We used a combination of genetics and pharmacology to simultaneously block output from multiple network nodes. The phenotypic results of this blockage define patterns of communication among distant network nodes, and these patterns are consistent with the Watts-Strogatz model.
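    The yeast network itself is not reproduced here, but the quantities the comparison rests on are straightforward to compute with the NetworkX library (an assumed dependency) on a generated small-world graph:

      import networkx as nx

      # Generate a connected Watts-Strogatz "small-world" graph: n nodes,
      # each joined to k nearest ring neighbors, rewiring probability p.
      G = nx.connected_watts_strogatz_graph(100, 4, 0.1, seed=42)

      # Small-world signature: high clustering, short average path length.
      print("average clustering:", nx.average_clustering(G))
      print("average shortest path:", nx.average_shortest_path_length(G))

      # Katz centrality: the importance score used to rank network proteins.
      katz = nx.katz_centrality(G, alpha=0.05)
      top5 = sorted(katz, key=katz.get, reverse=True)[:5]
      print("most central nodes:", top5)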

  1. (Re)engineering Earth System Models to Expose Greater Concurrency for Ultrascale Computing: Practice, Experience, and Musings

    NASA Astrophysics Data System (ADS)

    Mills, R. T.

    2014-12-01

    As the high performance computing (HPC) community pushes towards the exascale horizon, the importance and prevalence of fine-grained parallelism in new computer architectures is increasing. This is perhaps most apparent in the proliferation of so-called "accelerators" such as the Intel Xeon Phi or NVIDIA GPGPUs, but the trend also holds for CPUs, where serial performance has grown slowly and effective use of hardware threads and vector units are becoming increasingly important to realizing high performance. This has significant implications for weather, climate, and Earth system modeling codes, many of which display impressive scalability across MPI ranks but take relatively little advantage of threading and vector processing. In addition to increasing parallelism, next generation codes will also need to address increasingly deep hierarchies for data movement: NUMA/cache levels, on node vs. off node, local vs. wide neighborhoods on the interconnect, and even in the I/O system. We will discuss some approaches (grounded in experiences with the Intel Xeon Phi architecture) for restructuring Earth science codes to maximize concurrency across multiple levels (vectors, threads, MPI ranks), and also discuss some novel approaches for minimizing expensive data movement/communication.

  2. Deadlock-free class routes for collective communications embedded in a multi-dimensional torus network

    DOEpatents

    Chen, Dong; Eisley, Noel A.; Steinmacher-Burow, Burkhard; Heidelberger, Philip

    2013-01-29

    A computer implemented method and a system for routing data packets in a multi-dimensional computer network. The method comprises routing a data packet among nodes along one dimension towards a root node, each node having input and output communication links, said root node not having any outgoing uplinks, and determining at each node if the data packet has reached a predefined coordinate for the dimension or an edge of the subrectangle for the dimension, and if the data packet has reached the predefined coordinate for the dimension or the edge of the subrectangle for the dimension, determining if the data packet has reached the root node, and if the data packet has not reached the root node, routing the data packet among nodes along another dimension towards the root node.
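
    A minimal sketch of the routing rule as the abstract states it, ignoring the subrectangle-edge condition and the deadlock-avoidance details: advance along one dimension until its root coordinate is reached, then move on to the next dimension. Coordinates and torus sizes are illustrative.

        def next_hop(current, root, sizes):
            """Return the next node coordinate on the way to the root, one
            dimension at a time, taking the shorter way around each torus ring."""
            current = list(current)
            for d, (c, r, n) in enumerate(zip(current, root, sizes)):
                if c == r:
                    continue  # this dimension already at the root coordinate
                step = 1 if (r - c) % n <= (c - r) % n else -1
                current[d] = (c + step) % n
                return tuple(current)
            return tuple(current)  # already at the root node

        node, root, sizes = (3, 1, 0), (0, 0, 0), (4, 4, 2)
        while node != root:
            node = next_hop(node, root, sizes)
            print(node)   # hops: (0, 1, 0) then (0, 0, 0)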

  3. Need for speed: An optimized gridding approach for spatially explicit disease simulations.

    PubMed

    Sellman, Stefan; Tsao, Kimberly; Tildesley, Michael J; Brommesson, Peter; Webb, Colleen T; Wennergren, Uno; Keeling, Matt J; Lindström, Tom

    2018-04-01

    Numerical models for simulating outbreaks of infectious diseases are powerful tools for informing surveillance and control strategy decisions. However, large-scale spatially explicit models can be limited by the amount of computational resources they require, which poses a problem when multiple scenarios need to be explored to provide policy recommendations. We introduce an easily implemented method that can reduce computation time in a standard Susceptible-Exposed-Infectious-Removed (SEIR) model without introducing any further approximations or truncations. It is based on a hierarchical infection process that operates on entire groups of spatially related nodes (cells in a grid) in order to efficiently filter out large volumes of susceptible nodes that would otherwise have required expensive calculations. After the filtering of the cells, only a subset of the nodes that were originally at risk are then evaluated for actual infection. The increase in efficiency is sensitive to the exact configuration of the grid, and we describe a simple method to find an estimate of the optimal configuration of a given landscape as well as a method to partition the landscape into a grid configuration. To investigate its efficiency, we compare the introduced methods to other algorithms and evaluate computation time, focusing on simulated outbreaks of foot-and-mouth disease (FMD) on the farm population of the USA, the UK, and Sweden, as well as on three randomly generated populations with varying degrees of clustering. The introduced method provided up to 500 times faster calculations than pairwise computation, and consistently performed as well as or better than other available methods. This enables large-scale, spatially explicit simulations such as for the entire continental USA without sacrificing realism or predictive power.
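
    The cell-level filtering idea can be sketched compactly. Note that the published method conditions the within-cell draws on the cell-level draw so that no approximation is introduced; the simplified sketch below (illustrative kernel and data layout) omits that correction and keeps only the overestimate-then-reject structure.

        import math, random

        def kernel(d):
            return 0.1 / (1.0 + d * d)   # illustrative distance kernel, P(transmit)

        def spread(source, cells):
            """cells: list of dicts with 'center', 'radius', 'nodes' (coordinates)."""
            infected = []
            for cell in cells:
                d_min = max(0.0, math.dist(source, cell["center"]) - cell["radius"])
                n = len(cell["nodes"])
                # Overestimate: pretend all n nodes sit at the cell's nearest point.
                p_cell = 1.0 - (1.0 - kernel(d_min)) ** n
                if random.random() > p_cell:
                    continue              # the whole cell is rejected in O(1)
                for node in cell["nodes"]:   # expensive pairwise step, now rare
                    if random.random() < kernel(math.dist(source, node)):
                        infected.append(node)
            return infected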

  4. Need for speed: An optimized gridding approach for spatially explicit disease simulations

    PubMed Central

    Tildesley, Michael J.; Brommesson, Peter; Webb, Colleen T.; Wennergren, Uno; Lindström, Tom

    2018-01-01

    Numerical models for simulating outbreaks of infectious diseases are powerful tools for informing surveillance and control strategy decisions. However, large-scale spatially explicit models can be limited by the amount of computational resources they require, which poses a problem when multiple scenarios need to be explored to provide policy recommendations. We introduce an easily implemented method that can reduce computation time in a standard Susceptible-Exposed-Infectious-Removed (SEIR) model without introducing any further approximations or truncations. It is based on a hierarchical infection process that operates on entire groups of spatially related nodes (cells in a grid) in order to efficiently filter out large volumes of susceptible nodes that would otherwise have required expensive calculations. After the filtering of the cells, only a subset of the nodes that were originally at risk are then evaluated for actual infection. The increase in efficiency is sensitive to the exact configuration of the grid, and we describe a simple method to find an estimate of the optimal configuration of a given landscape as well as a method to partition the landscape into a grid configuration. To investigate its efficiency, we compare the introduced methods to other algorithms and evaluate computation time, focusing on simulated outbreaks of foot-and-mouth disease (FMD) on the farm population of the USA, the UK, and Sweden, as well as on three randomly generated populations with varying degrees of clustering. The introduced method provided up to 500 times faster calculations than pairwise computation, and consistently performed as well as or better than other available methods. This enables large-scale, spatially explicit simulations such as for the entire continental USA without sacrificing realism or predictive power. PMID:29624574

  5. Network Coding for Function Computation

    ERIC Educational Resources Information Center

    Appuswamy, Rathinakumar

    2011-01-01

    In this dissertation, the following "network computing problem" is considered. Source nodes in a directed acyclic network generate independent messages and a single receiver node computes a target function f of the messages. The objective is to maximize the average number of times f can be computed per network usage, i.e., the "computing…

  6. Computed tomographic atlas for the new international lymph node map for lung cancer: A radiation oncologist perspective.

    PubMed

    Lynch, Rod; Pitson, Graham; Ball, David; Claude, Line; Sarrut, David

    2013-01-01

    To develop a reproducible definition for each mediastinal lymph node station based on the new TNM classification for lung cancer. This paper proposes an atlas based on the new international lymph node map used in the seventh edition of the TNM classification for lung cancer. Four radiation oncologists and one diagnostic radiologist were involved in the project to put forward a reproducible radiologic description for the lung lymph node stations. The International Association for the Study of Lung Cancer lymph node definitions for stations 1 to 11 have been described and illustrated on axial computed tomographic scan images using a certified radiotherapy planning system. This atlas will assist both diagnostic radiologists and radiation oncologists in accurately defining the lymph node stations on computed tomographic scans in patients diagnosed with lung cancer. Copyright © 2013 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.

  7. Parallel checksumming of data chunks of a shared data object using a log-structured file system

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-09-06

    Checksum values are generated and used to verify data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk, and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
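
    A minimal sketch of the write/read checksum protocol described above, with zlib.crc32 standing in for whatever checksum the real system uses and a dict standing in for the shared object on the storage node.

        import zlib

        store = {}   # stand-in for the shared data object on the storage node

        def write_chunk(offset, data: bytes):
            # Client side: compute the checksum and ship it with the chunk.
            store[offset] = (data, zlib.crc32(data))

        def read_chunk(offset) -> bytes:
            # Read side: recompute and compare before handing data back.
            data, stored_crc = store[offset]
            if zlib.crc32(data) != stored_crc:
                raise IOError(f"checksum mismatch for chunk at offset {offset}")
            return data

        write_chunk(0, b"some simulation output")
        assert read_chunk(0) == b"some simulation output"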

  8. Folding Proteins at 500 ns/hour with Work Queue.

    PubMed

    Abdul-Wahid, Badi'; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A

    2012-10-01

    Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants, called Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour.
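
    The master/worker pattern AWE depends on is easy to sketch. Here concurrent.futures stands in for the Work Queue framework and the "simulation" is a placeholder; only the shape of the computation, many short independent tasks per iteration, matches the abstract.

        from concurrent.futures import ProcessPoolExecutor
        import random

        def simulate_walker(walker_id):
            """One short MD segment; returns the walker's new (fake) state."""
            random.seed(walker_id)
            return walker_id, random.random()

        if __name__ == "__main__":
            with ProcessPoolExecutor() as pool:
                results = list(pool.map(simulate_walker, range(100)))
            # An AWE-style resampling/reweighting step would run here
            # before dispatching the next round of short tasks.
            print(len(results), "walkers completed this iteration")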

  9. Folding Proteins at 500 ns/hour with Work Queue

    PubMed Central

    Abdul-Wahid, Badi’; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A.

    2014-01-01

    Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants, called Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour. PMID:25540799

  10. Scaleable wireless web-enabled sensor networks

    NASA Astrophysics Data System (ADS)

    Townsend, Christopher P.; Hamel, Michael J.; Sonntag, Peter A.; Trutor, B.; Arms, Steven W.

    2002-06-01

    Our goal was to develop a long-life, low-cost, scalable wireless sensing network that collects and distributes data from a wide variety of sensors over the internet. Time division multiple access was employed with RF transmitter nodes (each with a unique 16-bit address) to communicate digital data to a single receiver (range 1/3 mile). One thousand five-channel nodes can communicate with one receiver (30-minute update). Current draw in sleep mode is 20 microamps, allowing a 5-year battery life with one 3.6-volt Li-Ion AA-size battery. The network nodes include sensor excitation (AC or DC), a multiplexer, an instrumentation amplifier, a 16-bit A/D converter, a microprocessor, and an RF link. They are compatible with thermocouples, strain gauges, load/torque transducers, and inductive/capacitive sensors. The receiver (418 MHz) includes a single-board computer (SBC) with Ethernet capability, internet file transfer protocols (XML/HTML), and data storage. The receiver detects data from specific nodes, performs error checking, and records the data. The SBC's web server can be interrogated (from Microsoft's Internet Explorer or Netscape's Navigator) to distribute data. This system can collect data from thousands of remote sensors on a smart structure and be shared by an unlimited number of users.
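
    A back-of-envelope check of the quoted battery life, assuming (hypothetically) a roughly 2.4 Ah lithium AA cell: the 20-microamp sleep current alone would allow about 13 years, so the 5-year figure leaves a plausible budget for the periodic transmissions.

        capacity_ah = 2.4          # assumed AA lithium cell capacity (hypothetical)
        sleep_a = 20e-6            # 20 microamp sleep current (from the abstract)
        hours = capacity_ah / sleep_a
        print(f"sleep-only lifetime: {hours / 8760:.1f} years")   # ~13.7 years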

  11. Quadratures with multiple nodes, power orthogonality, and moment-preserving spline approximation

    NASA Astrophysics Data System (ADS)

    Milovanovic, Gradimir V.

    2001-01-01

    Quadrature formulas with multiple nodes, power orthogonality, and some applications of such quadratures to moment-preserving approximation by defective splines are considered. An account of power orthogonality (s- and σ-orthogonal polynomials) and generalized Gaussian quadratures with multiple nodes, including stable algorithms for numerical construction of the corresponding polynomials and Cotes numbers, is given. In particular, the important case of the Chebyshev weight is analyzed. Finally, some applications in moment-preserving approximation of functions by defective splines are discussed.
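
    In the notation standard for this literature, a quadrature with multiple nodes τ_1, ..., τ_n of odd multiplicities 2s_k + 1 has the form below; choosing the nodes so that the rule is exact for all polynomials of degree up to 2(s_1 + ... + s_n) + 2n - 1 gives the generalized Gaussian (Chakalov-Popoviciu) case the abstract refers to, with ordinary Gauss quadrature recovered when every s_k = 0.

        \int_{\mathbb{R}} f(t)\, w(t)\, dt \;\approx\; \sum_{k=1}^{n} \sum_{i=0}^{2s_k} A_{k,i}\, f^{(i)}(\tau_k)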

  12. Bad data packet capture device

    DOEpatents

    Chen, Dong; Gara, Alan; Heidelberger, Philip; Vranas, Pavlos

    2010-04-20

    An apparatus and method for capturing data packets for analysis on a network computing system includes a sending node and a receiving node connected by a bi-directional communication link. The sending node sends a data transmission to the receiving node on the bi-directional communication link, and the receiving node receives the data transmission and verifies the data transmission to determine valid data and invalid data and verify retransmissions of invalid data as corresponding valid data. A memory device communicates with the receiving node for storing the invalid data and the corresponding valid data. A computing node communicates with the memory device and receives and performs an analysis of the invalid data and the corresponding valid data received from the memory device.

  13. Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

    NASA Astrophysics Data System (ADS)

    Hadade, Ioan; di Mare, Luca

    2016-08-01

    Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel Sandy Bridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices, including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assess their efficiency for each distinct architecture. We report significant speedups for single-thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
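
    For reference, the Roofline model used above bounds attainable performance by the lower of the machine's peak compute rate and the product of a kernel's arithmetic intensity I (flops per byte moved) with the memory bandwidth B:

        P_{\text{attainable}} = \min\!\left( P_{\text{peak}},\; I \times B \right), \qquad I = \frac{\text{flops}}{\text{bytes moved}}

    A kernel sitting well below this bound signals untapped SIMD or threading headroom, which is the kind of diagnosis the correlation with the Roofline model supports.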

  14. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI; embodiments include receiving in an origin endpoint a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint, and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  15. Surviving sepsis--a 3D integrative educational simulator.

    PubMed

    Ježek, Filip; Tribula, Martin; Kulhánek, Tomáš; Mateják, Marek; Privitzer, Pavol; Šilar, Jan; Kofránek, Jiří; Lhotská, Lenka

    2015-08-01

    Computer technology offers greater educational possibilities, notably simulation and virtual reality. This paper presents a technology which serves to integrate multiple modalities, namely 3D virtual reality, a node-based simulator, the Physiomodel explorer, and explanatory physiological simulators employing the Modelica language and the Unity3D platform. This emerging tool chain should allow the authors to concentrate more on educational content instead of application development. The technology is demonstrated through a Surviving Sepsis educational scenario, targeting the Microsoft Windows Store platform.

  16. Matching pursuit parallel decomposition of seismic data

    NASA Astrophysics Data System (ADS)

    Li, Chuanhui; Zhang, Fanchang

    2017-07-01

    In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. In every iteration we pick a fixed number of envelope peaks from the current signal, according to the number of compute nodes, and distribute them evenly across the compute nodes to search for the optimal Morlet wavelets in parallel. With the help of parallel computer systems and the Message Passing Interface, the parallel algorithm fully exploits the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition, and it also scales well. Moreover, having each compute node search for only one optimal Morlet wavelet in every iteration is the most efficient implementation.
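
    The work-splitting step can be sketched as below: in each iteration a fixed number of envelope peaks is taken from the current residual and divided among "nodes", each of which would search for its best-matching Morlet wavelet around its peak. The list split mimics an MPI scatter, and the signal and crude peak picking are illustrative only.

        import numpy as np

        def pick_envelope_peaks(signal, n_peaks):
            env = np.abs(signal)               # crude stand-in for a true envelope
            return np.argsort(env)[-n_peaks:]  # indices of the largest peaks

        n_nodes = 4
        signal = np.random.randn(1024)
        peaks = pick_envelope_peaks(signal, n_peaks=n_nodes)   # one peak per node

        # Each "node" gets one peak around which to search for the optimal wavelet.
        for rank, chunk in enumerate(np.array_split(peaks, n_nodes)):
            print(f"node {rank} searches around sample(s) {chunk.tolist()}")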

  17. Multiplex PageRank.

    PubMed

    Halu, Arda; Mondragón, Raúl J; Panzarasa, Pietro; Bianconi, Ginestra

    2013-01-01

    Many complex systems can be described as multiplex networks in which the same nodes can interact with one another in different layers, thus forming a set of interacting and co-evolving networks. Examples of such multiplex systems are social networks where people are involved in different types of relationships and interact through various forms of communication media. The ranking of nodes in multiplex networks is one of the most pressing and challenging tasks that research on complex networks is currently facing. When pairs of nodes can be connected through multiple links and in multiple layers, the ranking of nodes should necessarily reflect the importance of nodes in one layer as well as their importance in other interdependent layers. In this paper, we draw on the idea of biased random walks to define the Multiplex PageRank centrality measure in which the effects of the interplay between networks on the centrality of nodes are directly taken into account. In particular, depending on the intensity of the interaction between layers, we define the Additive, Multiplicative, Combined, and Neutral versions of Multiplex PageRank, and show how each version reflects the extent to which the importance of a node in one layer affects the importance the node can gain in another layer. We discuss these measures and apply them to an online multiplex social network. Findings indicate that taking the multiplex nature of the network into account helps uncover the emergence of rankings of nodes that differ from the rankings obtained from one single layer. Results provide support in favor of the salience of multiplex centrality measures, like Multiplex PageRank, for assessing the prominence of nodes embedded in multiple interacting networks, and for shedding new light on structural properties that would otherwise remain undetected if each of the interacting networks were analyzed in isolation.
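
    One simplified way to see the idea: importance earned in one layer biases the walk in another. The sketch below approximates this with networkx's personalized PageRank; it is not the paper's exact Additive/Multiplicative formulation, and the two random graphs are illustrative.

        import networkx as nx

        layer1 = nx.erdos_renyi_graph(50, 0.08, seed=1)   # one interaction medium
        layer2 = nx.erdos_renyi_graph(50, 0.08, seed=2)   # a second medium, same nodes

        x = nx.pagerank(layer1)                 # importance earned in layer 1
        # Bias the layer-2 walk: teleport preferentially to nodes central in layer 1.
        multiplex_rank = nx.pagerank(layer2, personalization=x)

        top = sorted(multiplex_rank, key=multiplex_rank.get, reverse=True)[:5]
        print("top multiplex-ranked nodes:", top)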

  18. Decentralized operating procedures for orchestrating data and behavior across distributed military systems and assets

    NASA Astrophysics Data System (ADS)

    Peach, Nicholas

    2011-06-01

    In this paper, we present a method for a highly decentralized yet structured and flexible approach to achieve systems interoperability by orchestrating data and behavior across distributed military systems and assets with security considerations addressed from the beginning. We describe an architecture of a tool-based design of business processes called Decentralized Operating Procedures (DOP) and the deployment of DOPs onto run time nodes, supporting the parallel execution of each DOP at multiple implementation nodes (fixed locations, vehicles, sensors and soldiers) throughout a battlefield to achieve flexible and reliable interoperability. The described method allows the architecture to: a) provide fine grain control of the collection and delivery of data between systems; b) allow the definition of a DOP at a strategic (or doctrine) level by defining required system behavior through process syntax at an abstract level, agnostic of implementation details; c) deploy a DOP into heterogeneous environments by the nomination of actual system interfaces and roles at a tactical level; d) rapidly deploy new DOPs in support of new tactics and systems; e) support multiple instances of a DOP in support of multiple missions; f) dynamically add or remove run-time nodes from a specific DOP instance as missions requirements change; g) model the passage of, and business reasons for, the transmission of each data message to a specific DOP instance to support accreditation; h) run on low-powered computers with lightweight tactical messaging. This approach is designed to extend the capabilities of existing standards, such as the Generic Vehicle Architecture (GVA).

  19. Understanding the implementation of evidence-based care: a structural network approach.

    PubMed

    Parchman, Michael L; Scoglio, Caterina M; Schumm, Phillip

    2011-02-24

    Recent study of complex networks has yielded many new insights into phenomena such as social networks, the internet, and sexually transmitted infections. The purpose of this analysis is to examine the properties of a network created by the 'co-care' of patients within one region of the Veterans Health Affairs. Data were obtained for all outpatient visits from 1 October 2006 to 30 September 2008 within one large Veterans Integrated Service Network. Types of physician within each clinic were nodes connected by shared patients, with a weighted link representing the number of shared patients between each connected pair. Network metrics calculated included edge weights, node degree, node strength, node coreness, and node betweenness. Log-log plots were used to examine the distribution of these metrics. Sizes of k-core networks were also computed under multiple conditions of node removal. There were 4,310,465 encounters by 266,710 shared patients between 722 provider types (nodes) across 41 stations or clinics, resulting in 34,390 edges. The number of other nodes to which primary care provider nodes have a connection (172.7) is 42% greater than that of general surgeons and two and one-half times as high as cardiology. The log-log plot of the edge weight distribution appears to be linear in nature, revealing a 'scale-free' characteristic of the network, while the distributions of node degree and node strength are less so. The analysis of the k-core network sizes under increasing removal of primary care nodes shows that the roughly 10 most connected primary care nodes play a critical role in keeping the k-core networks connected, because their removal disintegrates the highest k-core network. Delivery of healthcare in a large healthcare system such as that of the US Department of Veterans Affairs (VA) can be represented as a complex network. This network consists of highly connected provider nodes that serve as 'hubs' within the network, and demonstrates some 'scale-free' properties. By using currently available tools to explore its topology, we can explore how the underlying connectivity of such a system affects the behavior of providers, and perhaps leverage that understanding to improve quality and outcomes of care.
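
    The k-core robustness experiment described above is easy to reproduce in miniature with networkx; the synthetic hub-dominated graph below merely stands in for the provider network.

        import networkx as nx

        G = nx.barabasi_albert_graph(200, 3, seed=7)         # hub-dominated toy graph
        G.remove_edges_from(list(nx.selfloop_edges(G)))      # core_number forbids loops

        print("max coreness:", max(nx.core_number(G).values()))

        # Remove the ten highest-degree nodes (the 'hubs') and recompute.
        hubs = sorted(G.degree, key=lambda nd: nd[1], reverse=True)[:10]
        G.remove_nodes_from([n for n, _ in hubs])
        print("max coreness after hub removal:", max(nx.core_number(G).values()))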

  20. Fast Inbound Top-K Query for Random Walk with Restart.

    PubMed

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top-k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q; the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top-k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than the state-of-the-art method.
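
    For reference, plain RWR (which Squeeze and Ripple accelerate but do not change) can be computed by power iteration on r = cWr + (1 - c)e_q, where W is the column-normalized adjacency matrix and e_q the indicator vector of the query node. A minimal NumPy sketch with an illustrative graph:

        import numpy as np

        def rwr(A, q, c=0.85, tol=1e-10):
            """A: adjacency matrix; q: restart node index; returns RWR scores."""
            W = A / A.sum(axis=0, keepdims=True)    # column-stochastic transition
            r = np.full(A.shape[0], 1.0 / A.shape[0])
            e = np.zeros(A.shape[0]); e[q] = 1.0
            while True:
                r_next = c * W @ r + (1 - c) * e    # one power-iteration step
                if np.abs(r_next - r).sum() < tol:
                    return r_next
                r = r_next

        A = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        print(rwr(A, q=0))   # proximity of every node to the query node 0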

  1. Generic algorithms for high performance scalable geocomputing

    NASA Astrophysics Data System (ADS)

    de Jong, Kor; Schmitz, Oliver; Karssenberg, Derek

    2016-04-01

    During the last decade, the characteristics of computing hardware have changed a lot. For example, instead of a single general purpose CPU core, personal computers nowadays contain multiple cores per CPU and often general purpose accelerators, like GPUs. Additionally, compute nodes are often grouped together to form clusters or a supercomputer, providing enormous amounts of compute power. For existing earth simulation models to be able to use modern hardware platforms, their compute intensive parts must be rewritten. This can be a major undertaking and may involve many technical challenges. Compute tasks must be distributed over CPU cores, offloaded to hardware accelerators, or distributed to different compute nodes. And ideally, all of this should be done in such a way that the compute task scales well with the hardware resources. This presents two challenges: 1) how to make good use of all the compute resources and 2) how to make these compute resources available for developers of simulation models, who may not (want to) have the required technical background for distributing compute tasks. The first challenge requires the use of specialized technology (e.g. threads, OpenMP, MPI, OpenCL, CUDA). The second challenge requires the abstraction of the logic handling the distribution of compute tasks from the model-specific logic, hiding the technical details from the model developer. To assist the model developer, we are developing a C++ software library (called Fern) containing algorithms that can use all CPU cores available in a single compute node (distributing tasks over multiple compute nodes will be done at a later stage). The algorithms are grid-based (finite difference) and include local and spatial operations such as convolution filters. The algorithms handle distribution of the compute tasks to CPU cores internally. In the resulting model, the low-level details of how this is done are separated from the model-specific logic representing the modeled system. This contrasts with practices in which code for distributing compute tasks is mixed with model-specific code, and it results in a more maintainable model. For flexibility and efficiency, the algorithms are configurable at compile time with respect to the following aspects: data type, value type, no-data handling, input value domain handling, and output value range handling. This makes the algorithms usable in very different contexts, without the need for making intrusive changes to existing models when using them. Applications that benefit from using the Fern library include the construction of forward simulation models in (global) hydrology (e.g. PCR-GLOBWB (Van Beek et al. 2011)), ecology, geomorphology, or land use change (e.g. PLUC (Verstegen et al. 2014)), and manipulation of hyper-resolution land surface data such as digital elevation models and remote sensing data. Using the Fern library, we have also created an add-on to the PCRaster Python Framework (Karssenberg et al. 2010) allowing its users to speed up their spatio-temporal models, sometimes by changing just a single line of Python code in their model. In our presentation we will give an overview of the design of the algorithms, providing examples of different contexts where they can be used to replace existing sequential algorithms, including the PCRaster environmental modeling software (www.pcraster.eu). We will show how the algorithms can be configured to behave differently when necessary. References: Karssenberg, D., Schmitz, O., Salamon, P., De Jong, K. and Bierkens, M.F.P., 2010. A software framework for construction of process-based stochastic spatio-temporal models and data assimilation. Environmental Modelling & Software, 25, pp. 489-502. Van Beek, L. P. H., Y. Wada, and M. F. P. Bierkens. 2011. Global monthly water stress: 1. Water balance and water availability. Water Resources Research, 47. Verstegen, J. A., D. Karssenberg, F. van der Hilst, and A. P. C. Faaij. 2014. Identifying a land use change cellular automaton by Bayesian data assimilation. Environmental Modelling & Software, 53:121-136.

  2. Lymphatic mapping with fluorescence navigation using indocyanine green and axillary surgery in patients with primary breast cancer.

    PubMed

    Takeuchi, Megumi; Sugie, Tomoharu; Abdelazeem, Kassim; Kato, Hironori; Shinkura, Nobuhiko; Takada, Masahiro; Yamashiro, Hiroyasu; Ueno, Takayuki; Toi, Masakazu

    2012-01-01

    The indocyanine green fluorescence (ICGf) navigation method provides real-time lymphatic mapping and sentinel lymph node (SLN) visualization, which enables the removal of SLNs and their associated lymphatic networks. In this study, we investigated the features of the drainage pathways detected with the ICGf navigation system and the order of metastasis in axillary nodes. From April 2008 to February 2010, 145 patients with clinically node-negative breast cancer underwent SLN surgery with ICGf navigation. The video-recorded data from 79 patients were used for lymphatic mapping analysis. Fluorescence-positive SLNs were identified in 144 (99%) of 145 patients. Both single and multiple routes to the axilla were identified in 47% of cases using video-recorded lymphatic mapping data. An internal mammary route was detected in 6% of the cases. Skip metastasis to the second or third SLNs was observed in 6 of the 28 node-positive patients. We also examined the strategy of axillary surgery using the ICGf navigation system. We found that, based on the features of nodal involvement, 4-node resection could provide precise information on the nodal status. The ICGf navigation system may provide a different lymphatic mapping result than computed tomography lymphography in clinically node-negative breast cancer patients. Furthermore, it enables the identification of lymph nodes that do not accumulate indocyanine green or dye adjacent to the SLNs in the sequence of drainage. Knowledge of the order of nodal metastasis as revealed by the ICGf system may help to personalize the surgical treatment of the axilla in SLN-positive cases, although additional studies are required. © 2012 Wiley Periodicals, Inc.

  3. Direct negative chronotropic action of desflurane on sinoatrial node pacemaker activity in the guinea pig heart.

    PubMed

    Kojima, Akiko; Ito, Yuki; Kitagawa, Hirotoshi; Matsuura, Hiroshi; Nosaka, Shuichi

    2014-06-01

    Desflurane inhalation is associated with sympathetic activation and a concomitant increase in heart rate in humans and experimental animals. There is, however, little information concerning the direct effects of desflurane on the electrical activity of sinoatrial node pacemaker cells, which determines the intrinsic heart rate. Whole-cell patch-clamp experiments were conducted on guinea pig sinoatrial node pacemaker cells to record spontaneous action potentials and ionic currents contributing to sinoatrial node automaticity, namely, hyperpolarization-activated cation current (If), T-type and L-type Ca currents (ICa,T and ICa,L, respectively), Na/Ca exchange current (INCX), and rapidly and slowly activating delayed rectifier K currents (IKr and IKs, respectively). Electrocardiograms were recorded from ex vivo Langendorff-perfused hearts and in vivo hearts. Desflurane at 6 and 12% decreased the spontaneous firing rate of sinoatrial node action potentials by 15.9% (n = 11) and 27.6% (n = 10), respectively, which was associated with 20.4% and 42.5% reductions in the diastolic depolarization rate, respectively. Desflurane inhibited If, ICa,T, ICa,L, INCX, and IKs but had little effect on IKr. The negative chronotropic action of desflurane was reasonably well reproduced in a sinoatrial node computer model. Desflurane reduced the heart rate in Langendorff-perfused hearts. Inhalation of a high concentration (12%) of desflurane was associated with transient tachycardia, which was totally abolished by pretreatment with the β-adrenergic blocker propranolol. Desflurane has a direct negative chronotropic action on sinoatrial node pacemaking activity, which is mediated by its inhibitory action on multiple ionic currents. This direct inhibitory action of desflurane on sinoatrial node automaticity seems to be counteracted by sympathetic activation associated with desflurane inhalation in vivo.

  4. Automatic detection of pelvic lymph nodes using multiple MR sequences

    NASA Astrophysics Data System (ADS)

    Yan, Michelle; Lu, Yue; Lu, Renzhi; Requardt, Martin; Moeller, Thomas; Takahashi, Satoru; Barentsz, Jelle

    2007-03-01

    A system for automatic detection of pelvic lymph nodes is developed by incorporating complementary information extracted from multiple MR sequences. A single MR sequence lacks sufficient diagnostic information for lymph node localization and staging. Correct diagnosis often requires input from multiple complementary sequences, which makes manual detection of lymph nodes very labor intensive. Small lymph nodes are often missed even by highly trained radiologists. The proposed system is aimed at assisting radiologists in finding lymph nodes faster and more accurately. To the best of our knowledge, this is the first such system reported in the literature. A 3-dimensional (3D) MR angiography (MRA) image is employed for extracting blood vessels that serve as a guide in searching for pelvic lymph nodes. Segmentation, shape and location analysis of potential lymph nodes are then performed using a high resolution 3D T1-weighted VIBE (T1-vibe) MR sequence acquired by a Siemens 3T scanner. An optional contrast-agent enhanced MR image, such as a post ferumoxtran-10 T2*-weighted MEDIC sequence, can also be incorporated to further improve detection accuracy of malignant nodes. The system outputs a list of potential lymph node locations that are overlaid onto the corresponding MR sequences and presents them to users with associated confidence levels as well as their sizes and lengths in each axis. Preliminary studies demonstrate the feasibility of automatic lymph node detection and the scenarios in which this system may be used to assist radiologists in diagnosis and reporting.

  5. Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

    DOEpatents

    Gooding, Thomas Michael [Rochester, MN]

    2011-04-19

    An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.

  6. The efficacy of preoperative positron emission tomography-computed tomography (PET-CT) for detection of lymph node metastasis in cervical and endometrial cancer: clinical and pathological factors influencing it.

    PubMed

    Nogami, Yuya; Banno, Kouji; Irie, Haruko; Iida, Miho; Kisu, Iori; Masugi, Yohei; Tanaka, Kyoko; Tominaga, Eiichiro; Okuda, Shigeo; Murakami, Koji; Aoki, Daisuke

    2015-01-01

    We studied the diagnostic performance of (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography in cervical and endometrial cancers, with particular focus on lymph node metastases. Seventy patients with cervical cancer and 53 with endometrial cancer were imaged with (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography before lymphadenectomy. We evaluated the diagnostic performance of (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography using the final pathological diagnoses as the gold standard. We calculated the sensitivity, specificity, positive predictive value and negative predictive value of (18)F-fluoro-2-deoxy-D-glucose-positron emission tomography/computed tomography. In cervical cancer, the results evaluated by cases were 33.3, 92.7, 55.6 and 83.6%, respectively. When evaluated by the area of lymph nodes, the results were 30.6, 98.9, 55.0 and 97.0%, respectively. As for endometrial cancer, the results evaluated by cases were 50.0, 93.9, 40.0 and 95.8%, and by area of lymph nodes, 45.0, 99.4, 64.3 and 98.5%, respectively. Limitations in efficacy emerged when the results were analyzed by lymph node region, size of the metastatic node, histological type of the tumor in cervical cancer, and prevalence of lymph node metastasis. The efficacy of positron emission tomography/computed tomography for the detection of lymph node metastasis in cervical and endometrial cancer is not established and has limitations associated with the region of the lymph node, the size of the metastatic lesion in the lymph node, and the pathological type of the primary tumor. The indication for imaging and the interpretation of the results require case-by-case consideration of the pretest probability based on the information obtained preoperatively. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
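
    For orientation, the four reported metrics come from a 2x2 confusion matrix. The sketch below uses one set of counts consistent with the case-level cervical figures quoted above (5 true positives, 4 false positives, 10 false negatives, 51 true negatives out of 70 patients); the counts are back-calculated for illustration, not taken from the paper.

        def diagnostics(tp, fp, fn, tn):
            return {
                "sensitivity": tp / (tp + fn),   # metastases correctly detected
                "specificity": tn / (tn + fp),   # negatives correctly called
                "ppv": tp / (tp + fp),           # positive predictive value
                "npv": tn / (tn + fn),           # negative predictive value
            }

        # Reproduces approximately 33.3 / 92.7 / 55.6 / 83.6 percent.
        print(diagnostics(tp=5, fp=4, fn=10, tn=51))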

  7. BridgeRank: A novel fast centrality measure based on local structure of the network

    NASA Astrophysics Data System (ADS)

    Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh

    2018-04-01

    Ranking nodes in complex networks has become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in different applications such as viral marketing. One of the most important centrality measures for ranking nodes is closeness centrality, which is effective but suffers from high computational complexity, O(n³). This paper tries to improve on closeness centrality by utilizing the local structure of nodes, and presents a new ranking algorithm called BridgeRank centrality. The proposed method computes a local centrality value for each node. For this purpose, communities are first detected, and the relationships between communities are completely ignored. Then, by applying a centrality measure within each community, only one best critical node is extracted from each community. Finally, the nodes are ranked by computing the sum of the shortest path lengths from each node to the obtained critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method finds the best nodes, with high spreading ability and low time complexity, which makes it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Finally, experiments on real and artificial networks show that our method identifies influential nodes efficiently and achieves better performance than other recent methods.
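
    The recipe lends itself to a short sketch with networkx. The community detection and within-community centrality below (Louvain, degree centrality) are stand-ins for whatever the paper actually uses; only the structure, one critical node per community and then summed shortest-path distances, follows the abstract.

        import networkx as nx
        from networkx.algorithms import community

        def bridgerank(G):
            communities = community.louvain_communities(G, seed=0)
            critical = []
            for com in communities:
                sub = G.subgraph(com)
                deg = nx.degree_centrality(sub)          # cheap local centrality
                critical.append(max(deg, key=deg.get))   # one critical node per community
            # Lower distance sum to the critical nodes = better-ranked spreader.
            return {v: sum(nx.shortest_path_length(G, v, c) for c in critical)
                    for v in G}

        G = nx.karate_club_graph()
        ranks = bridgerank(G)
        print(sorted(ranks, key=ranks.get)[:5])   # five best-ranked spreaders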

  8. Global tree network for computing structures enabling global processing operations

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

    2010-01-19

    A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.

  9. ERDC MSRC Resource. High Performance Computing for the Warfighter. Spring 2006

    DTIC Science & Technology

    2006-01-01

    named Ruby, and the HP/Compaq SC45, named Emerald, continue to add their unique sparkle to the ERDC MSRC computer infrastructure. ERDC invited the...configuration on B-52H purchased additional memory for the login nodes so that this part of the solution process could be done as a preprocessing step. On...application and system services. Of the service nodes, 10 are login nodes and 23 are input/output (I/O) server nodes for the Lustre file system (i.e., the

  10. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPUs, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third-party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  11. Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs

    NASA Technical Reports Server (NTRS)

    Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan

    2006-01-01

    Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefront sensors, such as optical design simplicity and stability. However, the image-based approach is computationally intensive, and therefore specialized high-performance computing architectures are required in applications utilizing it. The development and testing of these high-performance computing architectures are essential to such missions as the James Webb Space Telescope (JWST), Terrestrial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures requires numerous two-dimensional Fourier transforms, which necessitate an all-to-all communication when implemented on a distributed computational architecture. Several solutions for distributed computing are presented, with an emphasis on a 64-node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication problems and other computationally complex scientific problems.
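
    The all-to-all requirement mentioned above comes from the row-column decomposition of the 2-D FFT: each node transforms its local rows, but the intermediate array must then be transposed, a block exchange among all nodes, before the second 1-D pass. A single-process NumPy sketch of the underlying mathematics:

        import numpy as np

        img = np.random.rand(8, 8)

        rows_done = np.fft.fft(img, axis=1)         # local: each node FFTs its rows
        transposed = rows_done.T                    # distributed: the all-to-all step
        full_2d = np.fft.fft(transposed, axis=1).T  # local again, then transpose back

        # The row-transpose-row sequence reproduces the full 2-D transform.
        assert np.allclose(full_2d, np.fft.fft2(img))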

  12. Integrative Inferences on Pattern Geometries of Grapes Grown under Water Stress and Their Resulting Wines.

    PubMed

    Hsieh, Fushing; Hsueh, Chih-Hsin; Heitkamp, Constantin; Matthews, Mark

    2016-01-01

    Multiple datasets of two consecutive vintages of replicated grape and wines from six different deficit irrigation regimes are characterized and compared. The process consists of four temporal-ordered signature phases: harvest field data, juice composition, wine composition before bottling and bottled wine. A new computing paradigm and an integrative inferential platform are developed for discovering phase-to-phase pattern geometries for such characterization and comparison purposes. Each phase is manifested by a distinct set of features, which are measurable upon phase-specific entities subject to the common set of irrigation regimes. Throughout the four phases, this compilation of data from irrigation regimes with subsamples is termed a space of media-nodes, on which measurements of phase-specific features were recoded. All of these collectively constitute a bipartite network of data, which is then normalized and binary coded. For these serial bipartite networks, we first quantify patterns that characterize individual phases by means of a new computing paradigm called "Data Mechanics". This computational technique extracts a coupling geometry which captures and reveals interacting dependence among and between media-nodes and feature-nodes in forms of hierarchical block sub-matrices. As one of the principal discoveries, the holistic year-factor persistently surfaces as the most inferential factor in classifying all media-nodes throughout all phases. This could be deemed either surprising in its over-arching dominance or obvious based on popular belief. We formulate and test pattern-based hypotheses that confirm such fundamental patterns. We also attempt to elucidate the driving force underlying the phase-evolution in winemaking via a newly developed partial coupling geometry, which is designed to integrate two coupling geometries. Such partial coupling geometries are confirmed to bear causal and predictive implications. All pattern inferences are performed with respect to a profile of energy distributions sampled from network bootstrapping ensembles conforming to block-structures specified by corresponding hypotheses.

  13. Integrative Inferences on Pattern Geometries of Grapes Grown under Water Stress and Their Resulting Wines

    PubMed Central

    Hsieh, Fushing; Hsueh, Chih-Hsin; Heitkamp, Constantin; Matthews, Mark

    2016-01-01

    Multiple datasets of two consecutive vintages of replicated grape and wines from six different deficit irrigation regimes are characterized and compared. The process consists of four temporal-ordered signature phases: harvest field data, juice composition, wine composition before bottling and bottled wine. A new computing paradigm and an integrative inferential platform are developed for discovering phase-to-phase pattern geometries for such characterization and comparison purposes. Each phase is manifested by a distinct set of features, which are measurable upon phase-specific entities subject to the common set of irrigation regimes. Throughout the four phases, this compilation of data from irrigation regimes with subsamples is termed a space of media-nodes, on which measurements of phase-specific features were recoded. All of these collectively constitute a bipartite network of data, which is then normalized and binary coded. For these serial bipartite networks, we first quantify patterns that characterize individual phases by means of a new computing paradigm called “Data Mechanics”. This computational technique extracts a coupling geometry which captures and reveals interacting dependence among and between media-nodes and feature-nodes in forms of hierarchical block sub-matrices. As one of the principal discoveries, the holistic year-factor persistently surfaces as the most inferential factor in classifying all media-nodes throughout all phases. This could be deemed either surprising in its over-arching dominance or obvious based on popular belief. We formulate and test pattern-based hypotheses that confirm such fundamental patterns. We also attempt to elucidate the driving force underlying the phase-evolution in winemaking via a newly developed partial coupling geometry, which is designed to integrate two coupling geometries. Such partial coupling geometries are confirmed to bear causal and predictive implications. All pattern inferences are performed with respect to a profile of energy distributions sampled from network bootstrapping ensembles conforming to block-structures specified by corresponding hypotheses. PMID:27508416

  14. Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing

    ERIC Educational Resources Information Center

    Loh, Matthew

    2013-01-01

    The scalability of CMOS technology has driven computation into a diverse range of applications across the power consumption, performance and size spectra. Communication is a necessary adjunct to computation, and whether this is to push data from node-to-node in a high-performance computing cluster or from the receiver of wireless link to a neural…

  15. Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

    PubMed

    Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

    2018-04-20

    A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.

  16. Enabling a high throughput real time data pipeline for a large radio telescope array with GPUs

    NASA Astrophysics Data System (ADS)

    Edgar, R. G.; Clark, M. A.; Dale, K.; Mitchell, D. A.; Ord, S. M.; Wayth, R. B.; Pfister, H.; Greenhill, L. J.

    2010-10-01

    The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. Raw data will be generated continuously at 5 GiB/s, grouped into 8 s cadences. This high throughput motivates the development of on-site, real-time processing and reduction in preference to archiving, transport and off-line processing. Each batch of 8 s data must be completely reduced before the next batch arrives. Maintaining real-time operation will require a sustained performance of around 2.5 TFLOP/s (including convolutions, FFTs, interpolations and matrix multiplications). We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-Watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized between the nodes hosting the GPUs. The MWA will be a frontier scientific instrument and a pathfinder for planned peta- and exa-scale facilities.

  17. Study of Solid State Drives performance in PROOF distributed analysis system

    NASA Astrophysics Data System (ADS)

    Panitkin, S. Y.; Ernst, M.; Petkus, R.; Rind, O.; Wenaus, T.

    2010-04-01

    Solid State Drives (SSDs) are a promising storage technology for High Energy Physics parallel analysis farms. Their combination of low random access time and relatively high read speed is very well suited to situations where multiple jobs concurrently access data located on the same drive. SSDs also have lower energy consumption and higher vibration tolerance than Hard Disk Drives (HDDs), which makes them an attractive choice in many applications ranging from personal laptops to large analysis farms. The Parallel ROOT Facility (PROOF) is a distributed analysis system which makes it possible to exploit the inherent event-level parallelism of high energy physics data. PROOF is especially efficient together with distributed local storage systems like Xrootd, when data are distributed over computing nodes. In such an architecture the local disk subsystem I/O performance becomes a critical factor, especially when computing nodes use multi-core CPUs. We will discuss our experience with SSDs in the PROOF environment. We will compare the performance of HDDs with SSDs in I/O-intensive analysis scenarios. In particular we will discuss how PROOF system performance scales with the number of simultaneously running analysis jobs.

  18. Multiple Factors-Aware Diffusion in Social Networks

    DTIC Science & Technology

    2015-05-22

    Multiple Factors-Aware Diffusion in Social Networks. Chung-Kuang Chou and Ming-Syan Chen, Department of Electrical Engineering, National Taiwan… propagates from nodes to nodes over a social network. The behavior that a node adopts an information piece in a social network can be affected by… Twitter dataset. Keywords: Social networks · Diffusion models. Introduction: Information diffusion in social networks has been an active research field

  19. Distributed Water Pollution Source Localization with Mobile UV-Visible Spectrometer Probes in Wireless Sensor Networks.

    PubMed

    Ma, Junjie; Meng, Fansheng; Zhou, Yuexi; Wang, Yeyao; Shi, Ping

    2018-02-16

    Pollution accidents that occur in surface waters, especially in drinking water source areas, greatly threaten the urban water supply system. During water pollution source localization, there are complicated pollutant spreading conditions and pollutant concentrations vary in a wide range. This paper provides a scalable total solution, investigating a distributed localization method in wireless sensor networks equipped with mobile ultraviolet-visible (UV-visible) spectrometer probes. A wireless sensor network is defined for water quality monitoring, where unmanned surface vehicles and buoys serve as mobile and stationary nodes, respectively. Both types of nodes carry UV-visible spectrometer probes to acquire in-situ multiple water quality parameter measurements, in which a self-adaptive optical path mechanism is designed to flexibly adjust the measurement range. A novel distributed algorithm, called Dual-PSO, is proposed to search for the water pollution source, where one particle swarm optimization (PSO) procedure computes the water quality multi-parameter measurements on each node, utilizing UV-visible absorption spectra, and another one finds the global solution of the pollution source position, regarding mobile nodes as particles. Besides, this algorithm uses entropy to dynamically recognize the most sensitive parameter during searching. Experimental results demonstrate that online multi-parameter monitoring of a drinking water source area with a wide dynamic range is achieved by this wireless sensor network and water pollution sources are localized efficiently with low-cost mobile node paths.
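
    The outer of the two PSO loops treats each unmanned surface vehicle as a particle whose fitness is the pollutant concentration measured at its current position. A minimal sketch of that loop follows; the concentration field, coefficients, and names are illustrative assumptions, not the paper's Dual-PSO.

    ```python
    # Minimal outer-loop PSO for pollution source localization: mobile nodes
    # act as particles, measured concentration is the fitness (illustrative).
    import numpy as np

    def measure_concentration(pos, source=np.array([30.0, 70.0])):
        # Stand-in for an on-node UV-visible spectrometer reading: a smooth
        # plume peaking at the (unknown) source location.
        return np.exp(-np.linalg.norm(pos - source) / 20.0)

    rng = np.random.default_rng(42)
    n_nodes, dim = 6, 2
    pos = rng.random((n_nodes, dim)) * 100.0        # node positions (m)
    vel = np.zeros((n_nodes, dim))
    pbest = pos.copy()
    pbest_val = np.array([measure_concentration(p) for p in pos])
    gbest = pbest[pbest_val.argmax()]

    w, c1, c2 = 0.7, 1.5, 1.5                       # assumed PSO coefficients
    for _ in range(100):
        r1, r2 = rng.random((2, n_nodes, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel                             # nodes physically move here
        val = np.array([measure_concentration(p) for p in pos])
        improved = val > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[pbest_val.argmax()]

    print("estimated source position:", gbest)
    ```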

  20. Distributed Water Pollution Source Localization with Mobile UV-Visible Spectrometer Probes in Wireless Sensor Networks

    PubMed Central

    Zhou, Yuexi; Wang, Yeyao; Shi, Ping

    2018-01-01

    Pollution accidents that occur in surface waters, especially in drinking water source areas, greatly threaten the urban water supply system. During water pollution source localization, there are complicated pollutant spreading conditions and pollutant concentrations vary in a wide range. This paper provides a scalable total solution, investigating a distributed localization method in wireless sensor networks equipped with mobile ultraviolet-visible (UV-visible) spectrometer probes. A wireless sensor network is defined for water quality monitoring, where unmanned surface vehicles and buoys serve as mobile and stationary nodes, respectively. Both types of nodes carry UV-visible spectrometer probes to acquire in-situ multiple water quality parameter measurements, in which a self-adaptive optical path mechanism is designed to flexibly adjust the measurement range. A novel distributed algorithm, called Dual-PSO, is proposed to search for the water pollution source, where one particle swarm optimization (PSO) procedure computes the water quality multi-parameter measurements on each node, utilizing UV-visible absorption spectra, and another one finds the global solution of the pollution source position, regarding mobile nodes as particles. Besides, this algorithm uses entropy to dynamically recognize the most sensitive parameter during searching. Experimental results demonstrate that online multi-parameter monitoring of a drinking water source area with a wide dynamic range is achieved by this wireless sensor network and water pollution sources are localized efficiently with low-cost mobile node paths. PMID:29462929

  1. Circuit for Communication Over Power Lines

    NASA Technical Reports Server (NTRS)

    Krasowski, Michael J.; Prokop, Norman F.; Greer, Lawrence C., III; Nappier, Jennifer

    2011-01-01

    Many distributed systems share common sensors and instruments along with a common power line supplying current to the system. A communication technique and circuit have been developed that allow for the simple inclusion of an instrument, sensor, or actuator node within any system containing a common power bus. Wherever power is available, a node can be added, which can then draw power for itself, its associated sensors, and actuators from the power bus, all while communicating with other nodes on the power bus. The technique modulates a DC power bus through capacitive coupling using on-off keying (OOK), and receives and demodulates the signal from the DC power bus through the same capacitive coupling. The circuit acts as a serial modem for the physical power line communication. The circuit and technique can be made of commercially available components or included in an application specific integrated circuit (ASIC) design, which allows the circuit to be included in current designs with additional circuitry or embedded into new designs. This device and technique move computational, sensing, and actuation abilities closer to the source, and allow for the networking of multiple similar nodes to each other and to a central processor. This technique also allows for reconfigurable systems by adding or removing nodes at any time. It can do so using nothing more than the in situ power wiring of the system.
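
    On-off keying of the kind described reduces to gating a carrier with the bit stream and thresholding the received envelope. The sketch below illustrates just that baseband idea in NumPy; the carrier frequency, bit rate, and threshold are illustrative assumptions, not values from the NASA circuit.

    ```python
    # Illustrative on-off keying (OOK) over a noisy channel: the carrier is
    # gated by the bit stream, and the receiver thresholds per-bit energy.
    import numpy as np

    FS, FC, BIT_RATE = 100_000, 10_000, 1_000   # assumed sample/carrier/bit rates
    SPB = FS // BIT_RATE                        # samples per bit

    def ook_modulate(bits):
        t = np.arange(len(bits) * SPB) / FS
        gate = np.repeat(bits, SPB)             # 1 -> carrier on, 0 -> off
        return gate * np.sin(2 * np.pi * FC * t)

    def ook_demodulate(signal):
        # Envelope detection: per-bit RMS energy, then a fixed threshold.
        energy = np.sqrt((signal.reshape(-1, SPB) ** 2).mean(axis=1))
        return (energy > 0.25).astype(int)

    bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    noise = 0.1 * np.random.default_rng(0).normal(size=len(bits) * SPB)
    rx = ook_modulate(bits) + noise
    assert np.array_equal(ook_demodulate(rx), bits)
    ```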

  2. Waggle: A Framework for Intelligent Attentive Sensing and Actuation

    NASA Astrophysics Data System (ADS)

    Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.

    2014-12-01

    Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both in situ and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. The Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports in situ computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.

  3. Signaling completion of a message transfer from an origin compute node to a target compute node

    DOEpatents

    Blocksome, Michael A [Rochester, MN; Parker, Jeffrey J [Rochester, MN

    2011-05-24

    Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local direct put transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor.
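
    The ordering guarantee here comes from the injection FIFO: because the completion notification descriptor is enqueued after the data descriptor, the DMA engine cannot signal completion before the transfer has been issued. A toy model of that queue discipline follows, with illustrative names; this is a sketch of the concept, not the patented hardware.

    ```python
    # Toy model of an injection FIFO: descriptors execute strictly in order,
    # so a completion descriptor enqueued after a data descriptor can only
    # fire after the transfer has been issued (illustrative, not the patent).
    from collections import deque

    class InjectionFifo:
        def __init__(self):
            self.q = deque()

        def inject(self, descriptor):
            self.q.append(descriptor)

        def advance(self):
            while self.q:
                self.q.popleft()()              # process in FIFO order

    def data_descriptor():
        print("DMA: transferring application message to target node")

    def completion_descriptor():
        print("DMA: local direct put -> application notified of completion")

    fifo = InjectionFifo()
    fifo.inject(data_descriptor)        # data first ...
    fifo.inject(completion_descriptor)  # ... then the completion notification
    fifo.advance()
    ```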

  4. Dedicated heterogeneous node scheduling including backfill scheduling

    DOEpatents

    Wood, Robert R [Livermore, CA; Eckert, Philip D [Livermore, CA; Hommes, Gregg [Pleasanton, CA

    2006-07-25

    A method and system for backfill scheduling of jobs on dedicated heterogeneous nodes in a multi-node computing environment. Heterogeneous nodes are grouped into homogeneous node sub-pools. For each sub-pool, a free node schedule (FNS) is created to chart the free nodes over time. For each prioritized job, the FNSs of sub-pools having nodes usable by that job are used to determine the earliest time range (ETR) capable of running the job. Once determined for a particular job, the job is scheduled to run in that ETR. If the ETR determined for a lower priority job (LPJ) has a start time earlier than a higher priority job (HPJ), then the LPJ is scheduled in that ETR provided it would not disturb the anticipated start times of any HPJ previously scheduled for a future time. Thus, efficient utilization and throughput of such computing environments may be increased by utilizing resources that would otherwise remain idle.
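
    The core backfill test is simple: a lower-priority job may start early only if its run would not push back any higher-priority job already holding a reservation. A compact sketch of that check over a single homogeneous pool follows; the data structures and names are illustrative, whereas the patented method tracks one free node schedule per sub-pool.

    ```python
    # Minimal backfill feasibility check over one homogeneous node pool
    # (illustrative; the patent keeps a free node schedule per sub-pool).
    def free_nodes_at(total_nodes, scheduled, t):
        """Nodes free at time t given scheduled = [(start, end, nodes), ...]."""
        used = sum(n for s, e, n in scheduled if s <= t < e)
        return total_nodes - used

    def can_backfill(job, scheduled, total_nodes, now):
        """job = (runtime, nodes). OK to start now iff enough nodes remain
        free for the whole run, at every reservation boundary it overlaps."""
        runtime, nodes = job
        checkpoints = [now] + [s for s, e, n in scheduled
                               if now < s < now + runtime]
        return all(free_nodes_at(total_nodes, scheduled, t) >= nodes
                   for t in checkpoints)

    # One HPJ reserved to start at t=10 on 8 of 10 nodes:
    scheduled = [(10, 20, 8)]
    print(can_backfill((5, 2), scheduled, total_nodes=10, now=0))   # True: fits
    print(can_backfill((15, 4), scheduled, total_nodes=10, now=0))  # False
    ```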

  5. Signaling completion of a message transfer from an origin compute node to a target compute node

    DOEpatents

    Blocksome, Michael A [Rochester, MN

    2011-02-15

    Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local memory FIFO data transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor.

  6. Support Vector Machines Model of Computed Tomography for Assessing Lymph Node Metastasis in Esophageal Cancer with Neoadjuvant Chemotherapy.

    PubMed

    Wang, Zhi-Long; Zhou, Zhi-Guo; Chen, Ying; Li, Xiao-Ting; Sun, Ying-Shi

    The aim of this study was to diagnose lymph node metastasis of esophageal cancer by a support vector machines model based on computed tomography. A total of 131 esophageal cancer patients with preoperative chemotherapy and radical surgery were included. Various indicators (tumor thickness, tumor length, tumor CT value, total number of lymph nodes, and long axis and short axis sizes of the largest lymph node) on CT images before and after neoadjuvant chemotherapy were recorded. A support vector machines model based on these CT indicators was built to predict lymph node metastasis. The support vector machines model diagnosed lymph node metastasis better than the preoperative short axis size of the largest lymph node on CT. The areas under the receiver operating characteristic curves were 0.887 and 0.705, respectively. The support vector machine model of CT images can help diagnose lymph node metastasis in esophageal cancer with preoperative chemotherapy.
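
    A model of this kind can be reproduced with a standard SVM pipeline: one feature vector of CT indicators per patient, a binary metastasis label, and AUC as the figure of merit. The sketch below assumes scikit-learn and entirely synthetic data; the feature list mirrors the abstract, but the values are fabricated placeholders, not the study's data.

    ```python
    # Sketch of an SVM classifier on CT-derived indicators (synthetic data;
    # the feature list mirrors the abstract, values are placeholders).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 131
    # Columns: tumor thickness, tumor length, tumor CT value, total lymph
    # nodes, long/short axes of the largest node (pre- and post-chemo).
    X = rng.normal(size=(n, 12))
    y = rng.integers(0, 2, size=n)              # 1 = lymph node metastasis

    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print("cross-validated AUC: %.3f +/- %.3f" % (auc.mean(), auc.std()))
    ```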

  7. Parallel computer processing and modeling: applications for the ICU

    NASA Astrophysics Data System (ADS)

    Baxter, Grant; Pranger, L. Alex; Draghic, Nicole; Sims, Nathaniel M.; Wiesmann, William P.

    2003-07-01

    Current patient monitoring procedures in hospital intensive care units (ICUs) generate vast quantities of medical data, much of which is considered extemporaneous and not evaluated. Although sophisticated monitors to analyze individual types of patient data are routinely used in the hospital setting, this equipment lacks high order signal analysis tools for detecting long-term trends and correlations between different signals within a patient data set. Without the ability to continuously analyze disjoint sets of patient data, it is difficult to detect slow-forming complications. As a result, the early onset of conditions such as pneumonia or sepsis may not be apparent until the advanced stages. We report here on the development of a distributed software architecture test bed and software medical models to analyze both asynchronous and continuous patient data in real time. Hardware and software have been developed to support a multi-node distributed computer cluster capable of amassing data from multiple patient monitors and projecting near- and long-term outcomes based upon the application of physiologic models to the incoming patient data stream. One computer acts as a central coordinating node; additional computers accommodate processing needs. A simple, non-clinical model for sepsis detection was implemented on the system for demonstration purposes. This work shows exceptional promise as a highly effective means to rapidly predict and thereby mitigate the effect of nosocomial infections.

  8. A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, K; Seymour, R; Wang, W

    2009-02-17

    A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on a hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops·day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microsecond).

  9. Spiking network simulation code for petascale computers.

    PubMed

    Kunkel, Susanne; Schmidt, Maximilian; Eppler, Jochen M; Plesser, Hans E; Masumoto, Gen; Igarashi, Jun; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus; Helias, Moritz

    2014-01-01

    Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today.

  10. Spiking network simulation code for petascale computers

    PubMed Central

    Kunkel, Susanne; Schmidt, Maximilian; Eppler, Jochen M.; Plesser, Hans E.; Masumoto, Gen; Igarashi, Jun; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus; Helias, Moritz

    2014-01-01

    Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today. PMID:25346682

  11. Space-Time Processing for Tactical Mobile Ad Hoc Networks

    DTIC Science & Technology

    2007-08-01

    rates in mobile ad hoc networks. In addition, he has considered the design of a cross-layer multi-user resource allocation framework using a… framework for many-to-one communication. In this context, multiple nodes cooperate to transmit their packets simultaneously to a single node using multi… spatially multiplexed signals transmitted from multiple nodes. Our goal is to form a framework that activates different sets of communication links

  12. Noguchi uses laptop computer in the Node 2 during Expedition 22

    NASA Image and Video Library

    2010-01-19

    ISS022-E-030641 (19 Jan. 2010) --- Japan Aerospace Exploration Agency (JAXA) astronaut Soichi Noguchi, Expedition 22 flight engineer, uses a computer in the Harmony node of the International Space Station.

  13. Megatux

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2012-09-25

    The Megatux platform enables the emulation of large scale (multi-million node) distributed systems. In particular, it allows for the emulation of large-scale networks interconnecting a very large number of emulated computer systems. It does this by leveraging virtualization and associated technologies to allow hundreds of virtual computers to be hosted on a single moderately sized server or workstation. Virtualization technology provided by modern processors allows for multiple guest OSs to run at the same time, sharing the hardware resources. The Megatux platform can be deployed on a single PC, a small cluster of a few boxes or a large cluster of computers. With a modest cluster, the Megatux platform can emulate complex organizational networks. By using virtualization, we emulate the hardware, but run actual software, enabling large scale without sacrificing fidelity.

  14. Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition.

    PubMed

    Hébert-Dufresne, Laurent; Grochow, Joshua A; Allard, Antoine

    2016-08-18

    We introduce a network statistic that measures structural properties at the micro-, meso-, and macroscopic scales, while still being easy to compute and interpretable at a glance. Our statistic, the onion spectrum, is based on the onion decomposition, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. Yet, the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks.
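
    Since the onion spectrum falls out of the same peeling loop as the k-cores, it can be computed in a few lines: each pass removes every remaining vertex whose residual degree is at most the current core value, and that batch forms one layer. The sketch below is an assumed implementation of that idea with illustrative names, not the authors' code.

    ```python
    # Onion decomposition via k-core peeling: record the layer (removal
    # stage) and core number of every vertex (illustrative implementation).
    def onion_decomposition(adj):
        """adj: dict node -> set of neighbours (undirected graph).
        Returns (coreness, layer): k-core number and onion layer per node."""
        degree = {v: len(nbrs) for v, nbrs in adj.items()}
        remaining = set(adj)
        coreness, layer = {}, {}
        k, current_layer = 0, 0
        while remaining:
            k = max(k, min(degree[v] for v in remaining))
            # One onion layer: all nodes whose residual degree is <= k.
            batch = [v for v in remaining if degree[v] <= k]
            current_layer += 1
            for v in batch:
                coreness[v], layer[v] = k, current_layer
                remaining.discard(v)
            for v in batch:
                for u in adj[v]:
                    if u in remaining:
                        degree[u] -= 1
        return coreness, layer

    # Tiny example: a triangle with a pendant vertex.
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(onion_decomposition(adj))
    ```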

  15. Communicability across evolving networks.

    PubMed

    Grindrod, Peter; Parsons, Mark C; Higham, Desmond J; Estrada, Ernesto

    2011-04-01

    Many natural and technological applications generate time-ordered sequences of networks, defined over a fixed set of nodes; for example, time-stamped information about "who phoned who" or "who came into contact with who" arise naturally in studies of communication and the spread of disease. Concepts and algorithms for static networks do not immediately carry through to this dynamic setting. For example, suppose A and B interact in the morning, and then B and C interact in the afternoon. Information, or disease, may then pass from A to C, but not vice versa. This subtlety is lost if we simply summarize using the daily aggregate network given by the chain A-B-C. However, using a natural definition of a walk on an evolving network, we show that classic centrality measures from the static setting can be extended in a computationally convenient manner. In particular, communicability indices can be computed to summarize the ability of each node to broadcast and receive information. The computations involve basic operations in linear algebra, and the asymmetry caused by time's arrow is captured naturally through the noncommutativity of matrix-matrix multiplication. Illustrative examples are given for both synthetic and real-world communication data sets. We also discuss the use of the new centrality measures for real-time monitoring and prediction.
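
    The construction sketched above multiplies one matrix resolvent per time slice, so the product inherits time's arrow from the noncommutativity of matrix multiplication. Below is a hedged NumPy sketch following the usual presentation of this dynamic communicability idea; the update rule and the parameter name alpha are assumptions, not a transcription of the paper.

    ```python
    # Dynamic communicability over a time-ordered sequence of networks:
    # Q = prod_k (I - alpha * A[k])^(-1), taken in temporal order.
    import numpy as np

    def dynamic_communicability(adjacency_sequence, alpha):
        """adjacency_sequence: time-ordered list of (n, n) adjacency matrices.
        alpha must satisfy alpha < 1/rho(A[k]) for every slice k.
        Q[i, j] summarizes time-respecting walks from i to j, so Q is
        generally non-symmetric even for undirected slices."""
        n = adjacency_sequence[0].shape[0]
        Q = np.eye(n)
        for A in adjacency_sequence:
            Q = Q @ np.linalg.inv(np.eye(n) - alpha * A)  # noncommutative
        broadcast = Q.sum(axis=1)  # each node's ability to send information
        receive = Q.sum(axis=0)    # each node's ability to receive it
        return Q, broadcast, receive

    # A-B in the morning, then B-C in the afternoon: A can reach C, not
    # vice versa, which the asymmetry of Q captures.
    A1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], float)
    A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]], float)
    Q, b, r = dynamic_communicability([A1, A2], alpha=0.5)
    print(Q[0, 2], Q[2, 0])   # nonzero vs zero
    ```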

  16. An S_N Algorithm for Modern Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Randal Scott

    2016-08-29

    LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.

  17. Staging Lung Cancer: Metastasis.

    PubMed

    Shroff, Girish S; Viswanathan, Chitra; Carter, Brett W; Benveniste, Marcelo F; Truong, Mylene T; Sabloff, Bradley S

    2018-05-01

    The updated eighth edition of the tumor, node, metastasis (TNM) classification for lung cancer includes revisions to T and M descriptors. In terms of the M descriptor, the classification of intrathoracic metastatic disease as M1a is unchanged from TNM-7. Extrathoracic metastatic disease, which was classified as M1b in TNM-7, is now subdivided into M1b (single metastasis, single organ) and M1c (multiple metastases in one or multiple organs) descriptors. In this article, the rationale for changes in the M descriptors, the utility of preoperative staging with PET/computed tomography, and the treatment options available for patients with oligometastatic disease are discussed. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Perforated Tuberculosis Lymphadenitis

    PubMed Central

    Cataño, Juan; Cardeño, John

    2013-01-01

    A 26-year-old man (human immunodeficiency virus-positive and not taking highly active antiretroviral treatment [HAART]) presented to the emergency room with 2 months of malaise, 20 kg weight loss, high spiking fevers, generalized lymphadenopathy, night sweats, dry cough, and chest pain when swallowing. On physical examination, he had multiple cervical lymphadenopathies. Suspecting a systemic opportunistic infection, a contrasted chest computed tomography (CT) was done, showing an esophagus-to-mediastinum fistula. Two days after admission, a fluoroscopic contrasted endoscopy was done that showed two esophageal fistulae extending from scrofula to the esophagus and then to the mediastinum. A bronchoalveolar lavage and a cervical lymphadenopathy biopsy were done, both showing multiple acid-fast bacilli, and cultures grew Mycobacterium tuberculosis. PMID:23740190

  19. Man-Made Object Extraction from Remote Sensing Imagery by Graph-Based Manifold Ranking

    NASA Astrophysics Data System (ADS)

    He, Y.; Wang, X.; Hu, X. Y.; Liu, S. H.

    2018-04-01

    The automatic extraction of man-made objects from remote sensing imagery is useful in many applications. This paper proposes an algorithm for extracting man-made objects automatically by integrating a graph model with the manifold ranking algorithm. Initially, we estimate an a priori value for the man-made objects with the use of symmetric and contrast features. The graph model is established to represent the spatial relationships among pre-segmented superpixels, which are used as the graph nodes. Multiple characteristics, namely colour, texture and main direction, are used to compute the weights of the adjacent nodes. Manifold ranking effectively explores the relationships among all the nodes in the feature space as well as the initial query assignment; thus, it is applied to generate a ranking map, which indicates the scores of the man-made objects. The man-made objects are then segmented on the basis of the ranking map. Two typical segmentation algorithms are compared with the proposed algorithm. Experimental results show that the proposed algorithm can extract man-made objects with a high recognition rate and a low omission rate.
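
    In the standard manifold ranking formulation, the ranking map has a closed form: f* = (I - alpha*S)^(-1) y, where S is the symmetrically normalized affinity matrix of the superpixel graph and y holds the a priori scores. The sketch below follows that standard formulation as an assumption about the paper's variant; the parameter alpha and names are illustrative.

    ```python
    # Closed-form manifold ranking on a superpixel graph (standard
    # formulation; assumed to match the paper's variant).
    import numpy as np

    def manifold_ranking(W, y, alpha=0.99):
        """W: (n, n) symmetric affinity matrix between superpixel nodes,
        built from colour/texture/direction similarities.
        y: (n,) initial query vector (a priori man-made-object scores).
        Returns the ranking score of every node."""
        d = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S = D_inv_sqrt @ W @ D_inv_sqrt         # normalized affinity
        n = W.shape[0]
        return np.linalg.solve(np.eye(n) - alpha * S, y)  # f* = (I-aS)^-1 y
    ```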

  20. Multi-hop routing mechanism for reliable sensor computing.

    PubMed

    Chen, Jiann-Liang; Ma, Yi-Wei; Lai, Chia-Ping; Hu, Chia-Cheng; Huang, Yueh-Min

    2009-01-01

    Current research on routing in wireless sensor computing concentrates on increasing the service lifetime, enabling scalability for a large number of sensors and supporting fault tolerance for battery exhaustion and broken nodes. A sensor node is naturally exposed to various sources of unreliable communication channels and node failures. Sensor nodes have many failure modes, and each failure degrades the network performance. This work develops a novel mechanism, called the Reliable Routing Mechanism (RRM), based on a hybrid cluster-based routing protocol to specify the best reliable routing path for sensor computing. Table-driven intra-cluster routing and on-demand inter-cluster routing are combined by changing the relationship between clusters for sensor computing. Applying a reliable routing mechanism in sensor computing can improve routing reliability, maintain low packet loss, minimize management overhead and reduce energy consumption. Simulation results indicate that the reliability of the proposed RRM mechanism is around 25% higher than that of the Dynamic Source Routing (DSR) and Ad hoc On-demand Distance Vector (AODV) routing mechanisms.

  1. Multi-petascale highly efficient parallel supercomputer

    DOEpatents

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources, enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis, and preferably enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.

  2. A universal computer control system for motors

    NASA Technical Reports Server (NTRS)

    Szakaly, Zoltan F. (Inventor)

    1991-01-01

    A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link, is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors, with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and respond to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits, and the sensed analog signals are converted to digital signals for storage at the controller cards, where such signals can be read during an address read/write cycle of the command processor.

  3. Application of Emerging Open-source Embedded Systems for Enabling Low-cost Wireless Mini-observatory Nodes in the Coastal Zone

    NASA Astrophysics Data System (ADS)

    Glazer, B. T.

    2016-02-01

    Here, we describe the development of novel, low-cost, open-source instrumentation to enable wireless data transfer from biogeochemical sensors in the coastal zone. The platform is centered upon the Beaglebone Black single board computer. Process inquiry in environmental sciences suffers from undersampling; enabling sustained and unattended data collection typically involves expensive instrumentation and infrastructure deployed as cabled observatories with little flexibility in deployment location following initial installation. The high cost of commercially available or custom electronic packages has not only limited the number of sensor node sites that can be targeted by reasonably well-funded academic researchers, but has also entirely prohibited widespread engagement with K-12, public non-profit, and 'citizen scientist' STEM audiences. The new platform under development represents a balanced blend of research-grade sensors and low-cost open-source electronics that are easily assembled. Custom, robust, open-source code that remains customizable for specific node configurations can match a specific deployment's measurement needs, depending on the scientific research priorities. We have demonstrated prototype capabilities and versatility through lab testing and field deployments of multiple sensor nodes with multiple sensor inputs, all of which are streaming near-real-time data over wireless RF links to a shore-based base station. On shore, first-pass data processing QA/QC takes place and near-real-time plots are made available on the World Wide Web. Specifically, we have worked closely with an environmental and cultural management and restoration non-profit organization, and middle and high school science classes, engaging their interest in STEM application to local watershed processes. Ultimately, continued successful development of this pilot project can lead to a coastal oceanographic analogue of the popular Weather Underground personal weather station model.

  4. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.

    PubMed

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-07

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
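
    The parallel pattern described, splitting histories across worker nodes and aggregating partial tallies on a master, is easy to demonstrate locally. The toy below uses multiprocessing in place of a cloud cluster, and the "transport" is a stand-in, not EGS5; only the scatter/aggregate structure matches the approach described.

    ```python
    # Toy scatter/aggregate Monte Carlo: histories are split across workers
    # and partial dose tallies are summed (stand-in physics, not EGS5).
    import numpy as np
    from multiprocessing import Pool

    def run_histories(args):
        n, seed = args
        rng = np.random.default_rng(seed)
        # Stand-in "transport": deposit histories into a 1D depth-dose grid.
        depth = rng.exponential(scale=3.0, size=n)
        dose, _ = np.histogram(depth, bins=50, range=(0.0, 15.0))
        return dose

    if __name__ == "__main__":
        n_workers, n_histories = 8, 1_000_000
        chunks = [(n_histories // n_workers, seed) for seed in range(n_workers)]
        with Pool(n_workers) as pool:
            total_dose = sum(pool.map(run_histories, chunks))  # master sums
        print(total_dose)
    ```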

  5. The predictive value of single-photon emission computed tomography/computed tomography for sentinel lymph node localization in head and neck cutaneous malignancy.

    PubMed

    Remenschneider, Aaron K; Dilger, Amanda E; Wang, Yingbing; Palmer, Edwin L; Scott, James A; Emerick, Kevin S

    2015-04-01

    Preoperative localization of sentinel lymph nodes in head and neck cutaneous malignancies can be aided by single-photon emission computed tomography/computed tomography (SPECT/CT); however, its true predictive value for identifying lymph nodes intraoperatively remains unquantified. This study aims to understand the sensitivity, specificity, and positive and negative predictive values of SPECT/CT in sentinel lymph node biopsy for cutaneous malignancies of the head and neck. Blinded retrospective imaging review with comparison to intraoperative gamma probe confirmed sentinel lymph nodes. A consecutive series of patients with a head and neck cutaneous malignancy underwent preoperative SPECT/CT followed by sentinel lymph node biopsy with a gamma probe. Two nuclear medicine physicians, blinded to clinical data, independently reviewed each SPECT/CT. Activity within radiographically defined nodal basins was recorded and compared to intraoperative gamma probe findings. Sensitivity, specificity, and negative and positive predictive values were calculated with subgroup stratification by primary tumor site. Ninety-two imaging reads were performed on 47 patients with cutaneous malignancy who underwent SPECT/CT followed by sentinel lymph node biopsy. Overall sensitivity was 73%, specificity 92%, positive predictive value 54%, and negative predictive value 96%. The predictive ability of SPECT/CT to identify the basin or an adjacent basin containing the single hottest node was 92%. SPECT/CT overestimated uptake by an average of one nodal basin. In the head and neck, SPECT/CT has higher reliability for primary lesions of the eyelid, scalp, and cheek. SPECT/CT has high sensitivity, specificity, and negative predictive value, but may overestimate relevant nodal basins in sentinel lymph node biopsy. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.

  6. OPEX: Optimized Eccentricity Computation in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Henderson, Keith

    2011-11-14

    Real-world graphs have many properties of interest, but often these properties are expensive to compute. We focus on eccentricity, radius and diameter in this work. These properties are useful measures of the global connectivity patterns in a graph. Unfortunately, computing eccentricity for all nodes is O(n^2) for a graph with n nodes. We present OPEX, a novel combination of optimizations which improves the computation time of these properties by orders of magnitude in real-world experiments on graphs of many different sizes. We run OPEX on graphs with up to millions of links. OPEX gives either exact results or bounded approximations, unlike its competitors, which give probabilistic approximations or sacrifice node-level information (eccentricity) to compute graph-level information (diameter).
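
    For reference, the expensive baseline that OPEX improves on is exact eccentricity via one BFS per node, from which radius and diameter follow. The sketch below shows that baseline only, with illustrative names; it is not the OPEX optimizations themselves.

    ```python
    # Exact eccentricity by BFS from every node: the O(n * (n + m))
    # baseline the abstract calls expensive (illustrative, not OPEX).
    from collections import deque

    def eccentricities(adj):
        """adj: dict node -> iterable of neighbours (connected graph).
        Returns node -> eccentricity (longest shortest path from the node)."""
        ecc = {}
        for s in adj:
            dist = {s: 0}
            q = deque([s])
            while q:
                v = q.popleft()
                for u in adj[v]:
                    if u not in dist:
                        dist[u] = dist[v] + 1
                        q.append(u)
            ecc[s] = max(dist.values())
        return ecc

    adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    ecc = eccentricities(adj)
    print(ecc, "radius:", min(ecc.values()), "diameter:", max(ecc.values()))
    ```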

  7. Effecting a broadcast with an allreduce operation on a parallel computer

    DOEpatents

    Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.

    2010-11-02

    A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of the logical root's contribution is contributed in the send buffer, and one or more zeros corresponding to the size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed, and the result of the allreduce operation is determined and stored in each receive buffer.
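
    A common way to realize this trick with standard MPI, rather than the patent's global combining network, is to have every rank except the logical root contribute zeros: since x OR 0 = x, the bitwise-OR allreduce leaves the root's buffer in every receive buffer. A minimal mpi4py sketch under that assumption follows.

    ```python
    # Broadcast effected via a bitwise-OR allreduce (mpi4py sketch of the
    # idea; the patent uses a Blue Gene global combining network instead).
    # Run with e.g.: mpiexec -n 4 python allreduce_bcast.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    root = 0

    send = np.zeros(4, dtype=np.uint32)          # injected zeros on non-roots
    if comm.rank == root:
        send[:] = [0xDEAD, 0xBEEF, 0xCAFE, 0xF00D]  # logical root's contribution

    recv = np.empty_like(send)
    comm.Allreduce(send, recv, op=MPI.BOR)       # x OR 0 = x: acts as broadcast
    print(comm.rank, recv)
    ```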

  8. Wireless visual sensor network resource allocation using cross-layer optimization

    NASA Astrophysics Data System (ADS)

    Bentley, Elizabeth S.; Matyjas, John D.; Medley, Michael J.; Kondi, Lisimachos P.

    2009-01-01

    In this paper, we propose an approach to manage network resources for a Direct Sequence Code Division Multiple Access (DS-CDMA) visual sensor network where nodes monitor scenes with varying levels of motion. It uses cross-layer optimization across the physical layer, the link layer and the application layer. Our technique simultaneously assigns a source coding rate, a channel coding rate, and a power level to all nodes in the network based on one of two criteria that maximize the quality of video of the entire network as a whole, subject to a constraint on the total chip rate. One criterion results in the minimal average end-to-end distortion amongst all nodes, while the other criterion minimizes the maximum distortion of the network. Our approach allows one to determine the capacity of the visual sensor network based on the number of nodes and the quality of video that must be transmitted. For bandwidth-limited applications, one can also determine the minimum bandwidth needed to accommodate a number of nodes with a specific target chip rate. Video captured by a sensor node camera is encoded and decoded using the H.264 video codec by a centralized control unit at the network layer. To reduce the computational complexity of the solution, Universal Rate-Distortion Characteristics (URDCs) are obtained experimentally to relate bit error probabilities to the distortion of corrupted video. Bit error rates are found first by using Viterbi's upper bounds on the bit error probability and second, by simulating nodes transmitting data spread by Total Square Correlation (TSC) codes over a Rayleigh-faded DS-CDMA channel and receiving that data using Auxiliary Vector (AV) filtering.

  9. Embedding global and collective in a torus network with message class map based tree path selection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Dong; Coteus, Paul W.; Eisley, Noel A.

    Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention also provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.

  10. Direct memory access transfer completion notification

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Parker, Jeffrey J.

    2010-08-17

    Methods, apparatus, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA engine on an origin compute node in an injection FIFO buffer, a data descriptor for an application message to be transferred to a target compute node on behalf of an application on the origin compute node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying an address of a completion notification field in application storage for the application; transferring, by the origin DMA engine to the target compute node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that the transfer of the message is complete, including performing a local direct put operation to store predesignated notification data at the address of the completion notification field.

  11. The Earth System Grid Federation (ESGF) Project

    NASA Astrophysics Data System (ADS)

    Carenton-Madiec, Nicolas; Denvil, Sébastien; Greenslade, Mark

    2015-04-01

    The Earth System Grid Federation (ESGF) Peer-to-Peer (P2P) enterprise system is a collaboration that develops, deploys and maintains software infrastructure for the management, dissemination, and analysis of model output and observational data. ESGF's primary goal is to facilitate advancements in Earth System Science. It is an interagency and international effort led by the US Department of Energy (DOE), and co-funded by the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), the National Science Foundation (NSF), the Infrastructure for the European Network of Earth System Modelling (IS-ENES) and international laboratories such as the Max Planck Institute for Meteorology (MPI-M), the German Climate Computing Centre (DKRZ), the Australian National University (ANU) National Computational Infrastructure (NCI), the Institut Pierre-Simon Laplace (IPSL), and the British Atmospheric Data Centre (BADC). Its main mission is to support current CMIP5 activities and prepare for future assessments. The ESGF architecture is based on a system of autonomous and distributed nodes, which interoperate through common acceptance of federation protocols and trust agreements. Data is stored at multiple nodes around the world, and served through local data and metadata services. Nodes exchange information about their data holdings and services, and trust each other for registering users and establishing access control decisions. The net result is that a user can use a web browser, connect to any node, and seamlessly find and access data throughout the federation. This type of collaborative organization and distributed architecture highlighted the need to define integration and testing processes that ensure the quality of software releases and interoperability. This presentation will introduce the ESGF project and demonstrate the range of tools and processes that have been set up to support release management activities.

  12. INDIRECT COMPUTED TOMOGRAPHIC LYMPHOGRAPHY FOR ILIOSACRAL LYMPHATIC MAPPING IN A COHORT OF DOGS WITH ANAL SAC GLAND ADENOCARCINOMA: TECHNIQUE DESCRIPTION.

    PubMed

    Majeski, Stephanie A; Steffey, Michele A; Fuller, Mark; Hunt, Geraldine B; Mayhew, Philipp D; Pollard, Rachel E

    2017-05-01

    Sentinel lymph node mapping can help to direct surgical oncologic staging and metastatic disease detection in patients with complex lymphatic pathways. We hypothesized that indirect computed tomographic lymphography (ICTL) with a water-soluble iodinated contrast agent would successfully map lymphatic pathways of the iliosacral lymphatic center in dogs with anal sac gland carcinoma, providing a potential preoperative method for iliosacral sentinel lymph node identification in dogs. Thirteen adult dogs diagnosed with anal sac gland carcinoma were enrolled in this prospective, pilot study, and ICTL was performed via peritumoral contrast injection with serial caudal abdominal computed tomography scans for iliosacral sentinel lymph node identification. Technical and descriptive details for ICTL were recorded, including patient positioning, total contrast injection volume, timing of contrast visualization, and sentinel lymph nodes and lymphatic pathways identified. Indirect CT lymphography identified lymphatic pathways and sentinel lymph nodes in 12/13 cases (92%). Identified sentinel lymph nodes were ipsilateral to the anal sac gland carcinoma in 8/12 and contralateral to the anal sac gland carcinoma in 4/12 cases. Sacral, internal iliac, and medial iliac lymph nodes were identified as sentinel lymph nodes, and patterns were widely variable. Patient positioning and timing of imaging may impact successful sentinel lymph node identification. Positioning in supported sternal recumbency is recommended. Results indicate that ICTL may be a feasible technique for sentinel lymph node identification in dogs with anal sac gland carcinoma and offer preliminary data to drive further investigation of iliosacral lymphatic metastatic patterns using ICTL and sentinel lymph node biopsy. © 2017 American College of Veterinary Radiology.

  13. Development of an extensible dual-core wireless sensing node for cyber-physical systems

    NASA Astrophysics Data System (ADS)

    Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Moritz; Lynch, Jerome P.; Wang, Yang; Swartz, A.

    2014-04-01

    The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyber-physical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback control demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independently of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens the opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing Martlet's ability to monitor and control physical systems such as wind turbines and buildings.

  14. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

    DOE PAGES

    Wang, Bei; Ethier, Stephane; Tang, William; ...

    2017-06-29

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.

  15. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Bei; Ethier, Stephane; Tang, William

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.

  16. A texture-based framework for improving CFD data visualization in a virtual environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bivins, Gerrick O'Ron

    2005-01-01

    In the field of computational fluid dynamics (CFD) accurate representations of fluid phenomena can be simulated but require large amounts of data to represent the flow domain. Most datasets generated from a CFD simulation can be coarse, ~10,000 nodes or cells, or very fine with node counts on the order of 1,000,000. A typical dataset solution can also contain multiple solutions for each node, pertaining to various properties of the flow at a particular node. Scalar properties such as density, temperature, pressure, and velocity magnitude are properties that are typically calculated and stored in a dataset solution. Solutions are not limited to just scalar properties. Vector quantities, such as velocity, are also often calculated and stored for a CFD simulation. Accessing all of this data efficiently during runtime is a key problem for visualization in an interactive application. Understanding simulation solutions requires a post-processing tool to convert the data into something more meaningful. Ideally, the application would present an interactive visual representation of the numerical data for any dataset that was simulated while maintaining the accuracy of the calculated solution. Most CFD applications currently sacrifice interactivity for accuracy, yielding highly detailed flow descriptions but limiting interaction for investigating the field.

  17. A texture-based framework for improving CFD data visualization in a virtual environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bivins, Gerrick O'Ron

    2005-01-01

    In the field of computational fluid dynamics (CFD) accurate representations of fluid phenomena can be simulated but require large amounts of data to represent the flow domain. Most datasets generated from a CFD simulation can be coarse, ~10,000 nodes or cells, or very fine with node counts on the order of 1,000,000. A typical dataset solution can also contain multiple solutions for each node, pertaining to various properties of the flow at a particular node. Scalar properties such as density, temperature, pressure, and velocity magnitude are properties that are typically calculated and stored in a dataset solution. Solutions are not limited to just scalar properties. Vector quantities, such as velocity, are also often calculated and stored for a CFD simulation. Accessing all of this data efficiently during runtime is a key problem for visualization in an interactive application. Understanding simulation solutions requires a post-processing tool to convert the data into something more meaningful. Ideally, the application would present an interactive visual representation of the numerical data for any dataset that was simulated while maintaining the accuracy of the calculated solution. Most CFD applications currently sacrifice interactivity for accuracy, yielding highly detailed flow descriptions but limiting interaction for investigating the field.

  18. Transoral videolaryngoscopic surgery for papillary carcinoma arising in lingual thyroid.

    PubMed

    Mogi, Chisato; Shinomiya, Hirotaka; Fujii, Natsumi; Tsuruta, Tomoyuki; Morita, Naruhiko; Furukawa, Tatsuya; Teshima, Masanori; Kanzawa, Maki; Hirokawa, Mitsuyoshi; Otsuki, Naoki; Nibu, Ken-Ichi

    2018-05-15

    Carcinoma arising in lingual thyroid is an extremely rare entity accounting for only 1% of all reported ectopic thyroids. Here, we report a case of carcinoma arising in lingual thyroid, which was successfully managed by transoral resection and bilateral neck dissections. A lingual mass 4 cm in diameter with calcification was incidentally detected by computed tomography at a medical check-up. No thyroid tissue was observed in the normal position. Ultrasound examination showed bilateral multiple lymphadenopathies. Fine needle aspiration biopsy of a lymph node in the patient's right neck was diagnosed as Class III, and the thyroglobulin level of the specimen was 459 ng/ml. Due to the difficulty of performing FNA on the lingual mass, right neck dissection was performed in advance for diagnostic purposes. Pathological examination showed the existence of large and small follicular thyroid tissues in several lymph nodes, suggesting lymph node metastasis from thyroid carcinoma. Two months after the initial surgery, video-assisted transoral resection of the lingual thyroid with simultaneous left neck dissection was performed. The postoperative course was uneventful. Papillary carcinoma was found in the lingual thyroid, and thyroid tissues were also found in the left cervical lymph nodes. Video-assisted transoral resection was useful for the treatment of thyroid cancer arising in lingual thyroid. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Cytological Diagnosis of an Uncommon High Grade Malignant Thyroid Tumour: A Case Report.

    PubMed

    Nagpal, Ruchi; Kaushal, Manju; Kumar, Sawan

    2017-07-01

    Anaplastic Thyroid Carcinoma (ATC) is a relatively uncommon highly malignant tumour originating from the follicular cells of the thyroid gland and having a poor prognosis. It accounts for 2% to 5% of all thyroid carcinomas and patients typically present with a rapidly growing anterior neck mass with aggressive symptoms. A 53-year-old male presented with a one-month history of diffuse neck swelling measuring 8x6 cm and a right cervical lymph node measuring 2x2 cm, associated with dyspepsia and dyspnoea. Ultrasound and Contrast Enhanced Computed Tomography (CECT) of the neck revealed an enlarged right lobe of the thyroid and multiple enlarged cervical lymph nodes with soft tissue density nodules in bilateral lungs. Fine Needle Aspiration (FNA) from the swelling revealed giant cell, spindle cell and squamoid patterns. Focal areas showed follicular epithelial cells arranged in a repeated microfollicular pattern suggesting an underlying follicular neoplasm. FNAC smears from the lymph node also revealed similar findings. Based on the cytomorphological and radiological findings, a final diagnosis of ATC, probably arising from an underlying follicular carcinoma, with cervical lymph node and lung metastasis was given. FNAC leads to a prompt and definitive diagnosis, so that therapy can be initiated as soon as possible for a better outcome. Multimodality therapy (surgery, external beam radiation, and chemotherapy) is the mainstay of treatment.

  20. Cytological Diagnosis of an Uncommon High Grade Malignant Thyroid Tumour: A Case Report

    PubMed Central

    Kaushal, Manju; Kumar, Sawan

    2017-01-01

    Anaplastic Thyroid Carcinoma (ATC) is a relatively uncommon highly malignant tumour originating from the follicular cells of the thyroid gland and having a poor prognosis. It accounts for 2% to 5% of all thyroid carcinomas and patients typically present with a rapidly growing anterior neck mass with aggressive symptoms. A 53-year-old male presented with a one-month history of diffuse neck swelling measuring 8x6 cm and a right cervical lymph node measuring 2x2 cm, associated with dyspepsia and dyspnoea. Ultrasound and Contrast Enhanced Computed Tomography (CECT) of the neck revealed an enlarged right lobe of the thyroid and multiple enlarged cervical lymph nodes with soft tissue density nodules in bilateral lungs. Fine Needle Aspiration (FNA) from the swelling revealed giant cell, spindle cell and squamoid patterns. Focal areas showed follicular epithelial cells arranged in a repeated microfollicular pattern suggesting an underlying follicular neoplasm. FNAC smears from the lymph node also revealed similar findings. Based on the cytomorphological and radiological findings, a final diagnosis of ATC, probably arising from an underlying follicular carcinoma, with cervical lymph node and lung metastasis was given. FNAC leads to a prompt and definitive diagnosis, so that therapy can be initiated as soon as possible for a better outcome. Multimodality therapy (surgery, external beam radiation, and chemotherapy) is the mainstay of treatment. PMID:28892908

  1. Sequoia Messaging Rate Benchmark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friedley, Andrew

    2008-01-22

    The purpose of this benchmark is to measure the maximal message rate of a single compute node. The first num_cores ranks are expected to reside on the 'core' compute node for which message rate is being tested. After that, the next num_nbors ranks are neighbors for the first core rank, the next set of num_nbors ranks are neighbors for the second core rank, and so on. For example, testing an 8-core node (num_cores = 8) with 4 neighbors (num_nbors = 4) requires 8 + 8 * 4 = 40 ranks. The first 8 of those 40 ranks are expected to be on the 'core' node being benchmarked, while the rest of the ranks are on separate nodes.
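
    As a worked illustration of the rank layout described above, here is a minimal sketch; the rank_layout helper is hypothetical, with num_cores and num_nbors taken from the benchmark description:

    ```python
    # Sketch of the rank layout in the Sequoia messaging-rate benchmark:
    # the first num_cores ranks live on the core node; each core rank i then
    # gets num_nbors neighbor ranks, laid out consecutively after the cores.

    def rank_layout(num_cores, num_nbors):
        core_ranks = list(range(num_cores))
        neighbors = {}
        for i, core in enumerate(core_ranks):
            start = num_cores + i * num_nbors
            neighbors[core] = list(range(start, start + num_nbors))
        total = num_cores + num_cores * num_nbors
        return core_ranks, neighbors, total

    cores, nbors, total = rank_layout(8, 4)
    print(total)        # 8 + 8 * 4 = 40 ranks
    print(nbors[0])     # [8, 9, 10, 11] -> neighbor ranks of core rank 0
    ```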

  2. A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine.

    PubMed

    Duan, Mingxing; Li, Kenli; Liao, Xiangke; Li, Keqin

    2018-06-01

    As data sets become larger and more complicated, an extreme learning machine (ELM) that runs in a traditional serial environment cannot realize its ability to be fast and effective. Although a parallel ELM (PELM) based on MapReduce to process large-scale data shows a more efficient learning speed than identical ELM algorithms in a serial environment, some operations, such as intermediate results stored on disks and multiple copies for each task, are indispensable, and these operations create a large amount of extra overhead and degrade the learning speed and efficiency of PELMs. In this paper, an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, is proposed for big data classification. By partitioning the corresponding data sets reasonably, the subalgorithms (among them the hidden layer output matrix calculation algorithm and the matrix decomposition algorithm) perform most of the computations locally. At the same time, they retain the intermediate results in distributed memory and cache the diagonal matrix as broadcast variables instead of keeping several copies for each task, which greatly reduces costs and strengthens the learning ability of the SELM. Finally, we implement our SELM algorithm to classify large data sets. Extensive experiments have been conducted to validate the effectiveness of the proposed algorithms. As shown, our SELM achieves a speedup on a cluster with ten nodes, and the speedup grows as the cluster is scaled up through 15, 20, 25, 30, and 35 nodes.
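
    The core ELM computation that the record's subalgorithms distribute can be sketched serially. A minimal, illustrative NumPy version, assuming a sigmoid hidden layer and a regularized least-squares solve (the function name, shapes, and regularization constant are assumptions, not the SELM implementation):

    ```python
    import numpy as np

    # Serial sketch of the core ELM computation that SELM parallelizes:
    # random input weights, hidden-layer output matrix H, and a regularized
    # least-squares solve for the output weights.

    def elm_train(X, T, n_hidden, rng=np.random.default_rng(0)):
        n_features = X.shape[1]
        W = rng.standard_normal((n_features, n_hidden))  # random input weights
        b = rng.standard_normal(n_hidden)                # random biases
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden-layer output matrix
        C = 1e3                                          # illustrative regularization
        # beta = (H^T H + I/C)^-1 H^T T
        beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
        return W, b, beta

    X = np.random.rand(100, 5)
    T = np.eye(3)[np.random.randint(0, 3, 100)]          # one-hot class labels
    W, b, beta = elm_train(X, T, n_hidden=32)
    ```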

  3. Indoor A* Pathfinding Through an Octree Representation of a Point Cloud

    NASA Astrophysics Data System (ADS)

    Rodenberg, O. B. P. M.; Verbree, E.; Zlatanova, S.

    2016-10-01

    There is a growing demand for 3D indoor pathfinding applications. The methods researched in the field of robotics during the last decades of the 20th century focused on 2D navigation. Nowadays we would like to have the ability to help people navigate inside buildings or send a drone inside a building when this is too dangerous for people. What these examples have in common is that an object with a certain geometry needs to find an optimal collision-free path between a start and a goal point. This paper presents a new workflow for pathfinding through an octree representation of a point cloud. We applied the following steps: 1) the point cloud is processed so it fits best in an octree; 2) during the octree generation the interior empty nodes are filtered and further processed; 3) for each interior empty node the distance to the closest occupied node directly under it is computed; 4) a network graph is computed for all empty nodes; 5) the A* pathfinding algorithm is conducted. This workflow takes into account the connectivity of each node to all possible neighbours (face, edge and vertex, at all sizes). In addition, a collision avoidance system is pre-processed in two steps: first, the clearance of each empty node is computed, and then the maximal crossing value between two empty neighbouring nodes is computed. The clearance is used to select interior empty nodes of appropriate size and the maximal crossing value is used to filter the network graph. Finally, both these datasets are used in A* pathfinding.
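
    Step 5 of the workflow can be sketched as standard A* over the graph of empty leaf nodes, with the clearance filter applied to edges. A minimal sketch under assumed data structures (the graph, clearance map, and node representation are hypothetical):

    ```python
    import heapq, math

    # A* over a graph of empty octree leaves, filtering edges by the clearance
    # the navigating object needs. Nodes are (x, y, z) leaf centers; 'graph'
    # maps a node to its neighbors; 'clearance' holds the free radius per node.

    def a_star(graph, clearance, start, goal, required_clearance):
        dist = lambda a, b: math.dist(a, b)
        open_set = [(dist(start, goal), 0.0, start, None)]
        came_from, g_score = {}, {start: 0.0}
        while open_set:
            _, g, node, parent = heapq.heappop(open_set)
            if node in came_from:
                continue                       # already expanded
            came_from[node] = parent
            if node == goal:                   # walk parents back to the start
                path = [node]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            for nbr in graph[node]:
                if clearance[nbr] < required_clearance:
                    continue                   # object cannot fit through here
                ng = g + dist(node, nbr)
                if ng < g_score.get(nbr, float("inf")):
                    g_score[nbr] = ng
                    heapq.heappush(open_set, (ng + dist(nbr, goal), ng, nbr, node))
        return None

    grid = {(0, 0, 0): [(1, 0, 0)], (1, 0, 0): [(0, 0, 0), (2, 0, 0)], (2, 0, 0): [(1, 0, 0)]}
    clear = {(0, 0, 0): 2.0, (1, 0, 0): 1.5, (2, 0, 0): 2.0}
    print(a_star(grid, clear, (0, 0, 0), (2, 0, 0), required_clearance=1.0))
    ```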

  4. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    PubMed

    Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

    2010-04-21

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than other ready-to-use reverse engineering software. However, it cannot be used on large networks with thousands of nodes--as is the case for biological networks--due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches very good accuracy even for large gene networks, improving our understanding of gene regulatory networks, which is crucial for a wide range of biomedical applications.
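
    The gene-by-gene regression at the heart of NIR-style inference is what makes a parallel version natural: rows are independent, so each worker can take a slice of genes. A simplified greedy sketch (infer_row and the demo data are illustrative, not the NIR code):

    ```python
    import numpy as np

    # Per-gene regression sketch: each gene's measured response is regressed
    # on the expression of candidate regulators, greedily picking the k
    # regulators that best explain the residual. Rows are independent, so a
    # parallel run simply distributes genes across workers.

    def infer_row(X, y, k):
        chosen, residual = [], y.copy()
        for _ in range(k):
            scores = np.abs(X.T @ residual)     # correlation with the residual
            scores[chosen] = -np.inf            # do not pick a regulator twice
            j = int(np.argmax(scores))
            chosen.append(j)
            A = X[:, chosen]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            residual = y - A @ coef
        return chosen, coef

    X = np.random.randn(50, 200)     # 50 perturbation experiments, 200 genes
    y = X[:, 3] * 2 - X[:, 17] + 0.1 * np.random.randn(50)
    print(infer_row(X, y, k=2)[0])   # likely recovers regulators 3 and 17
    ```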

  5. Large scale cardiac modeling on the Blue Gene supercomputer.

    PubMed

    Reumann, Matthias; Fitch, Blake G; Rayshubskiy, Aleksandr; Keller, David U; Weiss, Daniel L; Seemann, Gunnar; Dössel, Olaf; Pitman, Michael C; Rice, John J

    2008-01-01

    Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large scale cardiac models. Decomposition of anatomical data in segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical data-set was given by both ventricles of the Visible Female data-set at a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096 and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balances did not differ significantly across computational nodes even if the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced from 87 minutes on 512 to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
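
    The recursive bisection step can be sketched over a weighted list of mesh elements, using the tissue/non-tissue weights mentioned in the record. A minimal 1D sketch (the real decomposition works on 3D anatomical data; orb here is a hypothetical helper):

    ```python
    # Sketch of optimal recursive bisection (ORB) over a weighted 1D strip of
    # elements: cuts are placed so each half carries load proportional to the
    # number of parts it will be split into. Weights follow the record's
    # scheme (e.g., 10 for tissue elements, 1 for non-tissue elements).

    def orb(weights, lo, hi, n_parts):
        if n_parts == 1:
            return [(lo, hi)]
        left_parts = n_parts // 2
        target = sum(weights[lo:hi]) * left_parts / n_parts
        acc, cut = 0.0, lo
        while cut < hi - 1 and acc + weights[cut] <= target:
            acc += weights[cut]
            cut += 1
        # take one more element if that lands closer to the target load
        if cut < hi - 1 and abs(acc + weights[cut] - target) < abs(acc - target):
            acc += weights[cut]
            cut += 1
        cut = max(cut, lo + 1)                 # never leave a partition empty
        return (orb(weights, lo, cut, left_parts) +
                orb(weights, cut, hi, n_parts - left_parts))

    weights = [10, 10, 1, 1, 10, 10, 10, 1, 1, 10, 10, 10]  # tissue vs non-tissue
    print(orb(weights, 0, len(weights), 4))   # four segments with balanced load
    ```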

  6. Continuously phase-modulated standing surface acoustic waves for separation of particles and cells in microfluidic channels containing multiple pressure nodes

    NASA Astrophysics Data System (ADS)

    Lee, Junseok; Rhyou, Chanryeol; Kang, Byungjun; Lee, Hyungsuk

    2017-04-01

    This paper describes continuously phase-modulated standing surface acoustic waves (CPM-SSAW) and its application for particle separation in multiple pressure nodes. A linear change of phase in CPM-SSAW applies a force to particles whose magnitude depends on their size and contrast factors. During continuous phase modulation, we demonstrate that particles with a target dimension are translated in the direction of moving pressure nodes, whereas smaller particles show oscillatory movements. The rate of phase modulation is optimized for separation of target particles from the relationship between mean particle velocity and period of oscillation. The developed technique is applied to separate particles of a target dimension from the particle mixture. Furthermore, we also demonstrate human keratinocyte cells can be separated in the cell and bead mixture. The separation technique is incorporated with a microfluidic channel spanning multiple pressure nodes, which is advantageous over separation in a single pressure node in terms of throughput.

  7. Load Balancing Strategies for Multiphase Flows on Structured Grids

    NASA Astrophysics Data System (ADS)

    Olshefski, Kristopher; Owkes, Mark

    2017-11-01

    The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are designed only for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated, including brute-force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared-memory load balancing using OpenMP. Each of these strategies is tested with a simple one-dimensional model prior to implementation in the three-dimensional NGA code. Current results show that load balancing will reduce computational time by at least 30%.
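
    The benefit of weighting interface cells can be shown with a toy 1D model in the spirit of the one mentioned above (the costs, grid, and split policy below are illustrative assumptions, not the NGA implementation):

    ```python
    # Toy 1D model of the imbalance discussed above: interface cells cost more
    # than bulk cells, so a naive equal-cell split leaves some ranks waiting.
    # A weighted split balances per-rank cost instead. Costs are illustrative.

    def step_time(cost_per_rank):
        return max(cost_per_rank)       # a step lasts as long as the slowest rank

    cells = [5.0 if 40 <= i < 60 else 1.0 for i in range(100)]  # interface band

    # Naive split: equal number of cells per rank.
    naive = [sum(cells[i:i + 25]) for i in range(0, 100, 25)]

    # Weighted split: greedily assign cells until each rank reaches its share.
    ranks, target = [0.0] * 4, sum(cells) / 4
    r = 0
    for c in cells:
        if ranks[r] >= target and r < 3:
            r += 1
        ranks[r] += c

    print(step_time(naive), step_time(ranks))   # 65.0 vs 45.0: ~30% faster step
    ```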

  8. The Legnaro-Padova distributed Tier-2: challenges and results

    NASA Astrophysics Data System (ADS)

    Badoer, Simone; Biasotto, Massimo; Costa, Fulvia; Crescente, Alberto; Fantinel, Sergio; Ferrari, Roberto; Gulmini, Michele; Maron, Gaetano; Michelotto, Michele; Sgaravatto, Massimo; Toniolo, Nicola

    2014-06-01

    The Legnaro-Padova Tier-2 is a computing facility serving the ALICE and CMS LHC experiments. It also supports other High Energy Physics experiments and other virtual organizations of different disciplines, which can opportunistically harness idle resources if available. The unique characteristic of this Tier-2 is its topology: the computational resources are spread across two different sites, about 15 km apart: the INFN Legnaro National Laboratories and the INFN Padova unit, connected through a 10 Gbps network link (it will soon be upgraded to 20 Gbps). Nevertheless, these resources are seamlessly integrated and are exposed as a single computing facility. Despite this intrinsic complexity, the Legnaro-Padova Tier-2 ranks among the best Grid sites in terms of reliability and availability. The Tier-2 comprises about 190 worker nodes, providing about 26000 HS06 in total. Such computing nodes are managed by the LSF local resource management system, and are accessible using a Grid-based interface implemented through multiple CREAM CE front-ends. dCache, xrootd and Lustre are the storage systems in use at the Tier-2: about 1.5 PB of disk space is available to users in total, through multiple access protocols. A 10 Gbps network link, planned to be doubled in the next months, connects the Tier-2 to the WAN. This link is used for the LHC Open Network Environment (LHCONE) and for other general purpose traffic. In this paper we discuss the experience at the Legnaro-Padova Tier-2: the problems that had to be addressed, the lessons learned, the implementation choices. We also present the tools used for the daily management operations. These include DOCET, a Java-based webtool designed, implemented and maintained at the Legnaro-Padova Tier-2, and also deployed at other sites, such as the Italian LHC T1. DOCET provides a uniform interface to manage all the information about the physical resources of a computing center. It is also used as a documentation repository available to the Tier-2 operations team. Finally, we discuss the foreseen developments of the existing infrastructure. This includes in particular the evolution from a Grid-based resource towards a Cloud-based computing facility.

  9. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  10. An energy efficient distance-aware routing algorithm with multiple mobile sinks for wireless sensor networks.

    PubMed

    Wang, Jin; Li, Bin; Xia, Feng; Kim, Chang-Seob; Kim, Jeong-Uk

    2014-08-18

    Traffic patterns in wireless sensor networks (WSNs) usually follow a many-to-one model. Sensor nodes close to static sinks will deplete their limited energy more rapidly than other sensors, since they will have more data to forward during multihop transmission. This will cause network partition, isolated nodes and much shortened network lifetime. Thus, how to balance energy consumption for sensor nodes is an important research issue. In recent years, exploiting sink mobility technology in WSNs has attracted much research attention because it can not only improve energy efficiency, but also prolong network lifetime. In this paper, we propose an energy efficient distance-aware routing algorithm with multiple mobile sinks for WSNs, where sink nodes move with a certain speed along the network boundary to collect monitored data. We study the influence of multiple mobile sink nodes on energy consumption and network lifetime, focusing mainly on the selection of the number of mobile sink nodes and of their parking positions, as well as their impact on the performance metrics above. Both the number of mobile sink nodes and the selection of parking positions have an important influence on network performance. Simulation results show that our proposed routing algorithm performs better than traditional routing algorithms in terms of energy consumption.

  11. An Energy Efficient Distance-Aware Routing Algorithm with Multiple Mobile Sinks for Wireless Sensor Networks

    PubMed Central

    Wang, Jin; Li, Bin; Xia, Feng; Kim, Chang-Seob; Kim, Jeong-Uk

    2014-01-01

    Traffic patterns in wireless sensor networks (WSNs) usually follow a many-to-one model. Sensor nodes close to static sinks will deplete their limited energy more rapidly than other sensors, since they will have more data to forward during multihop transmission. This will cause network partition, isolated nodes and much shortened network lifetime. Thus, how to balance energy consumption for sensor nodes is an important research issue. In recent years, exploiting sink mobility technology in WSNs has attracted much research attention because it can not only improve energy efficiency, but also prolong network lifetime. In this paper, we propose an energy efficient distance-aware routing algorithm with multiple mobile sinks for WSNs, where sink nodes move with a certain speed along the network boundary to collect monitored data. We study the influence of multiple mobile sink nodes on energy consumption and network lifetime, focusing mainly on the selection of the number of mobile sink nodes and of their parking positions, as well as their impact on the performance metrics above. Both the number of mobile sink nodes and the selection of parking positions have an important influence on network performance. Simulation results show that our proposed routing algorithm performs better than traditional routing algorithms in terms of energy consumption. PMID:25196015
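
    The parking-position question can be illustrated with a toy model in which a node's per-round energy draw is taken to be proportional to its distance to the nearest parked sink (the grid, positions, and cost model are assumptions for illustration only):

    ```python
    import math

    # Toy model: energy per round ~ distance to the nearest parked sink.
    # More parking positions along the boundary lower the worst-case
    # (hotspot) energy draw across the sensor field.

    nodes = [(x, y) for x in range(10) for y in range(10)]

    def worst_case_energy(parking_positions):
        return max(min(math.dist(n, p) for p in parking_positions) for n in nodes)

    print(worst_case_energy([(0, 0)]))                          # single static sink
    print(worst_case_energy([(0, 0), (9, 0), (9, 9), (0, 9)]))  # four parking spots
    ```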

  12. Data driven linear algebraic methods for analysis of molecular pathways: application to disease progression in shock/trauma.

    PubMed

    McGuire, Mary F; Sriram Iyengar, M; Mercer, David W

    2012-04-01

    Although trauma is the leading cause of death for those below 45 years of age, there is a dearth of information about the temporal behavior of the underlying biological mechanisms in those who survive the initial trauma only to later suffer from syndromes such as multiple organ failure. Levels of serum cytokines potentially affect the clinical outcomes of trauma; understanding how cytokine levels modulate intra-cellular signaling pathways can yield insights into molecular mechanisms of disease progression and help to identify targeted therapies. However, developing such analyses is challenging since it necessitates the integration and interpretation of large amounts of heterogeneous, quantitative and qualitative data. Here we present the Pathway Semantics Algorithm (PSA), an algebraic process of node and edge analyses of evoked biological pathways over time for in silico discovery of biomedical hypotheses, using data from a prospective controlled clinical study of the role of cytokines in multiple organ failure (MOF) at a major US trauma center. A matrix algebra approach was used in both the PSA node and PSA edge analyses with different matrix configurations and computations based on the biomedical questions to be examined. In the edge analysis, a percentage measure of crosstalk called XTALK was also developed to assess cross-pathway interference. In the node/molecular analysis of the first 24 h from trauma, PSA uncovered seven molecules evoked computationally that differentiated outcomes of MOF or non-MOF (NMOF), of which three molecules had not been previously associated with any shock/trauma syndrome. In the edge/molecular interaction analysis, PSA examined four categories of functional molecular interaction relationships--activation, expression, inhibition, and transcription--and found that the interaction patterns and crosstalk changed over time and outcome. The PSA edge analysis suggests that a diagnosis, prognosis or therapy based on molecular interaction mechanisms may be most effective within a certain time period and for a specific functional relationship. Copyright © 2011 Elsevier Inc. All rights reserved.
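
    An edge-level crosstalk measure in the spirit of XTALK can be sketched with boolean adjacency matrices over a shared molecule index (the matrices and the exact percentage definition below are assumptions, not the published PSA formulation):

    ```python
    import numpy as np

    # Crosstalk sketch: pathways are boolean adjacency matrices over a shared
    # molecule index; crosstalk is the percentage of one pathway's edges that
    # also appear in another pathway.

    def xtalk(A, B):
        shared = np.logical_and(A, B).sum()
        return 100.0 * shared / max(A.sum(), 1)

    n = 6
    A = np.zeros((n, n), bool); B = np.zeros((n, n), bool)
    A[0, 1] = A[1, 2] = A[2, 3] = True        # pathway A edges
    B[1, 2] = B[2, 3] = B[3, 4] = True        # pathway B edges
    print(round(xtalk(A, B), 1))              # 66.7: two of A's three edges overlap
    ```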

  13. Data driven linear algebraic methods for analysis of molecular pathways: application to disease progression in shock/trauma

    PubMed Central

    McGuire, Mary F.; Iyengar, M. Sriram; Mercer, David W.

    2012-01-01

    Motivation: Although trauma is the leading cause of death for those below 45 years of age, there is a dearth of information about the temporal behavior of the underlying biological mechanisms in those who survive the initial trauma only to later suffer from syndromes such as multiple organ failure. Levels of serum cytokines potentially affect the clinical outcomes of trauma; understanding how cytokine levels modulate intra-cellular signaling pathways can yield insights into molecular mechanisms of disease progression and help to identify targeted therapies. However, developing such analyses is challenging since it necessitates the integration and interpretation of large amounts of heterogeneous, quantitative and qualitative data. Here we present the Pathway Semantics Algorithm (PSA), an algebraic process of node and edge analyses of evoked biological pathways over time for in silico discovery of biomedical hypotheses, using data from a prospective controlled clinical study of the role of cytokines in multiple organ failure (MOF) at a major US trauma center. A matrix algebra approach was used in both the PSA node and PSA edge analyses with different matrix configurations and computations based on the biomedical questions to be examined. In the edge analysis, a percentage measure of crosstalk called XTALK was also developed to assess cross-pathway interference. Results: In the node/molecular analysis of the first 24 hours from trauma, PSA uncovered 7 molecules evoked computationally that differentiated outcomes of MOF or non-MOF (NMOF), of which 3 molecules had not been previously associated with any shock/trauma syndrome. In the edge/molecular interaction analysis, PSA examined four categories of functional molecular interaction relationships – activation, expression, inhibition, and transcription – and found that the interaction patterns and crosstalk changed over time and outcome. The PSA edge analysis suggests that a diagnosis, prognosis or therapy based on molecular interaction mechanisms may be most effective within a certain time period and for a specific functional relationship. PMID:22200681

  14. Exact and heuristic algorithms for Space Information Flow.

    PubMed

    Uwitonze, Alfred; Huang, Jiaqing; Ye, Yuanqing; Cheng, Wenqing; Li, Zongpeng

    2018-01-01

    Space Information Flow (SIF) is a new promising research area that studies network coding in geometric space, such as Euclidean space. The design of algorithms that compute the optimal SIF solutions remains one of the key open problems in SIF. This work proposes the first exact SIF algorithm and a heuristic SIF algorithm that compute min-cost multicast network coding for N (N ≥ 3) given terminal nodes in 2-D Euclidean space. Furthermore, we find that the Butterfly network in Euclidean space is the second example besides the Pentagram network where SIF is strictly better than Euclidean Steiner minimal tree. The exact algorithm design is based on two key techniques: Delaunay triangulation and linear programming. Delaunay triangulation technique helps to find practically good candidate relay nodes, after which a min-cost multicast linear programming model is solved over the terminal nodes and the candidate relay nodes, to compute the optimal multicast network topology, including the optimal relay nodes selected by linear programming from all the candidate relay nodes and the flow rates on the connection links. The heuristic algorithm design is also based on Delaunay triangulation and linear programming techniques. The exact algorithm can achieve the optimal SIF solution with an exponential computational complexity, while the heuristic algorithm can achieve the sub-optimal SIF solution with a polynomial computational complexity. We prove the correctness of the exact SIF algorithm. The simulation results show the effectiveness of the heuristic SIF algorithm.
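
    The candidate-relay step can be sketched with off-the-shelf computational geometry: triangulate the terminals and take triangle centroids as candidate relay locations for the LP that follows (using centroids is an illustrative choice, not necessarily the paper's exact rule):

    ```python
    import numpy as np
    from scipy.spatial import Delaunay

    # Candidate-relay generation sketch: Delaunay-triangulate the terminal
    # nodes and add each triangle's centroid to the candidate set over which
    # the min-cost multicast LP is later solved.

    terminals = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [2.0, 1.0]])
    tri = Delaunay(terminals)
    centroids = terminals[tri.simplices].mean(axis=1)
    candidates = np.vstack([terminals, centroids])
    print(len(tri.simplices), "triangles ->", len(candidates), "candidate nodes")
    ```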

  15. Dynamic Extension of a Virtualized Cluster by using Cloud Resources

    NASA Astrophysics Data System (ADS)

    Oberst, Oliver; Hauth, Thomas; Kernert, David; Riedel, Stephan; Quast, Günter

    2012-12-01

    The specific requirements concerning the software environment within the HEP community constrain the choice of resource providers for the outsourcing of computing infrastructure. The use of virtualization in HPC clusters and in the context of cloud resources is therefore a subject of recent developments in scientific computing. The dynamic virtualization of worker nodes in common batch systems provided by ViBatch serves each user with a dynamically virtualized subset of worker nodes on a local cluster. Now it can be transparently extended by the use of common open source cloud interfaces like OpenNebula or Eucalyptus, launching a subset of the virtual worker nodes within the cloud. This paper demonstrates how a dynamically virtualized computing cluster is combined with cloud resources by attaching remotely started virtual worker nodes to the local batch system.

  16. Performance evaluation of the multiple root node approach to the Rete pattern matcher for production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Gaudiot, J.-L.

    1991-12-31

    Much effort has been expended on special architectures and algorithms dedicated to efficient processing of the pattern matching step of production systems. In this paper, the authors investigate possible improvements to the Rete pattern matcher for production systems. Inefficiencies in the Rete match algorithm have been identified, based on which they introduce a pattern matcher with multiple root nodes. A complete implementation of the multiple root node-based production system interpreter is presented to investigate its relative algorithmic behavior against the Rete-based Ops5 production system interpreter. Benchmark production system programs are executed (not simulated) on a sequential Sun 4/490 machine using both interpreters, and various experimental results are presented. The investigation indicates that the multiple root node-based production system interpreter would give up to a 6-fold improvement over the Lisp implementation of the Rete-based Ops5 for the match step.

  17. Methods, apparatus and system for selective duplication of subtasks

    DOEpatents

    Andrade Costa, Carlos H.; Cher, Chen-Yong; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.

    2016-03-29

    A method for selective duplication of subtasks in a high-performance computing system includes: monitoring a health status of one or more nodes in a high-performance computing system, where one or more subtasks of a parallel task execute on the one or more nodes; identifying one or more nodes as having a likelihood of failure which exceeds a first prescribed threshold; selectively duplicating the one or more subtasks that execute on the one or more nodes having a likelihood of failure which exceeds the first prescribed threshold; and notifying a messaging library that one or more subtasks were duplicated.
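
    The duplication policy reads as a simple threshold rule, sketched below (the health scores, threshold value, and notification hook are illustrative assumptions):

    ```python
    # Sketch of the selective-duplication policy: subtasks running on nodes
    # whose estimated failure likelihood exceeds a threshold get duplicated,
    # and the messaging layer is told which subtasks were duplicated.

    FAILURE_THRESHOLD = 0.2

    def plan_duplication(subtask_node, failure_likelihood):
        return [task for task, node in subtask_node.items()
                if failure_likelihood[node] > FAILURE_THRESHOLD]

    def notify_messaging_library(duplicated):
        print("duplicated subtasks:", duplicated)   # stand-in for the real hook

    subtask_node = {"t0": "n0", "t1": "n1", "t2": "n2"}
    health = {"n0": 0.05, "n1": 0.35, "n2": 0.10}   # per-node failure likelihood
    notify_messaging_library(plan_duplication(subtask_node, health))
    ```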

  18. Informing Urban Decision Making with an Array of Things

    NASA Astrophysics Data System (ADS)

    Jacob, R. L.; Catlett, C.; Beckman, P. H.; Sankaran, R.

    2015-12-01

    Over the next several decades, the population of the world's cities is projected to nearly double, increasing by 2.6 billion people and requiring massive urban expansion globally. This massive growth in urban density and scale will compound ongoing city challenges related to climate change, energy, infrastructure, public health, and more. Cities are using data they already collect, such as 311 calls, bus and train operations, street repair orders, census data and building permits, to help understand the complex interactions between the human, built and natural systems within a city and inform their decision making. Helping to guide urban decision-making is The Array of Things (AoT): a new tool for measuring many aspects of the physical environment of urban areas at the city block scale with continuous, reliable, integrated data from a variety of sensors. An AoT node includes multiple sensors to measure basic meteorological quantities such as pressure, temperature and humidity as well as light and trace gases such as carbon monoxide, nitrogen dioxide, sulfur dioxide and ozone. The sensors operate 24/7 with ingest frequencies as high as 1 Hz. The nodes are modular and allow new sensors to be added or swapped out. The hardware/software backbone of an AoT node is provided by the Waggle architecture. Each AoT node includes, via Waggle, compute power from a single board computer running Linux that allows data to be processed in-situ and, if needed, command and control of components of the node. Data is communicated in near real-time, typically through WiFi, 3G, or wired Ethernet, to a designated host, and resilience is built in to prevent data loss if communication is disrupted. The AoT includes a software stack with a programmable API and cloud-based infrastructure for performing data ingest and further analysis. The first full instance of AoT will comprise 500 nodes deployed in the City of Chicago, each with power, Internet, and a base set of sensing and embedded information systems capabilities. A prototype of the Array of Things consisting of 12 nodes has been deployed on the campus of the University of Chicago and initial data from the array will be presented.

  19. Runtime optimization of an application executing on a parallel computer

    DOEpatents

    None

    2014-11-25

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

  20. Runtime optimization of an application executing on a parallel computer

    DOEpatents

    Faraj, Daniel A; Smith, Brian E

    2014-11-18

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.

  1. Runtime optimization of an application executing on a parallel computer

    DOEpatents

    Faraj, Daniel A.; Smith, Brian E.

    2013-01-29

    Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
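
    The decision logic shared by these three records reduces to a small predicate, sketched below (the call-site comparison, which in practice would use a collective operation across compute nodes, is mocked here as a plain list):

    ```python
    # Sketch of the tuning-session decision: a collective operation is tuned
    # unless it is root-based and the compute nodes disagree on its call site.

    def should_tune(is_root_based, call_sites):
        if not is_root_based:
            return True                      # non-root-based: always tuned
        return len(set(call_sites)) == 1     # every node saw the same call site

    print(should_tune(False, ["a", "b"]))    # True: non-root-based
    print(should_tune(True, ["a", "a"]))     # True: root-based, same call site
    print(should_tune(True, ["a", "b"]))     # False: run without a tuning session
    ```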

  2. A computer method for schedule processing and quick-time updating.

    NASA Technical Reports Server (NTRS)

    Mccoy, W. H.

    1972-01-01

    A schedule analysis program is presented which can be used to process any schedule with continuous flow and with no loops. Although generally thought of as a management tool, it has applicability to such extremes as music composition and computer program efficiency analysis. Other possibilities for its use include the determination of electrical power usage during some operation such as spacecraft checkout, and the determination of impact envelopes for the purpose of scheduling payloads in launch processing. At the core of the described computer method is an algorithm which computes the position of each activity bar on the output waterfall chart. The algorithm is basically a maximal-path computation which gives to each node in the schedule network the maximal path from the initial node to the given node.
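
    The maximal-path computation at the core of the method can be sketched as a longest-path pass over the schedule DAG in topological order (bar_positions and the sample schedule are hypothetical):

    ```python
    # Sketch of the maximal-path computation: each activity's bar position on
    # the waterfall chart is the longest path from the initial node to that
    # node, computed in topological order over the schedule DAG.

    def bar_positions(durations, preds):
        order, seen = [], set()
        def visit(n):                        # depth-first topological sort
            if n in seen:
                return
            seen.add(n)
            for p in preds.get(n, []):
                visit(p)
            order.append(n)
        for n in durations:
            visit(n)
        start = {}
        for n in order:                      # longest path ending at each node
            start[n] = max((start[p] + durations[p] for p in preds.get(n, [])),
                           default=0)
        return start

    durations = {"A": 3, "B": 2, "C": 4, "D": 1}
    preds = {"C": ["A", "B"], "D": ["C"]}
    print(bar_positions(durations, preds))   # {'A': 0, 'B': 0, 'C': 3, 'D': 7}
    ```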

  3. The network level reproduction number for infectious diseases with both vertical and horizontal transmission.

    PubMed

    Xue, Ling; Scoglio, Caterina

    2013-05-01

    A wide range of infectious diseases are both vertically and horizontally transmitted. Such diseases are spatially transmitted via multiple species in heterogeneous environments, typically described by complex meta-population models. The reproduction number, R0, is a critical metric predicting whether the disease can invade the meta-population system. This paper presents the reproduction number for a generic disease vertically and horizontally transmitted among multiple species in heterogeneous networks, where nodes are locations, and links reflect outgoing or incoming movement flows. The metapopulation model for vertically and horizontally transmitted diseases is gradually formulated from two-species, two-node network models. We derived an explicit expression of R0, which is the spectral radius of a matrix reduced in size with respect to the original next generation matrix. The reproduction number is shown to be a function of vertical and horizontal transmission parameters, and the lower bound is the reproduction number for horizontal transmission. As an application, the reproduction number and its bounds for the Rift Valley fever zoonosis, where livestock, mosquitoes, and humans are the involved species, are derived. By computing the reproduction number for different scenarios through numerical simulations, we found the reproduction number is affected by livestock movement rates only when parameters are heterogeneous across nodes. To summarize, our study contributes the reproduction number for vertically and horizontally transmitted diseases in heterogeneous networks. This explicit expression is easily adaptable to specific infectious diseases, affording insights into disease evolution. Copyright © 2013 Elsevier Inc. All rights reserved.
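
    Numerically, R0 is the spectral radius of the next generation matrix, which is a one-line computation once the matrix is assembled (the 2x2 matrix below is illustrative, not the paper's Rift Valley fever parameterization):

    ```python
    import numpy as np

    # R0 as the spectral radius of a next generation matrix K. The diagonal
    # entry mixes in a vertical-transmission term; off-diagonal entries are
    # cross-species horizontal transmission. Values are illustrative only.

    K = np.array([[0.3, 1.2],    # species 1: vertical term + infections of species 2
                  [0.8, 0.0]])   # species 2 infects species 1 horizontally
    R0 = max(abs(np.linalg.eigvals(K)))
    print(R0)   # the disease invades the metapopulation when R0 > 1
    ```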

  4. Requesting Different Node Types When Submitting Jobs on the Peregrine System

    Science.gov Websites


  5. Toward a Dynamically Reconfigurable Computing and Communication System for Small Spacecraft

    NASA Technical Reports Server (NTRS)

    Kifle, Muli; Andro, Monty; Tran, Quang K.; Fujikawa, Gene; Chu, Pong P.

    2003-01-01

    Future science missions will require the use of multiple spacecraft with multiple sensor nodes autonomously responding and adapting to a dynamically changing space environment. The acquisition of random scientific events will require rapidly changing network topologies, distributed processing power, and a dynamic resource management strategy. Optimum utilization and configuration of spacecraft communications and navigation resources will be critical in meeting the demand of these stringent mission requirements. There are two important trends to follow with respect to NASA's (National Aeronautics and Space Administration) future scientific missions: the use of multiple satellite systems and the development of an integrated space communications network. Reconfigurable computing and communication systems may enable versatile adaptation of a spacecraft system's resources by dynamic allocation of the processor hardware to perform new operations or to maintain functionality due to malfunctions or hardware faults. Advancements in FPGA (Field Programmable Gate Array) technology make it possible to incorporate major communication and network functionalities in FPGA chips and provide the basis for a dynamically reconfigurable communication system. Advantages of higher computation speeds and accuracy are envisioned with tremendous hardware flexibility to ensure maximum survivability of future science mission spacecraft. This paper discusses the requirements, enabling technologies, and challenges associated with dynamically reconfigurable space communications systems.

  6. Locating multiple diffusion sources in time varying networks from sparse observations.

    PubMed

    Hu, Zhao-Long; Shen, Zhesi; Cao, Shinan; Podobnik, Boris; Yang, Huijie; Wang, Wen-Xu; Lai, Ying-Cheng

    2018-02-08

    Data-based source localization in complex networks has a broad range of applications. Despite recent progress, locating multiple diffusion sources in time varying networks remains an outstanding problem. Bridging structural observability and sparse signal reconstruction theories, we develop a general framework to locate diffusion sources in time varying networks based solely on sparse data from a small set of messenger nodes. A general finding is that large degree nodes produce more valuable information than small degree nodes, a result that contrasts with the case of static networks. Choosing large degree nodes as the messengers, we find that sparse observations from a few such nodes are often sufficient for any number of diffusion sources to be located for a variety of model and empirical networks. Counterintuitively, sources in more rapidly varying networks can be identified more readily with fewer required messenger nodes.

  7. The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster

    NASA Astrophysics Data System (ADS)

    Löwe, P.; Klump, J.; Thaler, J.

    2012-04-01

    Compute clusters can be used as GIS workbenches; their wealth of resources allows us to take on geocomputation tasks which exceed the limitations of smaller systems. To harness these capabilities requires a Geographic Information System (GIS) able to utilize the available cluster configuration/architecture and a sufficient degree of user friendliness to allow for wide application. In this paper we report on the first successful porting of GRASS GIS, the oldest and largest Free Open Source (FOSS) GIS project, onto a compute cluster using Platform Computing's Load Sharing Facility (LSF). In 2008, GRASS6.3 was installed on the GFZ compute cluster, which at that time comprised 32 nodes. The interaction with the GIS was limited to the command line interface, which required further development to encapsulate the GRASS GIS business layer to facilitate its use by users not familiar with GRASS GIS. During the summer of 2011, multiple versions of GRASS GIS (v 6.4, 6.5 and 7.0) were installed on the upgraded GFZ compute cluster, now consisting of 234 nodes with 480 CPUs providing 3084 cores. The GFZ compute cluster currently offers 19 different processing queues with varying hardware capabilities and priorities, allowing for fine-grained scheduling and load balancing. After successful testing of core GIS functionalities, including the graphical user interface, mechanisms were developed to deploy scripted geocomputation tasks onto dedicated processing queues. The mechanisms are based on earlier work by NETELER et al. (2008). A first application of the new GIS functionality was the generation of maps of simulated tsunamis in the Mediterranean Sea for the Tsunami Atlas of the FP-7 TRIDEC Project (www.tridec-online.eu). For this, up to 500 processing nodes were used in parallel. Further trials included the processing of geometrically complex problems, requiring significant amounts of processing time. The GIS cluster successfully completed all these tasks, with processing times lasting up to a full 20 CPU days. The deployment of GRASS GIS on a compute cluster allows our users to tackle GIS tasks previously out of reach of single workstations. In addition, this GRASS GIS cluster implementation will be made available to other users at GFZ in the course of 2012. It will thus become a research utility in the sense of "Software as a Service" (SaaS) and can be seen as our first step towards building a GFZ corporate cloud service.

  8. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.

    PubMed

    Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environment. Under the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.

  9. Solving multiconstraint assignment problems using learning automata.

    PubMed

    Horn, Geir; Oommen, B John

    2010-02-01

    This paper considers the NP-hard problem of object assignment with respect to multiple constraints: assigning a set of elements (or objects) into mutually exclusive classes (or groups), where the elements which are "similar" to each other are hopefully located in the same class. The literature reports solutions in which the similarity constraint consists of a single index that is inappropriate for the type of multiconstraint problems considered here and where the constraints could simultaneously be contradictory. This feature, where we permit possibly contradictory constraints, distinguishes this paper from the state of the art. Indeed, we are aware of no learning automata (or other heuristic) solutions which solve this problem in its most general setting. Such a scenario is illustrated with the static mapping problem, which consists of distributing the processes of a parallel application onto a set of computing nodes. This is a classical and yet very important problem within the areas of parallel computing, grid computing, and cloud computing. We have developed four learning-automata (LA)-based algorithms to solve this problem: First, a fixed-structure stochastic automata algorithm is presented, where the processes try to form pairs to go onto the same node. This algorithm solves the problem, although it requires some centralized coordination. As it is desirable to avoid centralized control, we subsequently present three different variable-structure stochastic automata (VSSA) algorithms, which have superior partitioning properties in certain settings, although they forfeit some of the scalability features of the fixed-structure algorithm. All three VSSA algorithms model the processes as automata having first the hosting nodes as possible actions; second, the processes as possible actions; and, third, attempting to estimate the process communication digraph prior to probabilistically mapping the processes. This paper, which, we believe, comprehensively reports the pioneering LA solutions to this problem, unequivocally demonstrates that LA can play an important role in solving complex combinatorial and integer optimization problems.

  10. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment

    PubMed Central

    Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environment. Under the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507

  11. A study on the value of computer-assisted assessment for SPECT/CT-scans in sentinel lymph node diagnostics of penile cancer as well as clinical reliability and morbidity of this procedure.

    PubMed

    Lützen, Ulf; Naumann, Carsten Maik; Marx, Marlies; Zhao, Yi; Jüptner, Michael; Baumann, René; Papp, László; Zsótér, Norbert; Aksenov, Alexey; Jünemann, Klaus-Peter; Zuhayra, Maaz

    2016-09-07

    Because of the increasing importance of computer-assisted post-processing of image data in modern medical diagnostics, we studied the value of an algorithm for the assessment of single photon emission computed tomography/computed tomography (SPECT/CT) data, which has been used for the first time for lymph node staging in penile cancer with non-palpable inguinal lymph nodes. In the guidelines of the relevant international expert societies, sentinel lymph node biopsy (SLNB) is recommended as the diagnostic method of choice. The aim of this study is to evaluate the value of the afore-mentioned algorithm and, in the clinical context, the reliability and the associated morbidity of this procedure. Between 2008 and 2015, 25 patients with invasive penile cancer and inconspicuous inguinal lymph node status underwent SLNB after application of the radiotracer Tc-99m labelled nanocolloid. We recorded, in a prospective approach, the reliability and the complication rate of the procedure. In addition, we evaluated the results of an algorithm for SPECT/CT data assessment in these patients. SLNB was carried out in 44 groins of 25 patients. In three patients, inguinal lymph node metastases were detected via SLNB. In one patient, bilateral lymph node recurrence of the groins occurred after negative SLNB. There was a false-negative rate of 4% in relation to the number of patients (1/25), resp. 4.5% in relation to the number of groins (2/44). Morbidity was 4% in relation to the number of patients (1/25), resp. 2.3% in relation to the number of groins (1/44). The results of computer-assisted assessment of SPECT/CT data for sentinel lymph node (SLN) diagnostics demonstrated a high sensitivity of 88.8% and specificity of 86.7%. SLNB is a very reliable method, associated with low morbidity. Computer-assisted assessment of SPECT/CT data for SLN diagnostics shows high sensitivity and specificity. While it cannot replace assessment by medical experts, it can still provide substantial supplement and assistance.
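
    The reported sensitivity and specificity follow from a standard confusion-matrix calculation, sketched below (the counts are hypothetical; only the resulting percentages echo figures of the kind quoted in the study):

    ```python
    # Sensitivity and specificity from a confusion matrix: TP/FN for diseased
    # cases, TN/FP for healthy ones. The counts here are made up for illustration.

    def sensitivity(tp, fn):
        return 100.0 * tp / (tp + fn)

    def specificity(tn, fp):
        return 100.0 * tn / (tn + fp)

    print(round(sensitivity(tp=8, fn=1), 1))    # 88.9 with these made-up counts
    print(round(specificity(tn=13, fp=2), 1))   # 86.7 with these made-up counts
    ```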

  12. Clinicopathologic risk factors for right paraesophageal lymph node metastasis in patients with papillary thyroid carcinoma.

    PubMed

    Yu, Q A; Ma, D K; Liu, K P; Wang, P; Xie, C M; Wu, Y H; Dai, W J; Jiang, H C

    2018-03-17

    To investigate risk factors associated with right paraesophageal lymph node (RPELN) metastasis in patients with papillary thyroid carcinoma (PTC) and to determine the indications for right lymph node dissection. Clinicopathologic data from 829 patients (104 men and 725 women) with PTC, operated on by the same thyroid surgery team at the First Affiliated Hospital of Harbin Medical University from January 2013 to May 2017, were analyzed. Overall, 309 patients underwent total thyroidectomy with bilateral lymph node dissection, 488 underwent right thyroid lobe and isthmic resection with right central compartment lymph node dissection, and 32 underwent near-total thyroidectomy (ipsilateral thyroid lobectomy with contralateral near-total lobectomy) with bilateral lymph node dissection. The overall rate of central compartment lymph node metastasis was 43.5% (361/829), with right central compartment lymph node and RPELN metastasis rates of 35.5% (294/829) and 19.1% (158/829), respectively. Tumor size, number, invasion, and location, lymph node metastasis, right central compartment lymph node metastasis, and right lateral compartment lymph node metastasis were associated with RPELN in the univariate analysis, whereas age and sex were not. Multivariate analysis identified tumors with a diameter ≥ 1 cm, multiple tumors, tumors located in the right lobe, right central compartment lymph node metastasis, and right lateral compartment lymph node metastasis as independent risk factors for RPELN metastasis. Lymph node dissection, including RPELN dissection, should be performed for patients with PTC with a tumor diameter ≥ 1 cm, multiple tumors, right-lobe tumors, right central compartment lymph node metastasis, or suspected lateral compartment lymph node metastasis.

  13. Laboratory for Computer Science Progress Report 16, 1 July 1978 - 30 June 1979,

    DTIC Science & Technology

    1980-08-01

    name strongly distinguishes the XLMS node from ordinary nameless semantic network nodes. The name of a node has two parts: the "genus", itself a node...and the "specializer", a node or an atomic symbol. The genus and specializer of a node are almost always semantically meaningful, though their...meaning is almost never supplied by XLMS, but rather by some system built on top of XLMS. The genus of a node almost always plays a crucial role in its

  14. Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support

    PubMed Central

    Camargo, João; Rochol, Juergen; Gerla, Mario

    2018-01-01

    A wide range of multimedia services is expected to be offered to mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demand for multimedia content. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which are capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be actuated by request patterns and timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. Besides that, we perform an evaluation of such migration of video services. Finally, we present potential research challenges and trends. PMID:29364172

  15. Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support.

    PubMed

    Rosário, Denis; Schimuneck, Matias; Camargo, João; Nobre, Jéferson; Both, Cristiano; Rochol, Juergen; Gerla, Mario

    2018-01-24

    A wide range of multimedia services is expected to be offered to mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demand for multimedia content. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which are capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be actuated by request patterns and timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. Besides that, we perform an evaluation of such migration of video services. Finally, we present potential research challenges and trends.

  16. Exploiting multicore compute resources in the CMS experiment

    NASA Astrophysics Data System (ADS)

    Ramírez, J. E.; Pérez-Calero Yzquierdo, A.; Hernández, J. M.; CMS Collaboration

    2016-10-01

    CMS has developed a strategy to efficiently exploit the multicore architecture of the compute resources accessible to the experiment. A coherent use of the multiple cores available in a compute node yields substantial gains in terms of resource utilization. The implemented approach makes use of the multithreading support of the event processing framework and the multicore scheduling capabilities of the resource provisioning system. Multicore slots are acquired and provisioned by means of multicore pilot agents which internally schedule and execute single and multicore payloads. Multicore scheduling and multithreaded processing are currently used in production for online event selection and prompt data reconstruction. More workflows are being adapted to run in multicore mode. This paper presents a review of the experience gained in the deployment and operation of the multicore scheduling and processing system, the current status and future plans.

  17. Recent Performance Results of VPIC on Trinity

    NASA Astrophysics Data System (ADS)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Le, A.; Li, H.; Nam, H.; Pang, X.; Stark, D. J.; Rust, W. N., III; Yin, L.; Albright, B. J.

    2017-10-01

    Trinity is a new DOE compute resource now in production at Los Alamos National Laboratory. Trinity has several new and unique features, including two compute partitions, one with dual-socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes; use of on-package high-bandwidth memory (HBM) on the KNL nodes; the ability to configure KNL nodes with respect to HBM mode and on-die network topology in a variety of operational modes at run time; and use of solid-state storage via burst-buffer technology to reduce the time required to perform I/O. An effort is in progress to optimize VPIC on Trinity by taking advantage of these new architectural features. Results of this work will be presented on the performance of VPIC on the Haswell and KNL partitions for single-node runs and runs at scale. Results include the use of burst buffers at scale to optimize I/O, a comparison of strategies for using MPI and threads, performance benefits of using HBM, and the effectiveness of using intrinsics for vectorization. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  18. Calibrated tree priors for relaxed phylogenetics and divergence time estimation.

    PubMed

    Heled, Joseph; Drummond, Alexei J

    2012-01-01

    The use of fossil evidence to calibrate divergence time estimation has a long history. More recently, Bayesian Markov chain Monte Carlo has become the dominant method of divergence time estimation, and fossil evidence has been reinterpreted as the specification of prior distributions on the divergence times of calibration nodes. These so-called "soft calibrations" have become widely used, but the statistical properties of calibrated tree priors in a Bayesian setting have not been carefully investigated. Here, we clarify that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node. We illustrate this with a number of analytical results on small trees. We also describe an alternative construction for a calibrated Yule prior on trees that allows direct specification of the marginal prior distribution of the calibrated divergence time, with or without the restriction of monophyly. This method requires the computation of the Yule prior conditional on the height of the divergence being calibrated. Unfortunately, a practical solution for multiple calibrations remains elusive. Our results suggest that direct estimation of the prior induced by specifying multiple calibration densities should be a prerequisite of any divergence time dating analysis.
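
    A sketch in symbols of the construction described above, under our reading of the abstract (not the authors' notation): write $T$ for the tree, $t_c$ for the calibrated divergence time, $\varphi$ for the user-specified calibration density, and $m_{\mathrm{Yule}}$ for the marginal density of $t_c$ induced by the Yule prior. The calibrated prior is then

      $$ f_{\mathrm{cal}}(T) \;=\; f_{\mathrm{Yule}}(T \mid t_c)\,\varphi(t_c)
         \;=\; f_{\mathrm{Yule}}(T)\,\frac{\varphi(t_c)}{m_{\mathrm{Yule}}(t_c)} $$

    so that, by construction, the marginal prior of $t_c$ is exactly $\varphi$. Computing $m_{\mathrm{Yule}}$ is the "Yule prior conditional on the height" step, and with several calibrations the joint conditional no longer factorizes into per-node terms, which is one way to see why the multiple-calibration case remains elusive.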

  19. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Low-cost embedded systems for democratizing ocean sensor technology in the coastal zone

    NASA Astrophysics Data System (ADS)

    Glazer, B. T.; Lio, H. I.

    2017-12-01

    Environmental sciences suffer from undersampling. Enabling sustained and unattended data collection in the coastal zone typically involves expensive instrumentation and infrastructure deployed as cabled observatories or moorings with little flexibility in deployment location following initial installation. High costs of commercially-available or custom instruments have limited the number of sensor sites that can be targeted by academic researchers, and have also limited engagement with the public. We have developed a novel, low-cost, open-source sensor and software platform to enable wireless data transfer of biogeochemical sensors in the coastal zone. The platform is centered upon widely available, low-cost, single board computers and microcontrollers. We have used a blend of on-hand research-grade sensors and low-cost open-source electronics that can be assembled by tech-savvy non-engineers. Robust, open-source code that remains customizable for specific miniNode configurations can match a specific site's measurement needs, depending on the scientific research priorities. We have demonstrated prototype capabilities and versatility through lab testing and field deployments of multiple sensor nodes with multiple sensor inputs, all of which are streaming near-real-time data from Kaneohe Bay over wireless RF links to a shore-based base station.

  1. Efficiently sphere-decodable physical layer transmission schemes for wireless storage networks

    NASA Astrophysics Data System (ADS)

    Lu, Hsiao-Feng Francis; Barreal, Amaro; Karpuk, David; Hollanti, Camilla

    2016-12-01

    Three transmission schemes over a new type of multiple-access channel (MAC) model with inter-source communication links are proposed and investigated in this paper. This new channel model is well motivated by, e.g., wireless distributed storage networks, where communication to repair a lost node takes place from helper nodes to a repairing node over a wireless channel. Since in many wireless networks nodes can come and go in an arbitrary manner, there must be an inherent capability of inter-node communication between every pair of nodes. Assuming that communication is possible between every pair of helper nodes, the newly proposed schemes are based on various smart time-sharing and relaying strategies. In other words, certain helper nodes will be regarded as relays, thereby converting the conventional uncooperative multiple-access channel to a multiple-access relay channel (MARC). The diversity-multiplexing gain tradeoff (DMT) of the system together with efficient sphere-decodability and low structural complexity in terms of the number of antennas required at each end is used as the main design objectives. While the optimal DMT for the new channel model is fully open, it is shown that the proposed schemes outperform the DMT of the simple time-sharing protocol and, in some cases, even the optimal uncooperative MAC DMT. While using a wireless distributed storage network as a motivating example throughout the paper, the MAC transmission techniques proposed here are completely general and as such applicable to any MAC communication with inter-source communication links.

  2. Direct memory access transfer completion notification

    DOEpatents

    Chen, Dong; Giampapa, Mark E.; Heidelberger, Philip; Kumar, Sameer; Parker, Jeffrey J.; Steinmacher-Burow, Burkhard D.; Vranas, Pavlos

    2010-07-27

    Methods, compute nodes, and computer program products are provided for direct memory access (`DMA`) transfer completion notification. Embodiments include determining, by an origin DMA engine on an origin compute node, whether a data descriptor for an application message to be sent to a target compute node is currently in an injection first-in-first-out (`FIFO`) buffer in dependence upon a sequence number previously associated with the data descriptor, the total number of descriptors currently in the injection FIFO buffer, and the current sequence number for the newest data descriptor stored in the injection FIFO buffer; and notifying a processor core on the origin DMA engine that the message has been sent if the data descriptor for the message is not currently in the injection FIFO buffer.
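
    The membership test described above can be sketched in a few lines: a descriptor is still in the injection FIFO exactly when its sequence number falls within the window of the `count` most recently stored descriptors. The function name and the 32-bit wrap are illustrative assumptions, not the patent's actual structures.

      def descriptor_in_fifo(desc_seq: int, newest_seq: int, count: int) -> bool:
          # Wrap-safe distance from the newest descriptor back to this one.
          age = (newest_seq - desc_seq) % (1 << 32)
          return age < count

      # Example: the newest descriptor is #1000 and the FIFO holds 16 descriptors,
      # so #990 is still queued while #980 has already been injected.
      assert descriptor_in_fifo(990, 1000, 16)
      assert not descriptor_in_fifo(980, 1000, 16)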

  3. Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment

    NASA Astrophysics Data System (ADS)

    Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.

    2013-12-01

    Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation for a single dust storm event may take hours or even days to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor affecting the feasibility of parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically: 1) To obtain optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with a small number of computing tasks; however, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate for this performance decrease on large-scale tasks, a K-Means clustering based algorithm is introduced. Instead of seeking optimal solutions, this method obtains relatively good feasible solutions within acceptable time; however, it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
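
    A minimal sketch of the second (clustering) idea, assuming scikit-learn: K-Means over subdomain centroids makes geographically adjacent subdomains tend to land on the same computing node, which is what cuts inter-node communication. The paper's quadratic-programming variant and any load-balancing constraints are not reproduced here.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(42)
      centroids = rng.uniform(0, 100, size=(64, 2))  # 64 subdomain centers (x, y)

      n_nodes = 8
      labels = KMeans(n_clusters=n_nodes, n_init=10, random_state=0).fit_predict(centroids)
      for node in range(n_nodes):
          print(f"node {node}: {np.sum(labels == node)} subdomains")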

  4. A Real-Time Executive for Multiple-Computer Clusters.

    DTIC Science & Technology

    1984-12-01

    in a real-time environment is tantamount to speed and efficiency. By effectively co-locating real-time sensors and related processing modules, real...of which there are two kinds: multicast group address - virtually any number of node groups can be assigned a group address so they are all able...[OCR fragment: a table of control codes for interface loopback, internal loopback, clear loopback, go offline, go online, and onboard diagnostic commands]

  5. Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce

    PubMed Central

    Pratx, Guillem; Xing, Lei

    2011-01-01

    Monte Carlo simulation is considered the most reliable method for modeling photon migration in heterogeneous media. However, its widespread use is hindered by the high computational cost. The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively-parallel cloud computing environment. We ported the MC321 Monte Carlo package to Hadoop, an open-source MapReduce framework. In this implementation, Map tasks compute photon histories in parallel while a Reduce task scores photon absorption. The distributed implementation was evaluated on a commercial compute cloud. The simulation time was found to be linearly dependent on the number of photons and inversely proportional to the number of nodes. For a cluster size of 240 nodes, the simulation of 100 billion photon histories took 22 min, a 1258 × speed-up compared to the single-threaded Monte Carlo program. The overall computational throughput was 85,178 photon histories per node per second, with a latency of 100 s. The distributed simulation produced the same output as the original implementation and was resilient to hardware failure: the correctness of the simulation was unaffected by the shutdown of 50% of the nodes. PMID:22191916
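
    The Map/Reduce split described above can be sketched as follows: map tasks trace photon histories independently, and the reduce step sums the absorbed weight. The one-line absorption model is a toy stand-in for the real MC321 physics, and the constants are arbitrary.

      import random
      from functools import reduce

      def map_photon(seed, albedo=0.9):
          """Trace one photon history; return the weight it deposits (toy model)."""
          rng = random.Random(seed)
          weight, absorbed = 1.0, 0.0
          while weight > 1e-4 and rng.random() < 0.98:  # crude termination rule
              deposit = weight * (1.0 - albedo)         # fraction absorbed per event
              absorbed += deposit
              weight -= deposit
          return absorbed

      # "Map" 10,000 independent histories, then "reduce" by summing the scores.
      total = reduce(lambda a, b: a + b, (map_photon(s) for s in range(10_000)))
      print(f"total absorbed weight: {total:.1f}")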

  6. Maximal Neighbor Similarity Reveals Real Communities in Networks

    PubMed Central

    Žalik, Krista Rizman

    2015-01-01

    An important problem in the analysis of network data is the detection of groups of densely interconnected nodes, also called modules or communities. Community structure reveals the functions and organization of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive, require a priori information such as the number or sizes of communities, or are not able to give the same resulting partition in multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither the optimization of a pre-defined objective function nor information about the number of communities. We propose a bottom-up community detection algorithm in which, starting from communities consisting of adjacent pairs of nodes and their maximally similar neighbors, we find real communities. We show that the overall advantage of the proposed algorithm compared to other community detection algorithms is its simple nature, low computational cost and very high accuracy in detecting communities of different sizes, also in networks with a blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for the Facebook network and the E. coli transcriptional regulatory network have strong structural and functional coherence. PMID:26680448
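
    A sketch in the spirit of the approach above (not the published algorithm's exact steps): each node links to its most similar neighbor, with similarity taken as the Jaccard overlap of closed neighborhoods, and communities fall out as the connected components of those links.

      from collections import defaultdict

      edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
      adj = defaultdict(set)
      for u, v in edges:
          adj[u].add(v)
          adj[v].add(u)

      def sim(u, v):  # Jaccard similarity of closed neighborhoods
          a, b = adj[u] | {u}, adj[v] | {v}
          return len(a & b) / len(a | b)

      parent = {u: u for u in adj}
      def find(u):  # union-find with path halving
          while parent[u] != u:
              parent[u] = parent[parent[u]]
              u = parent[u]
          return u

      for u in adj:  # attach every node to its maximally similar neighbor
          best = max(adj[u], key=lambda v: sim(u, v))
          parent[find(u)] = find(best)

      groups = defaultdict(list)
      for u in adj:
          groups[find(u)].append(u)
      print(list(groups.values()))  # e.g. [[0, 1, 2], [3, 4, 5]]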

  7. Dispatching packets on a global combining network of a parallel computer

    DOEpatents

    Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN

    2011-07-19

    Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
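
    The packetizing step lends itself to a short sketch: the message is cut into fixed-size payloads, and every packet carries the operation identifier and operation type so the target can dispatch it without extra lookups. Field names and sizes here are illustrative assumptions, not the patent's wire format.

      def packetize(message: bytes, op_id: int, op_type: int, payload_size: int = 256):
          """Yield network packets, each tagged with the message-passing operation."""
          for offset in range(0, len(message), payload_size):
              yield {
                  "op_id": op_id,
                  "op_type": op_type,
                  "offset": offset,
                  "payload": message[offset:offset + payload_size],
              }

      packets = list(packetize(b"x" * 1000, op_id=7, op_type=1))
      print(len(packets))  # 4 packets for a 1000-byte message at 256 bytes each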

  8. Detecting Distributed SQL Injection Attacks in a Eucalyptus Cloud Environment

    NASA Technical Reports Server (NTRS)

    Kebert, Alan; Barnejee, Bikramjit; Solano, Juan; Solano, Wanda

    2013-01-01

    The cloud computing environment offers malicious users the ability to spawn multiple instances of cloud nodes that are similar to virtual machines, except that they can have separate external IP addresses. In this paper we demonstrate how this ability can be exploited by an attacker to distribute his/her attack, in particular SQL injection attacks, in such a way that an intrusion detection system (IDS) could fail to identify this attack. To demonstrate this, we set up a small private cloud, established a vulnerable website in one instance, and placed an IDS within the cloud to monitor the network traffic. We found that an attacker could quite easily defeat the IDS by periodically altering its IP address. To detect such an attacker, we propose to use multi-agent plan recognition, where the multiple source IPs are considered as different agents who are mounting a collaborative attack. We show that such a formulation of this problem yields a more sophisticated approach to detecting SQL injection attacks within a cloud computing environment.

  9. Quality of service routing in wireless ad hoc networks

    NASA Astrophysics Data System (ADS)

    Sane, Sachin J.; Patcha, Animesh; Mishra, Amitabh

    2003-08-01

    An efficient routing protocol is essential to guarantee application-level quality of service running on wireless ad hoc networks. In this paper we propose a novel routing algorithm that computes a path between a source and a destination by considering several important constraints such as path-life and the availability of sufficient energy as well as buffer space in each of the nodes on the path between the source and destination. The algorithm chooses the best path from among the multiple paths that it computes between two endpoints. We consider the use of control packets that run at a higher priority than the data packets in determining the multiple paths. The paper also examines the impact of different schedulers, such as weighted fair queuing and weighted random early detection among others, in preserving the QoS level guarantees. Our extensive simulation results indicate that the algorithm improves the overall lifetime of a network, reduces the number of dropped packets, and decreases the end-to-end delay for real-time voice applications.
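
    One plausible reading of the selection rule, as a sketch: discard candidate paths containing any node with insufficient residual energy or buffer space, then prefer the surviving path with the longest predicted path-life. The thresholds and scoring are illustrative assumptions, not the paper's exact formulation.

      def best_path(paths, min_energy=0.2, min_buffer=4):
          feasible = [
              p for p in paths
              if all(n["energy"] >= min_energy and n["buffer"] >= min_buffer
                     for n in p["nodes"])
          ]
          return max(feasible, key=lambda p: p["path_life"], default=None)

      paths = [
          {"path_life": 40, "nodes": [{"energy": 0.9, "buffer": 10},
                                      {"energy": 0.1, "buffer": 12}]},  # weak node
          {"path_life": 25, "nodes": [{"energy": 0.8, "buffer": 9},
                                      {"energy": 0.6, "buffer": 7}]},
      ]
      print(best_path(paths)["path_life"])  # 25: the only feasible candidate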

  10. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-01

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
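
    The reported speed-up checks out arithmetically, as the two-liner below confirms; note that a 47x gain on 100 nodes corresponds to roughly 47% parallel efficiency once provisioning overhead is included.

      serial_min, cloud_min = 2.58 * 60, 3.3  # 2.58 h serial vs 3.3 min on 100 nodes
      print(f"speed-up = {serial_min / cloud_min:.0f}x")                 # ~47x
      print(f"parallel efficiency = {serial_min / cloud_min / 100:.0%}") # ~47%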

  11. Peregrine System | High-Performance Computing | NREL

    Science.gov Websites

    Peregrine provides /home, /nopt, and /projects file systems for short- and longer-term storage; these file systems are mounted on all nodes. [Website fragment: a hardware table listing, per node type, the processors, cores/node, memory/node and peak (DP) performance per node; the recoverable row describes nodes with Intel Xeon E5-2670 "Sandy Bridge" processors and 64 GB of memory.]

  12. Clock Agreement Among Parallel Supercomputer Nodes

    DOE Data Explorer

    Jones, Terry R.; Koenig, Gregory A.

    2014-04-30

    This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming application and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.

  13. A spread willingness computing-based information dissemination model.

    PubMed

    Huang, Haojing; Cui, Zhiming; Zhang, Shukui

    2014-01-01

    This paper constructs a spread-willingness-computing based information dissemination model for social networks. The model takes into account the impact of node degree and the dissemination mechanism, combined with complex network theory and the dynamics of infectious diseases, and further establishes the dynamical evolution equations. The equations characterize the evolutionary relationship between different types of nodes over time. The spread willingness computation contains three factors which affect a user's spread behavior: the strength of the relationship between the nodes, views identity, and frequency of contact. Simulation results show that nodes of different degrees show the same trend in the network, and even if the degree of a node is very small, there is a likelihood of a large area of information dissemination. The weaker the relationship between nodes, the higher the probability of views selection and the higher the frequency of contact with information, so that information spreads rapidly and leads to a wide range of dissemination. As the dissemination probability and immune probability change, the speed of information dissemination changes accordingly. The findings match the features of social networks and can help to master the behavior of users and to understand and analyze the characteristics of information dissemination in social networks.

  14. A Spread Willingness Computing-Based Information Dissemination Model

    PubMed Central

    Cui, Zhiming; Zhang, Shukui

    2014-01-01

    This paper constructs a spread-willingness-computing based information dissemination model for social networks. The model takes into account the impact of node degree and the dissemination mechanism, combined with complex network theory and the dynamics of infectious diseases, and further establishes the dynamical evolution equations. The equations characterize the evolutionary relationship between different types of nodes over time. The spread willingness computation contains three factors which affect a user's spread behavior: the strength of the relationship between the nodes, views identity, and frequency of contact. Simulation results show that nodes of different degrees show the same trend in the network, and even if the degree of a node is very small, there is a likelihood of a large area of information dissemination. The weaker the relationship between nodes, the higher the probability of views selection and the higher the frequency of contact with information, so that information spreads rapidly and leads to a wide range of dissemination. As the dissemination probability and immune probability change, the speed of information dissemination changes accordingly. The findings match the features of social networks and can help to master the behavior of users and to understand and analyze the characteristics of information dissemination in social networks. PMID:25110738

  15. A Field Programmable Gate Array-Based Reconfigurable Smart-Sensor Network for Wireless Monitoring of New Generation Computer Numerically Controlled Machines

    PubMed Central

    Moreno-Tapia, Sandra Veronica; Vera-Salas, Luis Alberto; Osornio-Rios, Roque Alfredo; Dominguez-Gonzalez, Aurelio; Stiharu, Ion; de Jesus Romero-Troncoso, Rene

    2010-01-01

    Computer numerically controlled (CNC) machines have evolved to adapt to increasing technological and industrial requirements. To cover these needs, new generation machines have to perform monitoring strategies by incorporating multiple sensors. Since in most applications online processing of the variables is essential, the use of smart sensors is necessary. The contribution of this work is the development of a wireless network platform of reconfigurable smart sensors for CNC machine applications, complying with the measurement requirements of new generation CNC machines. Four different smart sensors are put under test in the network and their corresponding signal processing techniques are implemented in a Field Programmable Gate Array (FPGA)-based sensor node. PMID:22163602

  16. A field programmable gate array-based reconfigurable smart-sensor network for wireless monitoring of new generation computer numerically controlled machines.

    PubMed

    Moreno-Tapia, Sandra Veronica; Vera-Salas, Luis Alberto; Osornio-Rios, Roque Alfredo; Dominguez-Gonzalez, Aurelio; Stiharu, Ion; Romero-Troncoso, Rene de Jesus

    2010-01-01

    Computer numerically controlled (CNC) machines have evolved to adapt to increasing technological and industrial requirements. To cover these needs, new generation machines have to perform monitoring strategies by incorporating multiple sensors. Since in most applications online processing of the variables is essential, the use of smart sensors is necessary. The contribution of this work is the development of a wireless network platform of reconfigurable smart sensors for CNC machine applications, complying with the measurement requirements of new generation CNC machines. Four different smart sensors are put under test in the network and their corresponding signal processing techniques are implemented in a Field Programmable Gate Array (FPGA)-based sensor node.

  17. Shared address collectives using counter mechanisms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blocksome, Michael; Dozsa, Gabor; Gooding, Thomas M

    A shared address space on a compute node stores data received from a network and data to transmit to the network. The shared address space includes an application buffer that can be directly operated upon by a plurality of processes, for instance, running on different cores on the compute node. A shared counter is used for one or more of signaling arrival of the data across the plurality of processes running on the compute node, signaling completion of an operation performed by one or more of the plurality of processes, obtaining reservation slots by one or more of the plurality of processes, or combinations thereof.
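
    The shared-counter signaling pattern can be sketched with ordinary shared memory, assuming Python's multiprocessing as a stand-in for the patent's mechanism: each worker bumps the counter when its part of the operation is done, and a waiter treats counter == nprocs as completion.

      from multiprocessing import Process, Value

      def worker(counter, rank):
          # ... operate directly on the shared application buffer here ...
          with counter.get_lock():
              counter.value += 1  # signal: this process's part is complete

      if __name__ == "__main__":
          nprocs = 4
          counter = Value("i", 0)  # shared counter visible to all processes
          procs = [Process(target=worker, args=(counter, r)) for r in range(nprocs)]
          for p in procs:
              p.start()
          for p in procs:
              p.join()
          assert counter.value == nprocs  # every process has signaled
          print("operation complete")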

  18. Addressing the computational cost of large EIT solutions.

    PubMed

    Boyle, Alistair; Borsic, Andrea; Adler, Andy

    2012-05-01

    Electrical impedance tomography (EIT) is a soft field tomography modality based on the application of electric current to a body and measurement of voltages through electrodes at the boundary. The interior conductivity is reconstructed on a discrete representation of the domain using a finite-element method (FEM) mesh and a parametrization of that domain. The reconstruction requires a sequence of numerically intensive calculations. There is strong interest in reducing the cost of these calculations. An improvement in the compute time for current problems would encourage further exploration of computationally challenging problems such as the incorporation of time series data, wide-spread adoption of three-dimensional simulations and correlation of other modalities such as CT and ultrasound. Multicore processors offer an opportunity to reduce EIT computation times but may require some restructuring of the underlying algorithms to maximize the use of available resources. This work profiles two EIT software packages (EIDORS and NDRM) to experimentally determine where the computational costs arise in EIT as problems scale. Sparse matrix solvers, a key component for the FEM forward problem and sensitivity estimates in the inverse problem, are shown to take a considerable portion of the total compute time in these packages. A sparse matrix solver performance measurement tool, Meagre-Crowd, is developed to interface with a variety of solvers and compare their performance over a range of two- and three-dimensional problems of increasing node density. Results show that distributed sparse matrix solvers that operate on multiple cores are advantageous up to a limit that increases as the node density increases. We recommend a selection procedure to find a solver and hardware arrangement matched to the problem and provide guidance and tools to perform that selection.
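
    To make the cost concrete, a small sketch of the kind of sparse direct solve that dominates the forward problem, assuming SciPy; the 2-D Laplacian is a stand-in for a real FEM system matrix, and the sizes are illustrative.

      import time
      import numpy as np
      from scipy.sparse import csc_matrix, diags, identity, kron
      from scipy.sparse.linalg import spsolve

      n = 200  # grid side; the system has n*n unknowns
      T = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
      A = csc_matrix(kron(identity(n), T) + kron(T, identity(n)))  # 2-D Laplacian
      b = np.ones(A.shape[0])

      t0 = time.perf_counter()
      x = spsolve(A, b)
      print(f"{A.shape[0]} unknowns solved in {time.perf_counter() - t0:.2f} s")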

  19. Accurate evaluation of axillary sentinel lymph node metastasis using contrast-enhanced ultrasonography with Sonazoid in breast cancer: a preliminary clinical trial.

    PubMed

    Matsuzawa, Fumihiko; Omoto, Kiyoka; Einama, Takahiro; Abe, Hironori; Suzuki, Takashi; Hamaguchi, Jun; Kaga, Terumi; Sato, Mami; Oomura, Masako; Takata, Yumiko; Fujibe, Ayako; Takeda, Chie; Tamura, Etsuya; Taketomi, Akinobu; Kyuno, Kenichi

    2015-01-01

    Breast cancer is the most common type of cancer in women. The 5-year survival rate in patients with breast cancer ranges from 74 to 82%. Sentinel lymph node biopsy has become an alternative to axillary lymph node dissection for nodal staging. We evaluated the detection of the sentinel lymph node and metastasis of the lymph node using contrast-enhanced ultrasonography with Sonazoid. Between December 2013 and May 2014, 32 patients with operable breast cancer were enrolled in this study. We evaluated the detection of axillary sentinel lymph nodes and the evaluation of axillary lymph node metastasis using contrast-enhanced computed tomography, color Doppler ultrasonography and contrast-enhanced ultrasonography with Sonazoid. All the sentinel lymph nodes were identified, and the sentinel lymph nodes detected by contrast-enhanced ultrasonography with Sonazoid corresponded with those detected by computed tomography lymphography and the indigo carmine method. The detection of metastasis based on contrast-enhanced computed tomography showed sensitivity 20.0%, specificity 88.2%, PPV 60.0%, NPV 55.6%, and accuracy 56.3%. Based on color Doppler ultrasonography, the results were sensitivity 36.4%, specificity 95.2%, PPV 80.0%, NPV 74.1%, and accuracy 75.0%. Based on contrast-enhanced ultrasonography with Sonazoid, the results were sensitivity 81.8%, specificity 95.2%, PPV 90.0%, NPV 90.9%, and accuracy 90.6%. These results suggest that contrast-enhanced ultrasonography with Sonazoid was the most accurate among these modalities. In the future, we believe that our method could take the place of conventional sentinel lymph node biopsy as an axillary staging method.

  20. The Mark III Hypercube-Ensemble Computers

    NASA Technical Reports Server (NTRS)

    Peterson, John C.; Tuazon, Jesus O.; Lieberman, Don; Pniel, Moshe

    1988-01-01

    Mark III Hypercube concept applied in development of series of increasingly powerful computers. Processor of each node of Mark III Hypercube ensemble is specialized computer containing three subprocessors and shared main memory. Solves problem quickly by simultaneously processing part of problem at each such node and passing combined results to host computer. Disciplines benefitting from speed and memory capacity include astrophysics, geophysics, chemistry, weather, high-energy physics, applied mechanics, image processing, oil exploration, aircraft design, and microcircuit design.

  1. Solving a Hamiltonian Path Problem with a bacterial computer

    PubMed Central

    Baumgardner, Jordan; Acker, Karen; Adefuye, Oyinade; Crowley, Samuel Thomas; DeLoache, Will; Dickson, James O; Heard, Lane; Martens, Andrew T; Morton, Nickolaus; Ritter, Michelle; Shoecraft, Amber; Treece, Jessica; Unzicker, Matthew; Valencia, Amanda; Waters, Mike; Campbell, A Malcolm; Heyer, Laurie J; Poet, Jeffrey L; Eckdahl, Todd T

    2009-01-01

    Background The Hamiltonian Path Problem asks whether there is a route in a directed graph from a beginning node to an ending node, visiting each node exactly once. The Hamiltonian Path Problem is NP complete, achieving surprising computational complexity with modest increases in size. This challenge has inspired researchers to broaden the definition of a computer. DNA computers have been developed that solve NP complete problems. Bacterial computers can be programmed by constructing genetic circuits to execute an algorithm that is responsive to the environment and whose result can be observed. Each bacterium can examine a solution to a mathematical problem and billions of them can explore billions of possible solutions. Bacterial computers can be automated, made responsive to selection, and reproduce themselves so that more processing capacity is applied to problems over time. Results We programmed bacteria with a genetic circuit that enables them to evaluate all possible paths in a directed graph in order to find a Hamiltonian path. We encoded a three node directed graph as DNA segments that were autonomously shuffled randomly inside bacteria by a Hin/hixC recombination system we previously adapted from Salmonella typhimurium for use in Escherichia coli. We represented nodes in the graph as linked halves of two different genes encoding red or green fluorescent proteins. Bacterial populations displayed phenotypes that reflected random ordering of edges in the graph. Individual bacterial clones that found a Hamiltonian path reported their success by fluorescing both red and green, resulting in yellow colonies. We used DNA sequencing to verify that the yellow phenotype resulted from genotypes that represented Hamiltonian path solutions, demonstrating that our bacterial computer functioned as expected. Conclusion We successfully designed, constructed, and tested a bacterial computer capable of finding a Hamiltonian path in a three node directed graph. This proof-of-concept experiment demonstrates that bacterial computing is a new way to address NP-complete problems using the inherent advantages of genetic systems. The results of our experiments also validate synthetic biology as a valuable approach to biological engineering. We designed and constructed basic parts, devices, and systems using synthetic biology principles of standardization and abstraction. PMID:19630940
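
    For contrast with the biological search, the exhaustive check the bacteria replace is tiny at this scale. A sketch, with an arbitrary 3-node edge set rather than the plasmid-encoded graph:

      from itertools import permutations

      def hamiltonian_paths(nodes, edges):
          """Return all orderings that traverse a directed edge at every step."""
          return [p for p in permutations(nodes)
                  if all((p[i], p[i + 1]) in edges for i in range(len(p) - 1))]

      nodes = (1, 2, 3)
      edges = {(1, 2), (2, 3), (3, 1)}        # a directed 3-cycle
      print(hamiltonian_paths(nodes, edges))  # [(1, 2, 3), (2, 3, 1), (3, 1, 2)]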

  2. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by semi-randomly varying routing policies for different packets

    DOEpatents

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-11-23

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
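
    The policy variation described above reduces to a very small amount of logic at the originating node, sketched here with invented policy names: stamp each packet with a policy chosen at random or round-robin, and let intermediate nodes honor the stamp.

      import random
      from itertools import cycle

      POLICIES = ("x-then-y", "y-then-x", "adaptive-low", "adaptive-high")
      _round_robin = cycle(POLICIES)

      def choose_policy(semi_random: bool = True) -> str:
          # Semi-random choice keeps similarly situated packets off a single path.
          return random.choice(POLICIES) if semi_random else next(_round_robin)

      packet = {"dest": (3, 1, 2), "policy": choose_policy(), "payload": b"..."}
      print(packet["policy"])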

  3. Performance of VPIC on Sequoia

    NASA Astrophysics Data System (ADS)

    Nystrom, William

    2014-10-01

    Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.

  4. Single-node orbit analysis with radiation heat transfer only

    NASA Technical Reports Server (NTRS)

    Peoples, J. A.

    1977-01-01

    The steady-state temperature of a single node which dissipates energy by radiation only is discussed for a non-time-varying thermal environment. Relationships are developed to illustrate how shields can be utilized to represent a louver system. A computer program is presented which can assess the periodic temperature characteristics of a single node in a time-varying thermal environment with energy dissipation by radiation only. The computer program performs thermal orbital analysis for five combinations of plate, shields, and louvers.
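
    The steady-state balance is worth writing out: with dissipation Q matched by radiated power ε·σ·A·T^4 (absorbed environmental flux neglected for brevity), the node settles at T = (Q/(εσA))^(1/4). The values below are illustrative.

      SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

      def steady_state_temp(q_watts, area_m2, emissivity):
          # Radiative equilibrium: Q = emissivity * SIGMA * area * T**4
          return (q_watts / (emissivity * SIGMA * area_m2)) ** 0.25

      print(f"{steady_state_temp(10.0, 0.05, 0.85):.1f} K")  # ~253.8 K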

  5. Fog computing job scheduling optimization based on bees swarm

    NASA Astrophysics Data System (ADS)

    Bitam, Salim; Zeadally, Sherali; Mellouk, Abdelhamid

    2018-04-01

    Fog computing is a new computing architecture, composed of a set of near-user edge devices called fog nodes, which collaborate in order to perform computational services such as running applications, storing significant amounts of data, and transmitting messages. Fog computing extends cloud computing by deploying digital resources at the premises of mobile users. In this new paradigm, management and operating functions, such as job scheduling, aim at providing high-performance, cost-effective services requested by mobile users and executed by fog nodes. We propose a new bio-inspired optimization approach called the Bees Life Algorithm (BLA) to address the job scheduling problem in the fog computing environment. Our proposed approach is based on the optimized distribution of a set of tasks among all the fog computing nodes. The objective is to find an optimal tradeoff between the CPU execution time and the allocated memory required by fog computing services established by mobile users. Our empirical performance evaluation results demonstrate that the proposal outperforms traditional particle swarm optimization and the genetic algorithm in terms of CPU execution time and allocated memory.
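
    A sketch of the scheduling objective as we read it: a candidate task-to-node assignment is scored by a weighted sum of CPU execution time and allocated memory, which a heuristic such as BLA would then minimize. The weights and cost model are assumptions, not the paper's.

      def fitness(assignment, exec_time, mem_need, w_time=0.7, w_mem=0.3):
          """Lower is better: weighted CPU time plus weighted memory footprint."""
          total_time = sum(exec_time[task][node] for task, node in assignment.items())
          total_mem = sum(mem_need[task] for task in assignment)
          return w_time * total_time + w_mem * total_mem

      exec_time = {"t1": {"n1": 4.0, "n2": 6.0}, "t2": {"n1": 3.0, "n2": 2.0}}
      mem_need = {"t1": 128, "t2": 64}
      print(fitness({"t1": "n1", "t2": "n2"}, exec_time, mem_need))  # 0.7*6 + 0.3*192 = 61.8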

  6. Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

    NASA Astrophysics Data System (ADS)

    Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

    2006-12-01

    In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. It is necessary to solve the following three problems to achieve large-scale statistical analyses of such a quantity of data. (i) More CPU power and larger memory and disk sizes are required, but the total power of personal computers is not enough to analyze such an amount of data. Super-computers provide a high performance CPU and a rich memory area, but they are usually separated from the Internet or connected only for the purpose of programming or data file transfer. (ii) Most of the observation data files are managed at distributed data sites over the Internet, so users have to know where the data files are located. (iii) Since no common data format in the STP field is available now, users have to prepare a reading program for each dataset by themselves. To overcome problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. The Gfarm shares computational resources and performs parallel distributed processing. In addition, the Gfarm provides the Gfarm filesystem, which can be seen as a virtual directory tree among nodes. The Gfarm environment is composed of three parts: a metadata server to manage distributed file information, filesystem nodes to provide computational resources, and a client to submit jobs to the metadata server and manage data processing schedules. In the present study, both data files and data processes are parallelized on the Gfarm with 6 filesystem nodes; each node has a Pentium V 1 GHz CPU, 256 MB memory and a 40 GB disk. To evaluate the performance of the present Gfarm system, we scanned many data files, each about 300 MB in size, using three processing methods: sequential processing in one node, sequential processing by each node, and parallel processing by each node. As a result, comparing the number of files against the elapsed time, parallel and distributed processing shortened the elapsed time to 1/5 of sequential processing. On the other hand, sequential processing times were shorter in another experiment, whose file sizes were smaller than 100 KB. In this case, the elapsed time to scan one file is within one second, which implies that disk swapping took place in the case of parallel processing by each node. We note that the operation became unstable when the number of files exceeded 1000. To overcome problem (iii), we developed an original data class. This class supports the reading of data files with various formats by converting them into an original data format: it defines schemata for every type of data and encapsulates the structure of the data files. In addition, since this class provides a function of time re-sampling, users can easily convert multiple data arrays with different time resolutions into arrays with the same time resolution. Finally, using the Gfarm, we achieved a high-performance environment for large-scale statistical data analyses. It should be noted that the present method is effective only when each data file is large enough. At present, we are restructuring a new Gfarm environment with 8 nodes: each has an Athlon 64 X2 dual-core 2 GHz CPU, 2 GB memory and a 1.2 TB disk (using RAID 0). Our original class is to be implemented on the new Gfarm environment.
    In the present talk, we show the latest results of applying the present system to data analyses with a huge number of satellite observation data files.

  7. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Christopher H.; Long, Hai; Sides, Scott

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance for procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once than fast things in order.

  8. A Case of Metastatic Basal Cell Carcinoma Treated with Cisplatin and Adriamycin.

    PubMed

    Kanzaki, Akiko; Ansai, Shin-Ichi; Ueno, Takashi; Kawana, Seiji; Shimizu, Akira; Naito, Zenya; Saeki, Hidehisa

    2017-01-01

    A 72-year-old man was referred to our hospital for treatment of an ulcer that had been growing on his back for 10 years. Physical examination showed an ulcerated tumor from the neck to the back and swollen cervical lymph nodes. The tumor size was 12×9 cm. Histology of the biopsy showed a nodular and morpheic basal cell carcinoma (BCC). A chest computed tomography (CT) scan showed multiple lung tumors. CT-guided biopsy of the lung and the cervical lymph node revealed metastatic basal cell carcinoma (MBCC). The primary skin tumor was resected and a total of 10 courses of cisplatin (25 mg/m²/day × 75%) and adriamycin (50 mg/m² × 75%) were administered for MBCC. The patient died 5 years and 3 months after his first visit. Autopsy revealed MBCC in the lung, kidney, pancreas, several lymph nodes, liver and bone. A portion of the tumor cells were composed of squamoid cells with eosinophilic cytoplasm, large nuclei, lack of the characteristic peripheral palisading and retraction artifacts, and variable cytoplasmic keratinization. These pathological findings were compatible with basosquamous cell carcinoma. Chemotherapy was effective for MBCC in this patient.

  9. [The rule of lymphatic formation in rabbit VX2 supraglottic carcinoma model with lymph node metastasis].

    PubMed

    Zhang, Pin; Ji, Wenyue; Zhang, Xiangbo

    2012-02-01

    To establish a transplanted model of VX2 supraglottic carcinoma in rabbits and investigate the rule of lymphatic vessel formation. After establishment of VX2 tumor-bearing rabbits, the carcinoma tissues were transplanted into the operculum laryngis submucosa of sixty New Zealand white rabbits to establish the transplanted tumor model. Vascular endothelial growth factor receptor-3 (VEGFR-3) label staining was performed to observe lymphatic vessels. The number density and volume density of lymphatics in the peripheral region of the carcinoma, the normal region and the central region were measured using a computer image analysis system. There were no lymphatic vessels in the carcinomatous central region, but the lymphatic vessel number density and volume density in the peripheral region were much greater than in the normal region, and their cavities were dilated. The difference was statistically significant (P<0.01). The rule of lymphatic formation in the rabbit VX2 supraglottic carcinoma model mimics that of human supraglottic carcinoma. Lymphatic proliferation and dilation in the peripheral region of the carcinoma are associated with lymph node metastasis, and their evaluation in the peripheral region may be useful in predicting lymph node metastasis in patients with supraglottic carcinoma. This conclusion provides a theoretical basis for the use of anti-tumor medicines which inhibit lymphatic formation in animal models.

  10. Broadcasting Topology and Routing Information in Computer Networks

    DTIC Science & Technology

    1985-05-01

    [OCR fragments] Figure 1.2.1 (Topology Problem Example): a node may receive messages from node 2 before receiving the first DOWN message from node 3. Figure 3.4.2 (SPTA Port Distance Table Example): each table entry gives the distance from the node to each of the link's end nodes.

  11. Modeling node bandwidth limits and their effects on vector combining algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Littlefield, R.J.

    Each node in a message-passing multicomputer typically has several communication links. However, the maximum aggregate communication speed of a node is often less than the sum of its individual link speeds. Such computers are called node bandwidth limited (NBL). The NBL constraint is important when choosing algorithms because it can change the relative performance of different algorithms that accomplish the same task. This paper introduces a model of communication performance for NBL computers and uses the model to analyze the overall performance of three algorithms for vector combining (global sum) on the Intel Touchstone DELTA computer. Each of the three algorithms is found to be at least 33% faster than the other two for some combinations of machine size and vector length. The NBL constraint is shown to significantly affect the conditions under which each algorithm is fastest.

  12. Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

    NASA Astrophysics Data System (ADS)

    Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa

    A branch-and-bound algorithm (BB for short) is the most general technique for dealing with a wide variety of combinatorial optimization problems. Even so, its computation time is likely to grow exponentially, so we consider parallelization to reduce it. It has been reported that the computation time of a parallel BB depends heavily on node-variable selection strategies. A parallel BB must also keep communication time from growing, so it is important to consider how many nodes, and what kind of nodes, are to be transferred (called the sending-node selection strategy). In this paper, for the graph coloring problem, we propose several sending-node selection strategies for a parallel BB algorithm, adopting MPI for parallelization, and experimentally evaluate how these strategies affect the computation time of a parallel BB on a PC cluster network.

  13. Localization Algorithm Based on a Spring Model (LASM) for Large Scale Wireless Sensor Networks.

    PubMed

    Chen, Wanming; Mei, Tao; Meng, Max Q-H; Liang, Huawei; Liu, Yumei; Li, Yangming; Li, Shuai

    2008-03-15

    A navigation method for a lunar rover based on large scale wireless sensor networks is proposed. To obtain high navigation accuracy and a large exploration area, high node localization accuracy and a large network scale are required. However, the computational and communication complexity and the time consumption increase greatly with the network scale. A localization algorithm based on a spring model (LASM) is proposed to reduce the computational complexity while maintaining localization accuracy in large scale sensor networks. The algorithm simulates the dynamics of a physical spring system to estimate the positions of nodes. The sensor nodes are modeled as particles with masses, connected to neighbor nodes by virtual springs. The virtual springs force the particles to move from their randomly set initial positions toward their true positions, and the node positions follow correspondingly. A blind node position can therefore be determined by the LASM algorithm by calculating the spring forces exerted by the neighbor nodes. The computational and communication complexity is O(1) for each node, since the number of neighbor nodes does not grow proportionally with the network scale. Three patches are proposed to avoid local optima, remove bad nodes and deal with node variation. Simulation results show that the computational and communication complexity remain almost constant despite increases in network scale. The time consumption is also shown to remain almost constant, since the number of calculation steps is almost unrelated to the network scale.
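
    As a rough illustration of the spring dynamics described above, the following sketch relaxes one blind node against distance measurements to its neighbors; the constants, names, and the simple Euler update are assumptions for illustration, not the paper's exact formulation:

      # Minimal sketch of a spring-model localization step (LASM-style).
      # Each blind node is pulled toward positions consistent with measured
      # distances to its neighbors; all constants here are assumptions.

      import math

      def spring_step(pos, neighbors, dt=0.1, k=1.0):
          """One relaxation step for a blind node at pos.

          neighbors: list of ((x, y), measured_distance) pairs.
          The virtual spring force is proportional to the difference between
          the current geometric distance and the measured distance."""
          fx = fy = 0.0
          for (nx, ny), d_meas in neighbors:
              dx, dy = pos[0] - nx, pos[1] - ny
              d_cur = math.hypot(dx, dy) or 1e-9
              f = -k * (d_cur - d_meas)          # spring force magnitude
              fx += f * dx / d_cur               # project onto x and y
              fy += f * dy / d_cur
          return (pos[0] + dt * fx, pos[1] + dt * fy)

      # Iterate from an arbitrary initial guess until the position settles.
      p = (1.0, 1.0)
      nbrs = [((0, 3), 3.0), ((4, 0), 4.0), ((4, 3), 5.0)]  # consistent with (0, 0)
      for _ in range(500):
          p = spring_step(p, nbrs)
      print(p)  # converges near (0, 0)

    Note that the per-step cost is proportional to the number of neighbors, which matches the abstract's O(1) claim when the neighborhood size stays constant as the network grows.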

  14. The Case for Modular Redundancy in Large-Scale High Performance Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Engelmann, Christian; Ong, Hong Hoe; Scott, Stephen L

    2009-01-01

    Recent investigations into the resilience of large-scale high-performance computing (HPC) systems showed a continuous trend of decreasing reliability and availability. Newly installed systems have a lower mean-time to failure (MTTF) and a higher mean-time to recover (MTTR) than their predecessors. Modular redundancy is used in many mission critical systems today to provide resilience, such as aerospace and command & control systems. The primary argument against modular redundancy for resilience in HPC has always been that the capability of an HPC system, and the respective return on investment, would be significantly reduced. We argue that modular redundancy can significantly increase compute node availability as it removes the impact of scale from single compute node MTTR. We further argue that single compute nodes can be much less reliable, and therefore less expensive, and still be highly available, if their MTTR/MTTF ratio is maintained.
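
    The argument can be made concrete with the standard steady-state availability formula A = MTTF/(MTTF + MTTR). A minimal sketch, assuming independent failures and illustrative MTTF/MTTR values (not figures from the paper):

      # Back-of-the-envelope availability model illustrating the argument.
      # All numbers are illustrative assumptions, not measurements.

      def availability(mttf_h, mttr_h):
          """Steady-state availability of a single component."""
          return mttf_h / (mttf_h + mttr_h)

      def system_availability(node_avail, n_nodes, replicas=1):
          """System of n_nodes, each replicated; assumes independent failures
          and that a node is up if at least one of its replicas is up."""
          node_with_redundancy = 1.0 - (1.0 - node_avail) ** replicas
          return node_with_redundancy ** n_nodes

      a = availability(mttf_h=5000.0, mttr_h=6.0)               # one node
      print(system_availability(a, n_nodes=10000, replicas=1))  # ~6e-6: hopeless
      print(system_availability(a, n_nodes=10000, replicas=2))  # ~0.986

    Dual redundancy turns a system that is essentially never fully up into one that is available almost 99% of the time, even though the individual nodes were left unchanged.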

  15. Consistency mapping of 16 lymph node stations in gastric cancer by CT-based vessel-guided delineation of 255 patients.

    PubMed

    Xu, Shuhang; Feng, Lingling; Chen, Yongming; Sun, Ying; Lu, Yao; Huang, Shaomin; Fu, Yang; Zheng, Rongqin; Zhang, Yujing; Zhang, Rong

    2017-06-20

    In order to refine the location and metastasis-risk density of 16 lymph node stations of gastric cancer for neoadjuvant radiotherapy, we retrospectively reviewed the initial images and pathological reports of 255 gastric cancer patients with lymphatic metastasis. Metastatic lymph nodes identified in the initial computed tomography images were investigated by two radiologists with gastrointestinal specialty. A circle with a diameter of 5 mm was used to identify the central position of each metastatic lymph node, defined as the LNc (the central position of the lymph node). The LNc was drawn at the equivalent location on the reference images of a standard patient based on the relative distances to the same reference vessels and the gastric wall using a Monaco® version 5.0 workstation. The image manipulation software Medi-capture was programmed for image analysis to produce a contour and density atlas of 16 lymph node stations. Based on a total of 2846 LNcs contoured (31-599 per lymph node station), we created a density distribution map of 16 lymph node drainage stations of the stomach on computed tomography images, showing the detailed radiographic delineation of each lymph node station as well as high-risk areas for lymph node metastasis. Our mapping can serve as a template for the delineation of gastric lymph node stations when defining clinical target volume in pre-operative radiotherapy for gastric cancer.

  16. Two-dimensional nonsteady viscous flow simulation on the Navier-Stokes computer miniNode

    NASA Technical Reports Server (NTRS)

    Nosenchuck, Daniel M.; Littman, Michael G.; Flannery, William

    1986-01-01

    The needs of large-scale scientific computation are outpacing the growth in performance of mainframe supercomputers. In particular, problems in fluid mechanics involving complex flow simulations require far more speed and capacity than that provided by current and proposed Class VI supercomputers. To address this concern, the Navier-Stokes Computer (NSC) was developed. The NSC is a parallel-processing machine, comprised of individual Nodes, each comparable in performance to current supercomputers. The global architecture is that of a hypercube, and a 128-Node NSC has been designed. New architectural features, such as a reconfigurable many-function ALU pipeline and a multifunction memory-ALU switch, have provided the capability to efficiently implement a wide range of algorithms. Efficient algorithms typically involve numerically intensive tasks, which often include conditional operations. These operations may be efficiently implemented on the NSC without, in general, sacrificing vector-processing speed. To illustrate the architecture, programming, and several of the capabilities of the NSC, the simulation of two-dimensional, nonsteady viscous flows on a prototype Node, called the miniNode, is presented.

  17. Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

    DOE PAGES

    Gamell, Marc; Teranishi, Keita; Kolla, Hemanth; ...

    2017-10-26

    In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
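
    The failure-masking mechanism can be illustrated with a radius-1 stencil: expanding the ghost (halo) region to width g lets a subdomain advance g steps between exchanges, which gives the neighbors of a failed rank slack to keep computing while it recovers. A toy 1D sketch follows; the sizes and names are illustrative, not from the S3D implementation:

      # Sketch of ghost-region expansion for a radius-1 stencil in 1D (NumPy).
      # With a ghost width of g cells, a subdomain can advance g time steps
      # between halo exchanges. All sizes below are illustrative.

      import numpy as np

      def advance(u, steps):
          """Jacobi-style smoothing; each step consumes one ghost layer."""
          u = u.copy()
          for _ in range(steps):
              u[1:-1] = 0.25 * u[:-2] + 0.5 * u[1:-1] + 0.25 * u[2:]
          return u

      g = 4                          # expanded ghost width
      interior = np.random.rand(100)
      halo = np.random.rand(2 * g)   # g ghost cells per side, from neighbors
      u = np.concatenate([halo[:g], interior, halo[g:]])

      u = advance(u, steps=g)        # g steps are valid before the next exchange
      interior_new = u[g:-g]         # interior cells are exact; ghosts are stale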

  18. Scalable Failure Masking for Stencil Computations using Ghost Region Expansion and Cell to Rank Remapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamell, Marc; Teranishi, Keita; Kolla, Hemanth

    In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.

  19. Self-Organizing OFDMA System for Broadband Communication

    NASA Technical Reports Server (NTRS)

    Roy, Aloke (Inventor); Anandappan, Thanga (Inventor); Malve, Sharath Babu (Inventor)

    2016-01-01

    Systems and methods for a self-organizing OFDMA system for broadband communication are provided. In certain embodiments, a communication node for a self-organizing network comprises a communication interface configured to transmit data to and receive data from a plurality of nodes; and a processing unit configured to execute computer readable instructions. Further, the computer readable instructions direct the processing unit to identify a sub-region within a cell, wherein the communication node is located in the sub-region; and to transmit at least one data frame, wherein the data from the communication node is transmitted at a particular time and frequency as defined within the at least one data frame, where the time and frequency are associated with the sub-region.

  20. Collective network for computer structures

    DOEpatents

    Blumrich, Matthias A; Coteus, Paul W; Chen, Dong; Gara, Alan; Giampapa, Mark E; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd E; Steinmacher-Burow, Burkhard D; Vranas, Pavlos M

    2014-01-07

    A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.

  1. Prevention of Malicious Nodes Communication in MANETs by Using Authorized Tokens

    NASA Astrophysics Data System (ADS)

    Chandrakant, N.; Shenoy, P. Deepa; Venugopal, K. R.; Patnaik, L. M.

    A rapid increase in wireless networks and mobile computing applications has changed the landscape of network security. A MANET is more susceptible to attacks than a wired network. As a result, attacks with malicious intent have been and will be devised to take advantage of these vulnerabilities and to cripple MANET operation. Hence we need to search for new architectures and mechanisms to protect wireless networks and mobile computing applications. In this paper, we examine the nodes that come within the vicinity of the base node and the members of the network, and communication is granted to genuine nodes only. The proposed algorithm is found to be an effective algorithm for security in MANETs.

  2. Collective network for computer structures

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Coteus, Paul W [Yorktown Heights, NY; Chen, Dong [Croton On Hudson, NY; Gara, Alan [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Takken, Todd E [Brewster, NY; Steinmacher-Burow, Burkhard D [Wernau, DE; Vranas, Pavlos M [Bedford Hills, NY

    2011-08-16

    A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.

  3. Multi-core processing and scheduling performance in CMS

    NASA Astrophysics Data System (ADS)

    Hernández, J. M.; Evans, D.; Foulkes, S.

    2012-12-01

    Commodity hardware is going many-core. We might soon be unable to satisfy the per-core job memory needs of the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as code libraries, detector geometry and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs control over a larger quantum of resource, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues, compared to standard single-core processing workflows.

  4. A novel processing platform for post tape out flows

    NASA Astrophysics Data System (ADS)

    Vu, Hien T.; Kim, Soohong; Word, James; Cai, Lynn Y.

    2018-03-01

    As the computational requirements for post tape out (PTO) flows increase at the 7nm and below technology nodes, there is a need to increase the scalability of the computational tools in order to reduce the turn-around time (TAT) of the flows. Utilization of design hierarchy has been one proven method to provide sufficient partitioning to enable PTO processing. However, as the data is processed through the PTO flow, its effective hierarchy is reduced. The reduction is necessary to achieve the desired accuracy. Also, the sequential nature of the PTO flow is inherently non-scalable. To address these limitations, we are proposing a quasi-hierarchical solution that combines multiple levels of parallelism to increase the scalability of the entire PTO flow. In this paper, we describe the system and present experimental results demonstrating the runtime reduction through scalable processing with thousands of computational cores.

  5. Technique for Calculating Solution Derivatives With Respect to Geometry Parameters in a CFD Code

    NASA Technical Reports Server (NTRS)

    Mathur, Sanjay

    2011-01-01

    A solution has been developed to the challenges of computing derivatives with respect to geometry, which is not straightforward because geometry parameters are not typically direct inputs to the computational fluid dynamics (CFD) solver. To overcome these issues, a procedure has been devised that can be used without having access to the mesh generator, while still being applicable to all types of meshes. The basic approach is inspired by the mesh motion algorithms used to deform the interior mesh nodes in a smooth manner when the surface nodes move, for example, in a fluid-structure interaction problem. The general idea is to model the mesh edges and nodes as constituting a spring-mass system. Changes to boundary node locations are propagated to interior nodes by allowing them to assume their new equilibrium positions, that is, positions where the forces on each node are in balance. The main advantage of the technique is that it is independent of the volumetric mesh generator, and can be applied to structured, unstructured, single- and multi-block meshes. It essentially reduces the problem to defining the surface mesh node derivatives with respect to the geometry parameters of interest. For analytical geometries, this is quite straightforward. In the more general case, one would need to be able to interrogate the underlying parametric CAD (computer aided design) model and to evaluate the derivatives either analytically or by a finite difference technique. Because the technique is based on a partial differential equation (PDE), it is applicable not only to forward mode problems (where derivatives of all the output quantities are computed with respect to a single input), but it could also be extended to the adjoint problem, either by using an analytical adjoint of the PDE or a discrete analog.
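
    A minimal sketch of the spring-analogy idea, assuming unit spring stiffness so that each interior node relaxes to the average of its neighbors; the relax_mesh helper and the tiny example are illustrative, not the paper's implementation:

      # Spring-analogy mesh motion (illustrative): interior node positions
      # are the equilibrium of unit-stiffness springs along mesh edges,
      # found by sweeps with boundary nodes held fixed.

      import numpy as np

      def relax_mesh(coords, edges, is_boundary, iters=500):
          """coords: (n, 2) float node positions; edges: list of (i, j)."""
          nbrs = [[] for _ in coords]
          for i, j in edges:
              nbrs[i].append(j)
              nbrs[j].append(i)
          x = coords.copy()
          for _ in range(iters):
              for i in range(len(x)):
                  if not is_boundary[i] and nbrs[i]:
                      # equilibrium of unit springs = neighbor average
                      x[i] = np.mean([x[j] for j in nbrs[i]], axis=0)
          return x

      # Tiny example: a 1D chain of 5 nodes; lift the right end by 1.
      coords = np.array([[i, 0.0] for i in range(5)], dtype=float)
      coords[4, 1] = 1.0                       # displaced boundary node
      fixed = [True, False, False, False, True]
      edges = [(i, i + 1) for i in range(4)]
      print(relax_mesh(coords, edges, fixed))  # interior y: 0.25, 0.5, 0.75

    To get geometry derivatives, one would perturb a boundary node by +h along a geometry parameter, re-relax, and finite-difference the interior positions, all without access to the mesh generator.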

  6. An investigation into scalability and compliance for triple patterning with stitches for metal 1 at the 14nm node

    NASA Astrophysics Data System (ADS)

    Cork, Christopher; Miloslavsky, Alexander; Friedberg, Paul; Luk-Pat, Gerry

    2013-04-01

    Lithographers had hoped that single patterning would be enabled at the 20nm node by way of EUV lithography. However, due to delays in EUV readiness, double patterning with 193i lithography is currently relied upon for volume production of the 20nm node's metal 1 layer. At the 14nm node, and likely at the 10nm node, LE-LE-LE triple patterning technology (TPT) is one of the favored options [1,2] for patterning local interconnect and Metal 1 layers. While previous research has focused on TPT for the contact mask, metal layers offer new challenges and opportunities, in particular the ability to decompose design polygons across more than one mask. The extra flexibility offered by the third mask and the ability to leverage polygon stitching both serve to improve compliance. However, ensuring TPT compliance - finding a 3-color mask decomposition for a design - is still difficult. Moreover, scalability concerns multiply the difficulty of triple patterning decomposition, which is an NP-complete problem. Indeed, previous work shows that network sizes above a few thousand nodes or polygons start to take significantly longer to compute [3], making full-chip decomposition of arbitrary layouts impractical. In practice, Metal 1 layouts can be treated as two separate problem domains: decomposition of standard cells and decomposition of IP blocks. Standard cells typically include only a few tens of polygons and should be amenable to fast decomposition. Successive design iterations should resolve compliance issues and improve packing density. Density improvements are multiplied repeatedly as standard cells are placed many times. IP blocks, on the other hand, may involve very large networks. This paper evaluates multiple approaches to triple patterning decomposition for the Metal 1 layer. The benefits of polygon stitching, in particular the ability to resolve commonly encountered non-compliant layout configurations and improve packing density, are weighed against the increased difficulty of finding an optimized, legal decomposition and coping with the increased scalability challenges.
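
    At its core, checking TPT compliance is 3-coloring of a polygon conflict graph. A toy backtracking sketch follows; real decomposers add stitch insertion and cost terms, and modeling a stitch crudely as deleting a conflict edge is an assumption made here for brevity:

      # Toy backtracking 3-coloring of a polygon conflict graph, the core
      # decision problem behind LE-LE-LE triple patterning decomposition
      # (NP-complete in general, hence the scalability concerns above).

      def three_color(adj):
          """adj: {polygon: set of conflicting polygons} -> coloring or None."""
          nodes = sorted(adj, key=lambda v: -len(adj[v]))  # dense nodes first
          color = {}

          def backtrack(k):
              if k == len(nodes):
                  return True
              v = nodes[k]
              for c in range(3):
                  if all(color.get(u) != c for u in adj[v]):
                      color[v] = c
                      if backtrack(k + 1):
                          return True
                      del color[v]
              return False

          return color if backtrack(0) else None

      # K4 is not 3-colorable; "stitching" one conflict away makes it so.
      k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
      print(three_color(k4))              # None -> needs a stitch
      k4[0].discard(1); k4[1].discard(0)  # crude stand-in for a stitch
      print(three_color(k4))              # a valid 3-color assignment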

  7. Utility of Computed Tomography versus Abdominal Ultrasound Examination to Identify Iliosacral Lymphadenomegaly in Dogs with Apocrine Gland Adenocarcinoma of the Anal Sac.

    PubMed

    Palladino, S; Keyerleber, M A; King, R G; Burgess, K E

    2016-11-01

    Apocrine gland adenocarcinoma of the anal sac (AGAAS) is associated with high rates of iliosacral lymph node metastasis, which may influence treatment and prognosis. Magnetic resonance imaging (MRI) recently has been shown to be more sensitive than abdominal ultrasound examination (AUS) in affected patients. To compare the rate of detection of iliosacral lymphadenomegaly between AUS and computed tomography (CT) in dogs with AGAAS. Cohort A: A total of 30 presumed normal dogs. Cohort B: A total of 20 dogs with AGAAS that underwent AUS and CT. Using cohort A, mean normalized lymph node : aorta (LN : AO) ratios were established for medial iliac, internal iliac, and sacral lymph nodes. The CT images in cohort B then were reviewed retrospectively and considered enlarged if their LN : AO ratio measured 2 standard deviations above the mean normalized ratio for that particular node in cohort A. Classification and visibility of lymph nodes identified on AUS were compared to corresponding measurements obtained on CT. Computed tomography identified lymphadenomegaly in 13 of 20 AGAAS dogs. Of these 13 dogs, AUS correctly identified and detected all enlarged nodes in only 30.8%, and either misidentified or failed to detect additional enlarged nodes in the remaining dogs. Despite limitations in identifying enlargement in all affected lymph nodes, AUS identified at least 1 enlarged node in 100% of affected dogs. Abdominal ultrasound examination is an effective screening test for lymphadenomegaly in dogs with AGAAS, but CT should be considered in any patient in which an additional metastatic site would impact therapeutic planning. Copyright © 2016 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of the American College of Veterinary Internal Medicine.

  8. SU-E-T-628: A Cloud Computing Based Multi-Objective Optimization Method for Inverse Treatment Planning.

    PubMed

    Na, Y; Suh, T; Xing, L

    2012-06-01

    Multi-objective (MO) plan optimization entails generation of an enormous number of IMRT or VMAT plans constituting the Pareto surface, which presents a computationally challenging task. The purpose of this work is to overcome the hurdle by developing an efficient MO method using the emerging cloud computing platform. As the backbone of cloud computing for optimizing inverse treatment planning, Amazon Elastic Compute Cloud with a master node (17.1 GB memory, 2 virtual cores, 420 GB instance storage, 64-bit platform) is used. The master node is able to seamlessly scale a number of working group instances, called workers, based on a user-defined setting that accounts for MO functions in the clinical setting. Each worker solves the objective function with an efficient sparse decomposition method. Workers are automatically terminated when their tasks are finished. The optimized plans are archived to the master node to generate the Pareto solution set. Three clinical cases have been planned using the developed MO IMRT and VMAT planning tools to demonstrate the advantages of the proposed method. The target dose coverage and critical structure sparing of plans obtained using the cloud computing platform are identical to those obtained using a desktop PC (Intel Xeon® CPU 2.33GHz, 8GB memory). The MO planning substantially speeds up the generation of the Pareto set for both types of plans. The speedup scales approximately linearly with the number of nodes used for computing. With the use of N nodes, the computational time follows the fitted model 0.2 + 2.3/N (r² > 0.99) on average across the cases, making real-time MO planning possible. A cloud computing infrastructure is developed for MO optimization. The algorithm substantially improves the speed of inverse plan optimization. The platform is valuable for both MO planning and future off-line or on-line adaptive re-planning. © 2012 American Association of Physicists in Medicine.
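
    The reported scaling corresponds to an Amdahl-style model with a fixed serial part plus a perfectly parallel part. A sketch of fitting T(N) = a + b/N by least squares, using invented timings chosen to resemble the paper's fitted a = 0.2, b = 2.3:

      # Fitting the timing model T(N) = a + b/N by linear least squares.
      # The sample timings are invented for illustration only.

      import numpy as np

      nodes = np.array([1, 2, 4, 8, 16, 32], dtype=float)
      t_obs = np.array([2.5, 1.35, 0.78, 0.49, 0.34, 0.27])  # hypothetical hours

      # Linear least squares in the basis [1, 1/N]:  T ~ a * 1 + b * (1/N)
      A = np.column_stack([np.ones_like(nodes), 1.0 / nodes])
      (a, b), *_ = np.linalg.lstsq(A, t_obs, rcond=None)

      pred = A @ [a, b]
      r2 = 1 - np.sum((t_obs - pred) ** 2) / np.sum((t_obs - t_obs.mean()) ** 2)
      print(f"T(N) = {a:.2f} + {b:.2f}/N,  r^2 = {r2:.4f}")

    The intercept a is the serial (non-parallelizable) time, which bounds the achievable speedup regardless of how many workers the cloud provisions.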

  9. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach that exploits two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified that consumes over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about ten times on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, compute nodes equal in number to the adjustable parameters (when the forward difference is used for Jacobian approximation), or twice that number (if the center difference is used), reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be efficiently utilized.
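
    The MPI-parallel Jacobian step exploits the fact that each finite-difference column is an independent forward-model run. A sketch of the same pattern using a process pool in place of MPI, with a stand-in forward model (run_model and all values are illustrative, not HGC5):

      # Parallel finite-difference Jacobian: one independent forward-model
      # run per parameter, farmed out to workers (MPI ranks in the paper;
      # a process pool here). run_model is a hypothetical stand-in.

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      def run_model(params):
          """Hypothetical forward model: parameters -> simulated observations."""
          p = np.asarray(params)
          return np.array([p[0] ** 2 + p[1], np.sin(p[1]) + 2.0 * p[2], p.sum()])

      def jacobian_column(args):
          params, base, i, h = args
          p = np.array(params, dtype=float)
          p[i] += h                              # forward difference
          return (run_model(p) - base) / h

      if __name__ == "__main__":
          params = [1.0, 0.5, 2.0]
          base = run_model(params)
          h = 1e-6
          tasks = [(params, base, i, h) for i in range(len(params))]
          with ProcessPoolExecutor() as pool:    # one task per parameter
              cols = list(pool.map(jacobian_column, tasks))
          J = np.column_stack(cols)
          print(J)

    With a center difference, each parameter needs two runs instead of one, which is why the abstract calls for twice as many compute nodes in that case.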

  10. Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

    PubMed

    Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

    2014-09-01

    Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scans, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package, adapted to contemporary high-performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation, including internal matrix manipulations, propagator setups and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher-precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. NASA's International Lunar Network Anchor Nodes and Robotic Lunar Lander Project Update

    NASA Technical Reports Server (NTRS)

    Morse, Brian J.; Reed, Cheryl L. B.; Kirby, Karen W.; Cohen, Barbara A.; Bassler, Julie A.; Harris, Danny W.; Chavers, D. Gregory

    2010-01-01

    In early 2008, NASA established the Lunar Quest Program, a new lunar science research program within NASA's Science Mission Directorate. The program included the establishment of the anchor nodes of the International Lunar Network (ILN), a network of lunar science stations envisioned to be emplaced by multiple nations. This paper describes the current status of the ILN Anchor Nodes mission development and the lander risk-reduction design and test activities implemented jointly by NASA's Marshall Space Flight Center and The Johns Hopkins University Applied Physics Laboratory. The lunar lander concepts developed by this team are applicable to multiple science missions, and this paper will describe a mission combining the functionality of an ILN node with an investigation of lunar polar volatiles.

  12. A Survey on the Feasibility of Sound Classification on Wireless Sensor Nodes

    PubMed Central

    Salomons, Etto L.; Havinga, Paul J. M.

    2015-01-01

    Wireless sensor networks are suitable to gain context awareness for indoor environments. As sound waves form a rich source of context information, equipping the nodes with microphones can be of great benefit. The algorithms to extract features from sound waves are often highly computationally intensive. This can be problematic as wireless nodes are usually restricted in resources. In order to be able to make a proper decision about which features to use, we survey how sound is used in the literature for global sound classification, age and gender classification, emotion recognition, person verification and identification and indoor and outdoor environmental sound classification. The results of the surveyed algorithms are compared with respect to accuracy and computational load. The accuracies are taken from the surveyed papers; the computational loads are determined by benchmarking the algorithms on an actual sensor node. We conclude that for indoor context awareness, the low-cost algorithms for feature extraction perform equally well as the more computationally-intensive variants. As the feature extraction still requires a large amount of processing time, we present four possible strategies to deal with this problem. PMID:25822142

  13. Probabilistic Approach to Enable Extreme-Scale Simulations under Uncertainty and System Faults. Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Knio, Omar

    2017-05-05

    The current project develops a novel approach that uses a probabilistic description to capture the current state of knowledge about the computational solution. To effectively spread the computational effort over multiple nodes, the global computational domain is split into many subdomains. Computational uncertainty in the solution translates into uncertain boundary conditions for the equation system to be solved on those subdomains, and many independent, concurrent subdomain simulations are used to account for this boundary condition uncertainty. By relying on the fact that solutions on neighboring subdomains must agree with each other, a more accurate estimate for the global solution can be achieved. Statistical approaches in this update process make it possible to account for the effect of system faults in the probabilistic description of the computational solution, and the associated uncertainty is reduced through successive iterations. By combining all of these elements, the probabilistic reformulation allows splitting the computational work over very many independent tasks for good scalability, while being robust to system faults.

  14. Distributed solar radiation fast dynamic measurement for PV cells

    NASA Astrophysics Data System (ADS)

    Wan, Xuefen; Yang, Yi; Cui, Jian; Du, Xingjing; Zheng, Tao; Sardar, Muhammad Sohail

    2017-10-01

    To study the operating characteristics of PV cells, attention must be given to the dynamic behavior of solar radiation. The dynamic behavior of annual, monthly, daily and hourly averages of solar radiation has been studied in detail, but faster dynamic behavior needs more research. Random fluctuations of solar radiation on minute or second time scales produce alternating irradiance that repeatedly cools down and warms up a PV cell, decreasing conversion efficiency. These fast dynamic processes are mainly caused by the stochastic movement of clouds; even under clear sky conditions, solar irradiation shows a certain degree of fast variation. To evaluate the operating characteristics of PV cells under fast dynamic irradiation, a solar radiation measuring array (SRMA) based on large active area photodiodes, LoRa spread spectrum communication and a nanoWatt MCU is proposed. Its crossed photodiode structure tracks the fast stochastic movement of clouds. To compensate for the response time of the pyranometer and to reduce system cost, terminal nodes with low-cost, fast-response large active area photodiodes are placed beside the tested PV cells. A central node, consisting of a pyranometer, a large active area photodiode, a wind detector and a host computer, is placed at the center of the array topology to scale the temporal envelope of the solar irradiation and to obtain calibration data relating the pyranometer and the large active area photodiodes. In our SRMA system, the terminal nodes are designed around Microchip's nanoWatt XLP PIC16F1947, and the FDS-100 is adopted as the large active area photodiode in both the terminal nodes and the central node. The output current and voltage of each PV cell are monitored by I/V measurement. AS62-T27/SX1278 LoRa communication modules are used for communication between the terminal nodes and the host computer. Because the LoRa LPWAN (Low Power Wide Area Network) specification provides seamless interoperability among smart things without complex local installation, configuring our SRMA system is easy. LoRa also allows SRMA to overcome the short communication distances and weather-induced signal propagation decline of technologies such as ZigBee and WiFi. The host computer uses the low-power single-board PC EMB-3870 produced by NORCO. A SM5386B wind direction sensor and a SM5387B wind-force sensor are connected to the host computer through an RS-485 bus for wind reference data collection, and a Davis 6450 solar radiation sensor, a precision instrument that detects radiation at wavelengths of 300 to 1100 nanometers, allows the host computer to follow real-time solar radiation. A LoRa polling scheme is adopted for communication between the host computer and the terminal nodes. An experimental SRMA was established and tested in Ganyu, Jiangsu Province, from May to August 2016, with distances between the nodes and the host computer of 100 m to 1900 m. The SRMA system showed high reliability: the terminal nodes followed instructions from the host computer and collected solar radiation data of the distributed PV cells effectively, the host computer managed the SRMA and acquired the reference parameters well, and communications between the host computer and the terminal nodes were almost unaffected by the weather. In conclusion, the test results show that SRMA can be a capable method for fast dynamic measurement of solar radiation and the related PV cell operating characteristics.

  15. Load balancing strategy and its lookup-table enhancement in deterministic space delay/disruption tolerant networks

    NASA Astrophysics Data System (ADS)

    Huang, Jinhui; Liu, Wenxiang; Su, Yingxue; Wang, Feixue

    2018-02-01

    Space networks, in which connectivity is deterministic and intermittent, can be modeled as delay/disruption tolerant networks. In space delay/disruption tolerant networks, a packet is usually transmitted from the source node to the destination node indirectly via a series of relay nodes. If any node in the path becomes congested, the packet will be dropped due to buffer overflow. One of the main causes of congestion is an unbalanced distribution of network traffic. We propose a load balancing strategy that takes the congestion status of both the local node and the relay nodes into account. The congestion status, together with the end-to-end delay, is used in route selection. A lookup-table enhancement is also proposed: off-line computation and on-line adjustment are combined to produce a more precise estimate of the end-to-end delay while reducing the onboard computation. Simulation results show that the proposed strategy helps to distribute network traffic more evenly and therefore reduces the packet drop ratio. In addition, the average delay is also decreased in most cases. The lookup-table enhancement provides a compromise between the need for better communication performance and the desire for less onboard computation.
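
    A toy version of such a route-selection rule, assuming a simple additive score over end-to-end delay and worst-relay buffer occupancy; the weights and candidate routes are invented for illustration:

      # Toy route selection combining end-to-end delay with a congestion
      # penalty from buffer occupancy along the path. Weights are assumptions.

      def route_score(path_delay_s, buffer_occupancy, alpha=1.0, beta=100.0):
          """buffer_occupancy: buffer fill ratios (0..1) along the path."""
          congestion = max(buffer_occupancy)     # the worst relay dominates
          return alpha * path_delay_s + beta * congestion

      candidates = {
          "via-sat-A": (120.0, [0.2, 0.9, 0.1]),  # faster, one congested relay
          "via-sat-B": (150.0, [0.3, 0.2, 0.2]),  # slower, lightly loaded
      }
      best = min(candidates, key=lambda k: route_score(*candidates[k]))
      print(best)  # via-sat-B: congestion penalty outweighs the 30 s delay gap

    In the paper's lookup-table enhancement, the delay term would come from an off-line precomputed table that is adjusted on-line, rather than being recomputed onboard for every packet.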

  16. On the Performance Evaluation of a MIMO-WCDMA Transmission Architecture for Building Management Systems.

    PubMed

    Tsampasis, Eleftherios; Gkonis, Panagiotis K; Trakadas, Panagiotis; Zahariadis, Theodore

    2018-01-08

    The goal of this study was to investigate the performance of a realistic wireless sensor node deployment supporting modern building management systems (BMSs). A three-floor building orientation is considered, in which each node is equipped with a multi-antenna system while a central base station (BS) collects and processes all received information. The BS is also equipped with multiple antennas; hence, a multiple input-multiple output (MIMO) system is formulated. Because of the multiple reflections during transmission inside the building, a wideband code division multiple access (WCDMA) physical layer protocol has been considered, which has already been adopted for third-generation (3G) mobile networks. Results are presented for various MIMO orientations, where the mean transmission power per node is taken as the output metric for a specific signal-to-noise ratio (SNR) requirement and number of resolvable multipath components. The first set of results highlights the effects of multiple access interference on overall transmission power: as the number of mobile nodes per floor or the requested transmission rate increases, higher-order MIMO systems should be deployed to keep transmission power at adequate levels. The second set of results compares transmission in diversity combining and spatial multiplexing modes, and clearly indicates that the former is the most appropriate solution for indoor communications.

  17. RTOG GU Radiation Oncology Specialists Reach Consensus on Pelvic Lymph Node Volumes for High-Risk Prostate Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lawton, Colleen A.F.; Michalski, Jeff; El-Naqa, Issam

    2009-06-01

    Purpose: Radiation therapy to the pelvic lymph nodes in high-risk prostate cancer is required on several Radiation Therapy Oncology Group (RTOG) clinical trials. Based on a prior lymph node contouring project, we have shown significant disagreement in the definition of pelvic lymph node volumes among genitourinary radiation oncology specialists involved in developing and executing current RTOG trials. Materials and Methods: A consensus meeting was held on October 3, 2007, to reach agreement on pelvic lymph node volumes. Data were presented to address the lymph node drainage of the prostate. Extensive discussion ensued to develop clinical target volume (CTV) pelvic lymph node consensus. Results: Consensus was obtained, resulting in computed tomography image-based pelvic lymph node CTVs. Based on this consensus, the pelvic lymph node volumes to be irradiated include: distal common iliac, presacral lymph nodes (S1-S3), external iliac lymph nodes, internal iliac lymph nodes, and obturator lymph nodes. Lymph node CTVs include the vessels (artery and vein) and a 7-mm radial margin, being careful to 'carve out' bowel, bladder, bone, and muscle. Volumes begin at the L5/S1 interspace and end at the superior aspect of the pubic bone. Consensus on dose-volume histogram constraints for organs at risk (OARs) was also attained. Conclusions: Consensus on pelvic lymph node CTVs for radiation therapy to address high-risk prostate cancer was attained and is available as web-based computed tomography images as well as in a descriptive format through the RTOG. This will allow for uniformity in evaluating the benefit and risk of such treatment.

  18. Practical implementation of tetrahedral mesh reconstruction in emission tomography

    PubMed Central

    Boutchko, R.; Sitek, A.; Gullberg, G. T.

    2014-01-01

    This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio projection datasets. The results demonstrate that the reconstructed images represented as tetrahedral meshes based on point clouds offer image quality comparable to that achievable using a standard voxel grid while allowing substantial reduction in the number of unknown intensities to be reconstructed and reducing the noise. PMID:23588373
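
    The per-tetrahedron linear interpolation mentioned above amounts to barycentric weighting of the four node intensities; a small sketch with illustrative vertices and values:

      # Linear interpolation inside a tetrahedron via barycentric coordinates:
      # the intensity at p is the barycentric-weighted mean of the four node
      # intensities. Vertices and values below are illustrative.

      import numpy as np

      def interpolate_tet(verts, node_vals, p):
          """verts: (4, 3) tetrahedron vertices; node_vals: 4 intensities."""
          # Solve p = sum_i w_i * v_i with sum_i w_i = 1 (a 4x4 system).
          A = np.vstack([verts.T, np.ones(4)])
          w = np.linalg.solve(A, np.append(p, 1.0))
          if np.any(w < -1e-12):
              raise ValueError("point lies outside this tetrahedron")
          return float(w @ node_vals)

      verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
      vals = np.array([1.0, 3.0, 5.0, 7.0])
      print(interpolate_tet(verts, vals, np.array([0.25, 0.25, 0.25])))  # 4.0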

  19. Practical implementation of tetrahedral mesh reconstruction in emission tomography

    NASA Astrophysics Data System (ADS)

    Boutchko, R.; Sitek, A.; Gullberg, G. T.

    2013-05-01

    This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio projection datasets. The results demonstrate that the reconstructed images represented as tetrahedral meshes based on point clouds offer image quality comparable to that achievable using a standard voxel grid while allowing substantial reduction in the number of unknown intensities to be reconstructed and reducing the noise.

  20. Distributed downhole drilling network

    DOEpatents

    Hall, David R.; Hall, Jr., H. Tracy; Fox, Joe; Pixton, David S.

    2006-11-21

    A high-speed downhole network providing real-time data from downhole components of a drilling strings includes a bottom-hole node interfacing to a bottom-hole assembly located proximate the bottom end of a drill string. A top-hole node is connected proximate the top end of the drill string. One or several intermediate nodes are located along the drill string between the bottom-hole node and the top-hole node. The intermediate nodes are configured to receive and transmit data packets transmitted between the bottom-hole node and the top-hole node. A communications link, integrated into the drill string, is used to operably connect the bottom-hole node, the intermediate nodes, and the top-hole node. In selected embodiments, a personal or other computer may be connected to the top-hole node, to analyze data received from the intermediate and bottom-hole nodes.

  1. Efficient computation of kinship and identity coefficients on large pedigrees.

    PubMed

    Cheng, En; Elliott, Brendan; Ozsoyoglu, Z Meral

    2009-06-01

    With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus the computation of identity coefficients merits special attention. However, identity coefficients are not computed directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, motivated by Wright's path-counting method for computing the inbreeding coefficient. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. Furthermore, we propose an improved scheme using Family NodeCodes for the computation of generalized kinship coefficients, motivated by the significant improvement that Family NodeCodes bring over NodeCodes for the inbreeding coefficient. We also perform experiments to evaluate the efficiency of our method and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
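
    For reference, the traditional recursive kinship computation that the paper uses as its baseline (not the NodeCodes scheme itself) can be sketched in a few lines; the pedigree below is a toy full-sib example, and founders are assumed unrelated:

      # Classical recursive kinship computation (the baseline the paper
      # accelerates, not its NodeCodes method). pedigree maps each child to
      # (father, mother), with None for founders; founders are assumed
      # unrelated. The inbreeding coefficient of X is the kinship of X's
      # parents.

      from functools import lru_cache

      pedigree = {
          "A": (None, None), "B": (None, None),
          "C": ("A", "B"), "D": ("A", "B"),   # full siblings
          "X": ("C", "D"),                    # full-sib mating
      }
      depth = {}  # generation number, used to recurse on the younger node
      def gen(i):
          if i not in depth:
              f, m = pedigree[i]
              depth[i] = 0 if f is None else 1 + max(gen(f), gen(m))
          return depth[i]

      @lru_cache(maxsize=None)
      def kinship(a, b):
          if a is None or b is None:
              return 0.0
          if a == b:
              f, m = pedigree[a]
              return 0.5 * (1.0 + kinship(f, m))
          if gen(a) < gen(b):                 # recurse on the younger node
              a, b = b, a
          f, m = pedigree[a]
          return 0.5 * (kinship(f, b) + kinship(m, b))

      f, m = pedigree["X"]
      print(kinship(f, m))  # 0.25: inbreeding coefficient of full-sib offspring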

  2. Lumping of degree-based mean-field and pair-approximation equations for multistate contact processes

    NASA Astrophysics Data System (ADS)

    Kyriakopoulos, Charalampos; Grossmann, Gerrit; Wolf, Verena; Bortolussi, Luca

    2018-01-01

    Contact processes form a large and highly interesting class of dynamic processes on networks, including epidemic and information-spreading networks. While devising stochastic models of such processes is relatively easy, analyzing them is very challenging from a computational point of view, particularly for large networks appearing in real applications. One strategy to reduce the complexity of their analysis is to rely on approximations, often in terms of a set of differential equations capturing the evolution of a random node, distinguishing nodes with different topological contexts (i.e., different degrees of different neighborhoods), such as degree-based mean-field (DBMF), approximate-master-equation (AME), or pair-approximation (PA) approaches. The number of differential equations so obtained is typically proportional to the maximum degree kmax of the network, which is much smaller than the size of the master equation of the underlying stochastic model, yet numerically solving these equations can still be problematic for large kmax. In this paper, we consider AME and PA, extended to cope with multiple local states, and we provide an aggregation procedure that clusters together nodes having similar degrees, treating those in the same cluster as indistinguishable, thus reducing the number of equations while preserving an accurate description of global observables of interest. We also provide an automatic way to build such equations and to identify a small number of degree clusters that give accurate results. The method is tested on several case studies, where it shows a high level of compression and a reduction of computational time of several orders of magnitude for large networks, with minimal loss in accuracy.
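
    A minimal sketch of the lumping idea applied to the simplest degree-based mean-field model (SIS); the parameters, the uniform degree distribution, and the equal-width clusters are illustrative assumptions, and the paper's AME/PA equations are richer than this DBMF system:

      # DBMF SIS with degree lumping: degree classes are binned into
      # clusters, each integrated as one equation with its mean degree.
      # All parameters and the degree distribution are illustrative.

      import numpy as np

      def dbmf_sis(degrees, pk, lam=0.2, mu=1.0, dt=0.01, steps=5000):
          """degrees: representative degree per class; pk: class weights."""
          k = np.asarray(degrees, float)
          p = np.asarray(pk, float)
          rho = np.full(len(k), 0.01)             # initial infected fraction
          kmean = np.sum(k * p)
          for _ in range(steps):
              theta = np.sum(k * p * rho) / kmean # prob. a neighbor is infected
              rho += dt * (-mu * rho + lam * k * (1 - rho) * theta)
          return np.sum(p * rho)                  # global prevalence

      # Full system: one equation per degree 1..100 (uniform weights).
      ks = np.arange(1, 101)
      pk = np.ones(100) / 100
      print(dbmf_sis(ks, pk))

      # Lumped system: 10 clusters of adjacent degrees, each with its mean
      # degree and aggregated weight -- 10 equations instead of 100.
      ks_l = ks.reshape(10, 10).mean(axis=1)
      pk_l = pk.reshape(10, 10).sum(axis=1)
      print(dbmf_sis(ks_l, pk_l))                 # close to the full result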

  3. Parallel Calculations in LS-DYNA

    NASA Astrophysics Data System (ADS)

    Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey

    2017-11-01

    Structural mechanics nowadays exhibits a trend towards numerical solutions of increasingly extensive and detailed problems, which requires that the capacity of computing systems be enhanced. Such enhancement can be achieved by different means. For example, if the computing system is a workstation, its components (CPU, memory, etc.) can be replaced or extended. In essence, such modification eventually entails replacement of the entire workstation, since replacement of certain components necessitates the exchange of others (faster CPUs and memory devices require buses with higher throughput, etc.). Special consideration must be given to the capabilities of modern video cards. They constitute powerful computing systems capable of processing data in parallel, and the tools originally designed to render high-performance graphics can be applied to problems not immediately related to graphics (CUDA, OpenCL, shaders, etc.). However, not all software suites utilize video card capacity. Another way to increase the capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive growth: a quite powerful system can be obtained by combining individually modest nodes, and the separate nodes may even have different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with the LS-DYNA software. To establish a range of dependencies, a mere 2-node cluster has proven sufficient.

  4. Esophageal cancer associated with a sarcoid-like reaction and systemic sarcoidosis in lymph nodes: supportive findings of [18F]-fluorodeoxyglucose positron emission tomography-computed tomography during neoadjuvant therapy.

    PubMed

    Kishino, Takayoshi; Okano, Keiichi; Ando, Yasuhisa; Suto, Hironobu; Asano, Eisuke; Oshima, Minoru; Fujiwara, Masao; Usuki, Hisashi; Kobara, Hideki; Masaki, Tsutomu; Ibuki, Emi; Kushida, Yoshio; Haba, Reiji; Suzuki, Yasuyuki

    2018-06-25

    In patients with esophageal cancer, differentiation between lymph node metastasis and lymphadenopathies from sarcoidosis or sarcoid-like reactions of lymph nodes is clinically important. Herein, we report two esophageal cancer cases with lymph node involvement of sarcoid-like reaction or sarcoidosis. One patient received chemotherapy and the other chemoradiotherapy as initial treatments. In both cases, [18F]-fluorodeoxyglucose positron emission tomography-computed tomography (FDG-PET/CT) was performed before and after chemo(radio)therapy. After the treatment, FDG uptake was not detected in the primary tumor, but it was slightly reduced in the hilar and mediastinal lymph nodes in both cases. These non-identical responses to chemo(radio)therapy suggest the presence of sarcoid-like reaction of lymph nodes associated with squamous cell carcinoma of the esophagus. Curative surgical resection was performed as treatment. These FDG-PET/CT findings may be helpful to distinguish between metastasis and sarcoidosis-associated lymphadenopathy in esophageal cancer.

  5. The Role of Energy Reservoirs in Distributed Computing: Manufacturing, Implementing, and Optimizing Energy Storage in Energy-Autonomous Sensor Nodes

    NASA Astrophysics Data System (ADS)

    Cowell, Martin Andrew

    The world already hosts more internet-connected devices than people, and that ratio is only increasing. These devices seamlessly integrate with people's lives to collect rich data and give immediate feedback about complex systems in business, health care, transportation, and security. As every aspect of the global economy integrates distributed computing into its industrial systems, these systems benefit from rich datasets. Managing the power demands of these distributed computers will be paramount to ensuring the continued operation of these networks, and is elegantly addressed by including local energy harvesting and storage on a per-node basis. By replacing non-rechargeable batteries with energy harvesting, wireless sensor nodes will increase their lifetimes by an order of magnitude. This work investigates the coupling of high-power energy storage with energy harvesting technologies to power wireless sensor nodes, with sections covering device manufacturing, system integration, and mathematical modeling. First we consider the energy storage mechanisms of supercapacitors and batteries, and identify favorable characteristics in both reservoir types. We then discuss experimental methods used to manufacture high-power supercapacitors in our labs. We go on to detail the integration of our fabricated devices with collaborating labs to create functional sensor node demonstrations. With the practical knowledge gained through in-lab manufacturing and system integration, we build mathematical models to aid in device and system design. First, we model the mechanism of energy storage in porous graphene supercapacitors to aid in component architecture optimization. We then model the operation of entire sensor nodes for the purpose of optimally sizing the energy harvesting and energy reservoir components. In consideration of deploying these sensor nodes in real-world environments, we model the operation of our energy harvesting and power management systems subject to spatially and temporally varying energy availability in order to understand sensor node reliability. Looking to the future, we see an opportunity for further research to implement machine learning algorithms to control the energy resources of distributed computing networks.

  6. An evaluation of the state of time synchronization on leadership class supercomputers

    DOE PAGES

    Jones, Terry; Ostrouchov, George; Koenig, Gregory A.; ...

    2017-10-09

    We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.

  7. An evaluation of the state of time synchronization on leadership class supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, Terry; Ostrouchov, George; Koenig, Gregory A.

    We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.

  8. Lymph Node Size on Computed Tomography Images Is a Predictive Indicator for Lymph Node Metastasis in Patients with Colorectal Neuroendocrine Tumors.

    PubMed

    Tanaka, Toshiaki; Nozawa, Hiroaki; Kawai, Kazushige; Hata, Keisuke; Kiyomatsu, Tomomichi; Nishikawa, Takeshi; Otani, Kensuke; Sasaki, Kazuhito; Murono, Koji; Watanabe, Toshiaki

    2017-01-01

    Colorectal neuroendocrine tumors (NET) are a rare manifestation of colorectal neoplasia, requiring radical dissection of the regional lymph nodes along with colorectal resection similar to that required for colorectal cancer. However, thus far, no reports have described the ability of computed tomography (CT) to predict lymph node involvement. In this study, we evaluated the rate of prediction of lymph node metastasis using contrast-enhanced CT. A total of 21 patients with colorectal NET undergoing colorectal resection were recruited from January 2010 to June 2016. We compared the CT findings between samples with or without pathologically proven lymph node metastasis, in each field (pericolic/perirectal and intermediate nodes). Within the pericolic/perirectal field, any lymph node larger than 5 mm in the CT images was a predictive indicator of lymph node metastasis, with a sensitivity, specificity, and area under the ROC curve (AUC) of 66.7%, 87.5%, and 0.844, respectively. Within the intermediate field, any visible lymph node on the CT was a predictive indicator of lymph node metastasis, with a sensitivity, specificity, and AUC of 100%, 76.4%, and 0.890, respectively. In addition, when we observed lymph nodes larger than 3 mm on the CT images, the sensitivity and specificity were 100% and 82.4%, respectively, with an AUC of 0.8971. CT images provide predictive information for lymph node metastasis with a high rate of accuracy. Copyright © 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

  9. Symplectic multi-particle tracking on GPUs

    NASA Astrophysics Data System (ADS)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model is implemented on Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase-space structure and reduce non-physical effects in long-term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation reduces total computing time by more than a factor of two in comparison to the CPU implementation.
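    To make the tracking idea concrete, here is a minimal Python/numpy sketch of one symplectic (kick-drift) step applied to a vector of particles; the linear-focusing toy lattice and all parameter values are assumptions for illustration, and the per-particle vectorization mirrors the parallelism exploited on GPUs.

      import numpy as np

      def kick_drift_step(x, p, k, ds):
          """Advance all particles by one symplectic step of length ds."""
          p = p - k * x * ds          # kick: momentum change from focusing force
          x = x + p * ds              # drift: position change at updated momentum
          return x, p

      rng = np.random.default_rng(0)
      x, p = rng.normal(size=100_000), rng.normal(size=100_000)
      for _ in range(1000):
          x, p = kick_drift_step(x, p, k=1.0, ds=0.01)
      print(x.std(), p.std())         # phase-space spread stays bounded (symplectic)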

  10. Research in Wireless Networks and Communications

    DTIC Science & Technology

    2008-05-01

    TESTBED SETUP AND INITIAL MULTI-HOP EXPERIENCE As a proof of concept, we assembled a testbed platform of nodes based on 400MHz AMD Geode single-board...experiments on a testbed network consisting of 400MHz AMD Geode single-board computers made by Thecus Inc. We equipped each of these nodes with two...ground nodes were placed on a line, with about 3 feet of separation between adjacent nodes. The nodes were powered by 400MHz AMD Geode single-board

  11. WinHPC System Configuration | High-Performance Computing | NREL

    Science.gov Websites

    CPUs with 48GB of memory. Node 04 has dual Intel Xeon E5530 CPUs with 24GB of memory. Nodes 05-20 have dual AMD Opteron 2374 HE CPUs with 16GB of memory. Nodes 21-30 have been decommissioned. Nodes 31-35 have dual Intel Xeon X5675 CPUs with 48GB of memory. Nodes 36-37 have dual Intel Xeon E5-2680 CPUs with

  12. Parallelization of the Flow Field Dependent Variation Scheme for Solving the Triple Shock/Boundary Layer Interaction Problem

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.

    2001-01-01

    A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high-speed air-breathing vehicles, including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme based upon multi-threading, as implemented on multiple-processor supercomputers and workstations, are presented.

  13. INTEGRATED MONITORING HARDWARE DEVELOPMENTS AT LOS ALAMOS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    R. PARKER; J. HALBIG; ET AL

    1999-09-01

    The hardware of the integrated monitoring system supports a family of instruments having a common internal architecture and firmware. Instruments can be easily configured from application-specific personality boards combined with common master-processor and high- and low-voltage power supply boards, and basic operating firmware. The instruments are designed to function autonomously to survive power and communication outages and to adapt to changing conditions. The personality boards allow measurement of gross gammas and neutrons, neutron coincidence and multiplicity, and gamma spectra. In addition, the Intelligent Local Node (ILON) provides a moderate-bandwidth network to tie together instruments, sensors, and computers.

  14. [A case of malignant lymphoma successfully diagnosed using Sasada transbronchial angled biopsy forceps].

    PubMed

    Komori, Chika; Sasada, Shinji; Okamoto, Norio; Kawahara, Kunimitsu; Uehara, Nobuko; Shimada, Kazutaka; Kuhara, Hanako; Terada, Haruko; Tsujino, Kazuyuki; Matsunashi, Tatsuro; Minami, Toshiyuki; Suzuki, Hidekazu; Kobayashi, Masashi; Hirashima, Tomonori; Matsui, Kaoru; Kawase, Ichiro; Kusunoki, Yoko

    2009-01-01

    A 68-year-old man was referred to our hospital due to general fatigue, fever and weight loss. His chest radiograph showed a nodule (2.8 cm) in the right middle lobe. Computed tomography and positron emission tomography showed multiple metastases to the bone, liver and lymph nodes. The lung nodule was not accessible by standard transbronchial forceps. However, biopsy specimens obtained using Sasada Transbronchial Angled Biopsy Forceps (STAF) pathologically confirmed the diagnosis of malignant lymphoma. We report the case, and discuss the utility of STAF for lung lesions that are difficult to access with standard forceps.

  15. Effects of maximum node degree on computer virus spreading in scale-free networks

    NASA Astrophysics Data System (ADS)

    Bamaarouf, O.; Ould Baba, A.; Lamzabi, S.; Rachadi, A.; Ez-Zahraouy, H.

    2017-10-01

    The increasing use of Internet networks favors the spread of viruses. In this paper, we studied the spread of viruses in scale-free networks with different topologies based on the Susceptible-Infected-External (SIE) model. It is found that the network structure influences virus spreading. We also show that nodes of high degree are more susceptible to infection than others. Furthermore, we determined a critical maximum value of node degree (Kc), below which the network is more resistant and a computer virus cannot expand into the whole network. The influence of network size is also studied. We found that smaller networks are more effective at reducing the proportion of infected nodes.
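    The following Monte Carlo sketch conveys the flavor of such a study; the specific SIE transition rules used here (susceptible nodes are infected by infected neighbors with probability beta, and infected nodes move to the External state with probability delta) are assumptions for illustration, not the paper's exact model.

      import random
      import networkx as nx

      def simulate(n=2000, m=3, beta=0.1, delta=0.05, steps=200, seed=1):
          random.seed(seed)
          g = nx.barabasi_albert_graph(n, m, seed=seed)    # scale-free topology
          state = {v: 'S' for v in g}                      # S, I, or E (external)
          hub = max(g.degree, key=lambda kv: kv[1])[0]     # high-degree nodes are most at risk
          state[hub] = 'I'
          for _ in range(steps):
              nxt = dict(state)
              for v in g:
                  if state[v] == 'S' and any(state[u] == 'I' for u in g[v]):
                      if random.random() < beta:
                          nxt[v] = 'I'
                  elif state[v] == 'I' and random.random() < delta:
                      nxt[v] = 'E'                         # leaves the network
              state = nxt
          return sum(1 for s in state.values() if s == 'I') / n

      print(simulate())                                    # final infected fraction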

  16. Optimized scalable network switch

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2007-12-04

    In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.

  17. Optimized scalable network switch

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

    2010-02-23

    In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.

  18. Fault-Tolerant Local-Area Network

    NASA Technical Reports Server (NTRS)

    Morales, Sergio; Friedman, Gary L.

    1988-01-01

    Local-area network (LAN) for computers prevents single-point failure from interrupting communication between nodes of network. Includes two complete cables, LAN 1 and LAN 2. Microprocessor-based slave switches link cables to network-node devices such as workstations, print servers, and file servers. Slave switches respond to commands from master switch, connecting nodes to two cable networks or disconnecting them so they are completely isolated. System monitor and control computer (SMC) acts as gateway, allowing nodes on either cable to communicate with each other and ensuring that LAN 1 and LAN 2 are fully used when functioning properly. Network monitors and controls itself, automatically routes traffic for efficient use of resources, and isolates and corrects its own faults, with a potentially dramatic reduction in time out of service.

  19. Use of multi-node wells in the Groundwater-Management Process of MODFLOW-2005 (GWM-2005)

    USGS Publications Warehouse

    Ahlfeld, David P.; Barlow, Paul M.

    2013-01-01

    Many groundwater wells are open to multiple aquifers or to multiple intervals within a single aquifer. These types of wells can be represented in numerical simulations of groundwater flow by use of the Multi-Node Well (MNW) Packages developed for the U.S. Geological Survey’s MODFLOW model. However, previous versions of the Groundwater-Management (GWM) Process for MODFLOW did not allow the use of multi-node wells in groundwater-management formulations. This report describes modifications to the MODFLOW–2005 version of the GWM Process (GWM–2005) to provide for such use with the MNW2 Package. Multi-node wells can be incorporated into a management formulation as flow-rate decision variables for which optimal withdrawal or injection rates will be determined as part of the GWM–2005 solution process. In addition, the heads within multi-node wells can be used as head-type state variables, and, in that capacity, be included in the objective function or constraint set of a management formulation. Simple head bounds also can be defined to constrain water levels at multi-node wells. The report provides instructions for including multi-node wells in the GWM–2005 data-input files and a sample problem that demonstrates use of multi-node wells in a typical groundwater-management problem.

  20. Fuzzy neural network technique for system state forecasting.

    PubMed

    Li, Dezhi; Wang, Wilson; Ismail, Fathy

    2013-10-01

    In many system state forecasting applications, the prediction is performed based on multiple datasets, each corresponding to a distinct system condition. The traditional methods dealing with multiple datasets (e.g., vector autoregressive moving average models and neural networks) have some shortcomings, such as limited modeling capability and opaque reasoning operations. To tackle these problems, a novel fuzzy neural network (FNN) is proposed in this paper to effectively extract information from multiple datasets, so as to improve forecasting accuracy. The proposed predictor consists of both autoregressive (AR) node modeling and nonlinear node modeling; AR models/nodes are used to capture the linear correlation of the datasets, and the nonlinear correlation of the datasets is modeled with nonlinear neuron nodes. A novel particle swarm technique [i.e., the Laplace particle swarm (LPS) method] is proposed to facilitate parameter estimation for the predictor and improve modeling accuracy. The effectiveness of the developed FNN predictor and the associated LPS method is verified by a series of tests related to Mackey-Glass data forecast, exchange rate data prediction, and gear system prognosis. Test results show that the developed FNN predictor and the LPS method can capture the dynamics of multiple datasets effectively and track system characteristics accurately.
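    For the AR part of such a hybrid predictor, a minimal sketch is given below; the least-squares fit, the AR order and the toy series are assumptions for illustration, and the nonlinear nodes and fuzzy weighting of the paper's FNN are deliberately omitted.

      import numpy as np

      def fit_ar(series, order=3):
          """Least-squares AR(order) coefficients for one dataset."""
          X = np.column_stack([series[i:len(series) - order + i]
                               for i in range(order)])     # lagged regressors
          y = series[order:]
          coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
          return coeffs

      def predict_next(series, coeffs):
          return float(series[-len(coeffs):] @ coeffs)

      t = np.linspace(0, 20, 400)
      s = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
      c = fit_ar(s)
      print(predict_next(s, c))       # should track the next value of the noisy sine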

  1. Comparative Effects of Computer-Based Concept Maps, Refutational Texts, and Expository Texts on Science Learning

    ERIC Educational Resources Information Center

    Adesope, Olusola O.; Cavagnetto, Andy; Hunsu, Nathaniel J.; Anguiano, Carlos; Lloyd, Joshua

    2017-01-01

    This study used a between-subjects experimental design to examine the effects of three different computer-based instructional strategies (concept map, refutation text, and expository scientific text) on science learning. Concept maps are node-link diagrams that show concepts as nodes and relationships among the concepts as labeled links.…

  2. High Performance Active Database Management on a Shared-Nothing Parallel Processor

    DTIC Science & Technology

    1998-05-01

    either stored or virtual. A stored node is like a materialized view. It actually contains the specified tuples. A virtual node is like a real view...

  3. A site oriented supercomputer for theoretical physics: The Fermilab Advanced Computer Program Multi Array Processor System (ACPMAPS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nash, T.; Atac, R.; Cook, A.

    1989-03-06

    The ACPMAPS multiprocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL chip set. The system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.

  4. Robust scalable stabilisability conditions for large-scale heterogeneous multi-agent systems with uncertain nonlinear interactions: towards a distributed computing architecture

    NASA Astrophysics Data System (ADS)

    Manfredi, Sabato

    2016-06-01

    Large-scale dynamic systems are becoming highly pervasive, with applications ranging from systems biology and environment monitoring to sensor networks and power systems. They are characterised by high dimensionality, complexity, and uncertainty in the node dynamics/interactions, and they demand increasingly computationally expensive methods for analysis and control design as the network size and node/interaction complexity grow. It is therefore a challenging problem to find scalable computational methods for distributed control design of large-scale networks. In this paper, we investigate the robust distributed stabilisation problem of large-scale nonlinear multi-agent systems (briefly, MASs) composed of non-identical (heterogeneous) linear dynamical systems coupled by uncertain nonlinear time-varying interconnections. By employing Lyapunov stability theory and the linear matrix inequality (LMI) technique, new conditions are given for the distributed control design of large-scale MASs that can be easily solved with the MATLAB toolbox. The stabilisability of each node dynamic is a sufficient assumption to design a globally stabilising distributed control. The proposed approach improves some of the existing LMI-based results on MASs by both overcoming their computational limits and extending the applicative scenario to large-scale nonlinear heterogeneous MASs. Additionally, the proposed LMI conditions are further reduced in terms of computational requirements in the case of weakly heterogeneous MASs, which is a common scenario in real applications where the network nodes and links are affected by parameter uncertainties. One of the main advantages of the proposed approach is that it allows moving from a centralised towards a distributed computing architecture, so that the expensive computational workload spent solving LMIs may be shared among processors located at the networked nodes, thus increasing the scalability of the approach with the network size. Finally, a numerical example shows the applicability of the proposed method and its advantage in terms of computational complexity when compared with existing approaches.

  5. Community detection using preference networks

    NASA Astrophysics Data System (ADS)

    Tasgin, Mursel; Bingol, Haluk O.

    2018-04-01

    Community detection is the task of identifying clusters or groups of nodes in a network, where nodes within the same group are more connected with each other than with nodes in different groups. It has practical uses in identifying similar functions or roles of nodes in many biological, social and computer networks. With the availability of very large networks in recent years, performance and scalability of community detection algorithms have become crucial, i.e. if the time complexity of an algorithm is high, it cannot run on large networks. In this paper, we propose a new community detection algorithm, which has a local approach and is able to run on large networks. It uses a simple and effective method: given a network, the algorithm constructs a preference network of nodes, where each node has a single outgoing edge showing the node it prefers to be in the same community with. In such a preference network, each connected component is a community. Selection of the preferred node is performed using similarity-based metrics that can be calculated within the 1-neighborhood of nodes; we use two alternatives for this purpose, namely the number of common neighbors of the selector node and its neighbors, and the spread capability of neighbors around the selector node, calculated by the gossip algorithm of Lind et al. Our algorithm is tested on both computer-generated LFR networks and real-life networks with ground-truth community structure. It identifies communities accurately and quickly; it is local, scalable and suitable for distributed execution on large networks.
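    A minimal sketch of the preference-network idea, following the description above rather than the authors' reference code: each node points at the neighbor with which it shares the most common neighbors, and the connected components of the resulting one-out-edge graph are the communities.

      import networkx as nx

      def preference_communities(g):
          pref = nx.DiGraph()
          pref.add_nodes_from(g)
          for v in g:
              if g.degree(v) == 0:
                  continue
              # preferred neighbor: most common neighbors, ties broken by degree
              best = max(g[v], key=lambda u: (len(set(g[v]) & set(g[u])),
                                              g.degree(u)))
              pref.add_edge(v, best)
          # each (weakly) connected component of the preference network is a community
          return list(nx.weakly_connected_components(pref))

      print(preference_communities(nx.karate_club_graph()))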

  6. Multi-agent grid system Agent-GRID with dynamic load balancing of cluster nodes

    NASA Astrophysics Data System (ADS)

    Satymbekov, M. N.; Pak, I. T.; Naizabayeva, L.; Nurzhanov, Ch. A.

    2017-12-01

    This study presents a system designed for automated load balancing of a cluster: it analyses the load of compute nodes and then migrates virtual machines from heavily loaded nodes to less loaded ones. The system increases the performance of cluster nodes and helps in the timely processing of data. The grid system balances the work of the cluster nodes; the relevance of the system lies in applying multi-agent balancing to problems of this kind.

  7. Performance of VPIC on Trinity

    NASA Astrophysics Data System (ADS)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Li, H.; Nam, H. A.; Pang, X.; Rust, W. N., III; Wohlbier, J.; Yin, L.; Albright, B. J.

    2016-10-01

    Trinity is a new major DOE computing resource which is going through final acceptance testing at Los Alamos National Laboratory. Trinity has several new and unique architectural features including two compute partitions, one with dual socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes. Additional unique features include use of on package high bandwidth memory (HBM) for the KNL nodes, the ability to configure the KNL nodes with respect to HBM model and on die network topology in a variety of operational modes at run time, and use of solid state storage via burst buffer technology to reduce time required to perform I/O. An effort is in progress to port and optimize VPIC to Trinity and evaluate its performance. Because VPIC was recently released as Open Source, it is being used as part of acceptance testing for Trinity and is participating in the Trinity Open Science Program which has resulted in excellent collaboration activities with both Cray and Intel. Results of this work will be presented on performance of VPIC on both Haswell and KNL partitions for both single node runs and runs at scale. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  8. Synthesis of natural flows at selected sites in the upper Missouri River basin, Montana, 1928-89

    USGS Publications Warehouse

    Cary, L.E.; Parrett, Charles

    1996-01-01

    Natural monthly streamflows were synthesized for the years 1928-89 for 43 sites in the upper Missouri River Basin upstream from Fort Peck Lake in Montana. The sites are represented as nodes in a streamflow accounting model being developed by the Bureau of Reclamation. Recorded and historical flows at most sites have been affected by human activities, including reservoir storage, diversions for irrigation, and municipal use. Natural flows at the sites were synthesized by eliminating the effects of these activities. Recorded data at some sites do not include the entire study period. The missing flows at these sites were estimated using a statistical procedure. The methods of synthesis varied, depending on upstream activities and information available. Recorded flows were transferred to nodes that did not have streamflow-gaging stations from the nearest station with a sufficient length of record. The flows at one node were computed as the sum of flows from three upstream tributaries. Monthly changes in reservoir storage were computed from month-end contents. The changes in storage were corrected for the effects of evaporation and precipitation using pan-evaporation and precipitation data from climate stations. Irrigation depletions and consumptive use by the three largest municipalities were computed. Synthesized natural flow at most nodes was computed by adding algebraically the upstream depletions and changes in reservoir storage to recorded or historical flow at the nodes.

  9. An energy efficient multiple mobile sinks based routing algorithm for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Zhong, Peijun; Ruan, Feng

    2018-03-01

    With the fast development of wireless sensor networks (WSNs), more and more energy-efficient routing algorithms have been proposed. However, one of the research challenges is how to alleviate the hot-spot problem, since nodes close to a static sink (or base station) tend to die earlier than other sensors. The introduction of mobile sink nodes can effectively alleviate this problem, since a sink node can move along certain trajectories, causing hot-spot nodes to be more evenly distributed. In this paper, we mainly study energy-efficient routing with support for multiple mobile sinks. We divide the whole network into several clusters and study the influence of the number of mobile sinks on network lifetime. Simulation results show that the best network performance appears when the number of mobile sinks is about 3 under our simulation environment.

  10. GATE Monte Carlo simulation in a cloud computing environment

    NASA Astrophysics Data System (ADS)

    Rowedder, Blake Austin

    The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment of a local high-performance cluster. This study investigated reliable and efficient execution of GATE MC simulations using a commercial cloud computing service. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size, and thus that the integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53-minute simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high-performance computing continuing to fall in price and grow in accessibility, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
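    The two runtimes quoted in the abstract are enough to sketch the reported inverse power model T(n) = a * n**(-b); the fit below is an illustration using only those two points, not the study's full regression.

      import math

      t1, t20 = 53.0, 3.11                        # minutes, from the abstract
      b = math.log(t1 / t20) / math.log(20.0)     # solve t1 * 20**(-b) = t20
      a = t1                                      # since T(1) = a
      print(f"T(n) = {a:.1f} * n^(-{b:.2f})")     # b near 1 implies near-linear speedup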

  11. Texture Analysis and Synthesis of Malignant and Benign Mediastinal Lymph Nodes in Patients with Lung Cancer on Computed Tomography

    NASA Astrophysics Data System (ADS)

    Pham, Tuan D.; Watanabe, Yuzuru; Higuchi, Mitsunori; Suzuki, Hiroyuki

    2017-02-01

    Texture analysis of computed tomography (CT) imaging has been found useful to distinguish subtle differences, invisible to the human eye, between malignant and benign tissues in cancer patients. This study implemented two complementary methods of texture analysis, known as the gray-level co-occurrence matrix (GLCM) and the experimental semivariogram (SV), with the aim of improving the predictive value of evaluating mediastinal lymph nodes in lung cancer. The GLCM was explored with the use of a rich set of its derived features, whereas the SV feature was extracted on real and synthesized CT samples of benign and malignant lymph nodes. A distinct advantage of the computer methodology presented herein is that it alleviates the need for automated precise segmentation of the lymph nodes. Using the logistic regression model, a sensitivity of 75%, specificity of 90%, and area under the curve of 0.89 were obtained in the test population. A tenfold cross-validated accuracy of 70% in classifying benign versus malignant lymph nodes was obtained using support vector machines as the pattern classifier. These results are higher than those recently reported in the literature with similar studies.
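    A minimal sketch of GLCM feature extraction on a CT patch, assuming scikit-image's graycomatrix/graycoprops API (spelled greycomatrix/greycoprops in older releases); the synthetic patch stands in for a segmented lymph-node region, and the semivariogram branch of the study is omitted.

      import numpy as np
      from skimage.feature import graycomatrix, graycoprops

      rng = np.random.default_rng(0)
      roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # stand-in CT patch

      # Co-occurrence matrix at distance 1 for two directions, then derived features.
      glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                          levels=256, symmetric=True, normed=True)
      features = {prop: graycoprops(glcm, prop).mean()
                  for prop in ('contrast', 'homogeneity', 'energy', 'correlation')}
      print(features)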

  12. An operating system for future aerospace vehicle computer systems

    NASA Technical Reports Server (NTRS)

    Foudriat, E. C.; Berman, W. J.; Will, R. W.; Bynum, W. L.

    1984-01-01

    The requirements for future aerospace vehicle computer operating systems are examined in this paper. The computer architecture is assumed to be distributed, with a local area network connecting the nodes. Each node is assumed to provide a specific functionality. The network provides for communication so that the overall tasks of the vehicle are accomplished. The O/S structure is based upon the concept of objects. The mechanisms for integrating node-unique objects with node-common objects in order to implement both the autonomy of and the cooperation between nodes are developed. The requirements for time-critical performance and reliability and recovery are discussed. Time-critical performance impacts all parts of the distributed operating system; e.g., its structure, the functional design of its objects, the language structure, etc. Throughout the paper the tradeoffs - concurrency, language structure, object recovery, binding, file structure, communication protocol, programmer freedom, etc. - are considered to arrive at a feasible, maximum performance design. Reliability of the network system is considered. A parallel multipath bus structure is proposed for the control of delivery time for time-critical messages. The architecture also supports immediate recovery for the time-critical message system after a communication failure.

  13. Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks

    PubMed Central

    Piraveenan, Mahendra; Prokopenko, Mikhail; Hossain, Liaquat

    2013-01-01

    A number of centrality measures are available to determine the relative importance of a node in a complex network, and betweenness is prominent among them. However, the existing centrality measures are not adequate in network percolation scenarios (such as during infection transmission in a social network of individuals, spreading of computer viruses on computer networks, or transmission of disease over a network of towns) because they do not account for the changing percolation states of individual nodes. We propose a new measure, percolation centrality, that quantifies relative impact of nodes based on their topological connectivity, as well as their percolation states. The measure can be extended to include random walk based definitions, and its computational complexity is shown to be of the same order as that of betweenness centrality. We demonstrate the usage of percolation centrality by applying it to a canonical network as well as simulated and real world scale-free and random networks. PMID:23349699
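    NetworkX ships an implementation of this measure (its documentation credits this paper), so a quick experiment is straightforward; the graph and the percolation states below are arbitrary illustration values.

      import networkx as nx

      g = nx.barabasi_albert_graph(50, 2, seed=42)
      states = {v: (1.0 if v < 5 else 0.0) for v in g}   # five "fully percolated" nodes
      pc = nx.percolation_centrality(g, states=states)
      top = sorted(pc, key=pc.get, reverse=True)[:5]     # most influential spreaders
      print({v: round(pc[v], 3) for v in top})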

  14. Design and construction of a 2D metal organic framework with multiple cavities: a nonregular net with a paracyclophane that codes for multiply fused nodes.

    PubMed

    Papaefstathiou, Giannis S; Friscić, Tomislav; MacGillivray, Leonard R

    2005-10-19

    A metal organic framework with two different nodes (circle and square) and a structure related to one of the 20 known 2-uniform nets has been constructed using an organic building unit that codes for multiply fused nodes.

  15. Introducing the slime mold graph repository

    NASA Astrophysics Data System (ADS)

    Dirnberger, M.; Mehlhorn, K.; Mehlhorn, T.

    2017-07-01

    We introduce the slime mold graph repository or SMGR, a novel data collection promoting the visibility, accessibility and reuse of experimental data revolving around network-forming slime molds. By making data readily available to researchers across multiple disciplines, the SMGR promotes novel research as well as the reproduction of original results. While SMGR data may take various forms, we stress the importance of graph representations of slime mold networks due to their ease of handling and their large potential for reuse. Data added to the SMGR stands to gain impact beyond initial publications or even beyond its domain of origin. We initiate the SMGR with the comprehensive Kist Europe data set focusing on the slime mold Physarum polycephalum, which we obtained in the course of our original research. It contains sequences of images documenting growth and network formation of the organism under constant conditions. Suitable image sequences depicting the typical P. polycephalum network structures are used to compute sequences of graphs faithfully capturing them. Given such sequences, node identities are computed, tracking the development of nodes over time. Based on this information we demonstrate two out of many possible ways to begin exploring the data. The entire data set is well-documented, self-contained and ready for inspection at http://smgr.mpi-inf.mpg.de.

  16. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
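    In rsync-like spirit, the template comparison might look like the following sketch: checkpoint data is split into fixed-size blocks, each block's checksum is compared against the template's, and only differing blocks are kept. Block size, hash choice and all names are assumptions, not the patented implementation.

      import hashlib

      BLOCK = 4096

      def checksums(data: bytes):
          return [hashlib.md5(data[i:i + BLOCK]).digest()
                  for i in range(0, len(data), BLOCK)]

      def delta_blocks(template: bytes, checkpoint: bytes):
          """Return (block_index, block) pairs that differ from the template."""
          tsums = checksums(template)
          out = []
          for i in range(0, len(checkpoint), BLOCK):
              block, j = checkpoint[i:i + BLOCK], i // BLOCK
              if j >= len(tsums) or hashlib.md5(block).digest() != tsums[j]:
                  out.append((j, block))
          return out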

  17. Mobile clusters of single board computers: an option for providing resources to student projects and researchers.

    PubMed

    Baun, Christian

    2016-01-01

    Clusters usually consist of servers, workstations or personal computers as nodes. But especially for academic purposes like student projects or scientific projects, the cost of purchase and operation can be a challenge. Single board computers cannot compete with the performance or energy-efficiency of higher-value systems, but they are an option for building inexpensive cluster systems. Because of their compact design and modest energy consumption, it is possible to build clusters of single board computers in such a way that they are mobile and can be easily transported by the users. This paper describes the construction of such a cluster, useful applications and the performance of the single nodes. Furthermore, the cluster's performance and energy-efficiency are analyzed by executing the High Performance Linpack benchmark with different numbers of nodes and different proportions of the system's total main memory utilized.

  18. Scheduling based on a dynamic resource connection

    NASA Astrophysics Data System (ADS)

    Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.

    2017-02-01

    The practical use of distributed computing systems is associated with many problems, including the organization of effective interaction between agents located at the nodes of the system, the specific configuration of each node to perform a certain task, the effective distribution of the available information and computational resources of the system, the control of the multithreading that implements the logic of solving research problems, and so on. The article describes a method of computing load balancing in distributed automatic systems, focused on multi-agent and multi-threaded data processing. A scheme for controlling the processing of requests from terminal devices is offered, providing effective dynamic scaling of computing power under peak load. The results of model experiments with the developed load-scheduling algorithm are set out. These results show the effectiveness of the algorithm even with a significant expansion in the number of connected nodes and scaling of the architecture of the distributed computing system.

  19. Masked Proportional Routing

    NASA Technical Reports Server (NTRS)

    Wolpert, David H. (Inventor)

    2003-01-01

    Distributed approach for determining a path connecting adjacent network nodes, for probabilistically or deterministically transporting an entity, with entity characteristic mu, from a source node to a destination node. Each node i is directly connected to an arbitrary number J(mu) of nodes, labeled or numbered j = j1, j2, ..., jJ(mu). In a deterministic version, a J(mu)-component baseline proportion vector p(i;mu) is associated with node i. A J(mu)-component applied proportion vector p*(i;mu) is determined from p(i;mu) to preclude an entity visiting a node more than once. Third and fourth J(mu)-component vectors, with components iteratively determined by Target(i;n(mu);mu)_j = alpha(mu).Target(i;n(mu)-1;mu)_j + beta(mu).p*(i;mu)_j and Actual(i;n(mu);mu)_j = alpha(mu).Actual(i;n(mu)-1;mu)_j + beta(mu).Sent(i;j'(mu);n(mu)-1;mu)_j, are computed, where n(mu) is an entity sequence index and alpha(mu) and beta(mu) are selected numbers. In one embodiment, at each node i, the node j = j'(mu) with the largest vector component difference, Target(i;n(mu);mu)_j' - Actual(i;n(mu);mu)_j', is chosen for the next link for entity transport, except in special gap circumstances, where the same link is optionally used for transporting consecutively arriving entities. The network nodes may be computer-controlled routers that switch collections of packets, frames, cells or other information units. Alternatively, the nodes may be waypoints for movement of physical items in a network or for transformation of a physical item. The nodes may be states of an entity undergoing state transitions, where allowed transitions are specified by the network and/or the destination node.
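    A toy rendition of the selection rule, keeping the patent's exponential-smoothing updates but with invented constants and names; it forwards each entity over the link with the largest Target - Actual gap.

      def choose_link(p_star, target, actual, alpha=0.9, beta=0.1):
          """One routing decision at a node with len(p_star) outgoing links."""
          for j in range(len(p_star)):
              target[j] = alpha * target[j] + beta * p_star[j]
          j_best = max(range(len(p_star)), key=lambda j: target[j] - actual[j])
          for j in range(len(p_star)):
              sent = 1.0 if j == j_best else 0.0           # Sent indicator for this entity
              actual[j] = alpha * actual[j] + beta * sent
          return j_best

      # Example: three links with applied proportions 0.5 / 0.3 / 0.2.
      p = [0.5, 0.3, 0.2]
      t, a = [0.0] * 3, [0.0] * 3
      print([choose_link(p, t, a) for _ in range(10)])     # links used in ~those proportions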

  20. Design of QoS-Aware Multi-Level MAC-Layer for Wireless Body Area Network.

    PubMed

    Hu, Long; Zhang, Yin; Feng, Dakui; Hassan, Mohammad Mehedi; Alelaiwi, Abdulhameed; Alamri, Atif

    2015-12-01

    With the advances in wearable computing and various wireless technologies, there is an increasing trend to outsource body signals from wireless body area networks (WBANs) to the outside world, including cyberspace, healthcare big-data clouds, etc. Since the environmental and physiological data collected by multimodal sensors have different importance, the provisioning of quality of service (QoS) for the sensory data in a WBAN is a critical issue. This paper proposes a multi-level QoS design at the WBAN media access control (MAC) layer in terms of user level, data level and time level. In the proposed QoS provisioning scheme, different users have different priorities, the sensory data collected by different sensor nodes have different importance, and the data priority for the same sensor node varies over time. The experimental results show that the proposed multi-level QoS provisioning solution in WBAN yields better performance in meeting the QoS requirements of personalized healthcare applications while achieving energy savings.

  1. Spectrum of walk matrix for Koch network and its application

    NASA Astrophysics Data System (ADS)

    Xie, Pinchen; Lin, Yuan; Zhang, Zhongzhi

    2015-06-01

    Various structural and dynamical properties of a network are encoded in the eigenvalues of the walk matrix describing random walks on the network. In this paper, we study the spectra of the walk matrix of the Koch network, which displays the prominent scale-free and small-world features. Utilizing the particular architecture of the network, we obtain all the eigenvalues and their corresponding multiplicities. Based on the link between the eigenvalues of the walk matrix and the random target access time, defined as the expected time for a walker going from an arbitrary node to another one selected randomly according to the steady-state distribution, we then derive an explicit solution to the random target access time for random walks on the Koch network. Finally, we corroborate our computation of the eigenvalues by enumerating spanning trees in the Koch network, using the connection between eigenvalues and spanning trees, where a spanning tree of a network is a subgraph of the network, that is, a tree containing all the nodes.
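    For context, the random target access time defined here is what is often called Kemeny's constant, and its standard expression in terms of the walk-matrix eigenvalues (with lambda_1 = 1 excluded) is the textbook form below; the notation is ours, not the paper's.

      % Random target access time (Kemeny's constant) from the eigenvalues
      % \lambda_i of the walk (transition) matrix of the network.
      \begin{equation*}
        K \;=\; \sum_{i=2}^{N} \frac{1}{1 - \lambda_i}
      \end{equation*}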

  2. A wirelessly programmable actuation and sensing system for structural health monitoring

    NASA Astrophysics Data System (ADS)

    Long, James; Büyüköztürk, Oral

    2016-04-01

    Wireless sensor networks promise to deliver low-cost, low-power and massively distributed systems for structural health monitoring. A key component of these systems, particularly when sampling rates are high, is the capability to process data within the network. Although progress has been made towards this vision, it remains a difficult task to develop and program 'smart' wireless sensing applications. In this paper we present a system which allows data acquisition and computational tasks to be specified in Python, a high-level programming language, and executed within the sensor network. Key features of this system include the ability to execute custom application code without firmware updates, to run multiple users' requests concurrently and to conserve power through adjustable sleep settings. Specific examples of sensor node tasks are given to demonstrate the features of this system in the context of structural health monitoring. The system comprises individual firmware for nodes in the wireless sensor network, and a gateway server and web application through which users can remotely submit their requests.
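    Since the paper exposes tasks in Python, a node-side task might look like the sketch below; the Node class and its sample/send methods are invented stand-ins for the system's real runtime, which this abstract does not specify.

      class Node:
          """Stub standing in for the node-side runtime API (hypothetical)."""
          config = {'rms_threshold': 0.1}

          def sample(self, channel, rate_hz, seconds):
              return [0.0] * int(rate_hz * seconds)        # placeholder samples

          def send(self, payload):
              print('radio <-', payload)

      def task(node):
          samples = node.sample(channel='accel_z', rate_hz=500, seconds=2)
          rms = (sum(x * x for x in samples) / len(samples)) ** 0.5   # in-network feature
          if rms > node.config['rms_threshold']:
              node.send({'event': 'vibration', 'rms': rms})           # report anomalies only

      task(Node())   # on the real system, this body would run inside the network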

  3. High Performance Data Transfer for Distributed Data Intensive Sciences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fang, Chin; Cottrell, R 'Les' A.; Hanushevsky, Andrew B.

    We report on the development of ZX software providing high performance data transfer and encryption. The design scales in computation power, network interfaces, and IOPS while carefully balancing the available resources. Two U.S. patent-pending algorithms help tackle data sets containing lots of small files and very large files, and provide insensitivity to network latency. It has a cluster-oriented architecture, using peer-to-peer technologies to ease deployment, operation, usage, and resource discovery. Its unique optimizations enable effective use of flash memory. Using a pair of existing data transfer nodes at SLAC and NERSC, we compared its performance to that of bbcp and GridFTP and determined that they were comparable. With a proof of concept created using two four-node clusters with multiple distributed multi-core CPUs, network interfaces and flash memory, we achieved 155Gbps memory-to-memory over a 2x100Gbps link-aggregated channel and 70Gbps file-to-file with encryption over a 5000 mile 100Gbps link.

  4. Reticular synthesis of porous molecular 1D nanotubes and 3D networks.

    PubMed

    Slater, A G; Little, M A; Pulido, A; Chong, S Y; Holden, D; Chen, L; Morgan, C; Wu, X; Cheng, G; Clowes, R; Briggs, M E; Hasell, T; Jelfs, K E; Day, G M; Cooper, A I

    2017-01-01

    Synthetic control over pore size and pore connectivity is the crowning achievement for porous metal-organic frameworks (MOFs). The same level of control has not been achieved for molecular crystals, which are not defined by strong, directional intermolecular coordination bonds. Hence, molecular crystallization is inherently less controllable than framework crystallization, and there are fewer examples of 'reticular synthesis', in which multiple building blocks can be assembled according to a common assembly motif. Here we apply a chiral recognition strategy to a new family of tubular covalent cages to create both 1D porous nanotubes and 3D diamondoid pillared porous networks. The diamondoid networks are analogous to MOFs prepared from tetrahedral metal nodes and linear ditopic organic linkers. The crystal structures can be rationalized by computational lattice-energy searches, which provide an in silico screening method to evaluate candidate molecular building blocks. These results are a blueprint for applying the 'node and strut' principles of reticular synthesis to molecular crystals.

  5. Reticular synthesis of porous molecular 1D nanotubes and 3D networks

    NASA Astrophysics Data System (ADS)

    Slater, A. G.; Little, M. A.; Pulido, A.; Chong, S. Y.; Holden, D.; Chen, L.; Morgan, C.; Wu, X.; Cheng, G.; Clowes, R.; Briggs, M. E.; Hasell, T.; Jelfs, K. E.; Day, G. M.; Cooper, A. I.

    2017-01-01

    Synthetic control over pore size and pore connectivity is the crowning achievement for porous metal-organic frameworks (MOFs). The same level of control has not been achieved for molecular crystals, which are not defined by strong, directional intermolecular coordination bonds. Hence, molecular crystallization is inherently less controllable than framework crystallization, and there are fewer examples of 'reticular synthesis', in which multiple building blocks can be assembled according to a common assembly motif. Here we apply a chiral recognition strategy to a new family of tubular covalent cages to create both 1D porous nanotubes and 3D diamondoid pillared porous networks. The diamondoid networks are analogous to MOFs prepared from tetrahedral metal nodes and linear ditopic organic linkers. The crystal structures can be rationalized by computational lattice-energy searches, which provide an in silico screening method to evaluate candidate molecular building blocks. These results are a blueprint for applying the 'node and strut' principles of reticular synthesis to molecular crystals.

  6. A game theory approach to target tracking in sensor networks.

    PubMed

    Gu, Dongbing

    2011-02-01

    In this paper, we investigate a moving-target tracking problem with sensor networks. Each sensor node has a sensor to observe the target and a processor to estimate the target position. It also has wireless communication capability, but with limited range, and can only communicate with neighbors. The moving target is assumed to be an intelligent agent, which is "smart" enough to escape detection by maximizing the estimation error. This adversarial behavior makes the target tracking problem more difficult. We formulate this target estimation problem as a zero-sum game and use a minimax filter to estimate the target position. The minimax filter is a robust filter that minimizes the estimation error by considering the worst-case noise. Furthermore, we develop a distributed version of the minimax filter for multiple sensor nodes. The distributed computation is implemented by modeling the information received from neighbors as measurements in the minimax filter. The simulation results show that the target tracking algorithm proposed in this paper provides a satisfactory result.

  7. Send-side matching of data communications messages

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-07-01

    Send-side matching of data communications messages includes a plurality of compute nodes organized for collective operations, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message.

  8. Detection of lymph node metastases in pediatric and adolescent/young adult sarcoma: Sentinel lymph node biopsy versus fludeoxyglucose positron emission tomography imaging-A prospective trial.

    PubMed

    Wagner, Lars M; Kremer, Nathalie; Gelfand, Michael J; Sharp, Susan E; Turpin, Brian K; Nagarajan, Rajaram; Tiao, Gregory M; Pressey, Joseph G; Yin, Julie; Dasgupta, Roshni

    2017-01-01

    Lymph node metastases are an important cause of treatment failure for pediatric and adolescent/young adult (AYA) sarcoma patients. Nodal sampling is recommended for certain sarcoma subtypes that have a predilection for lymphatic spread. Sentinel lymph node biopsy (SLNB) may improve the diagnostic yield of nodal sampling, particularly when single-photon emission computed tomography/computed tomography (SPECT-CT) is used to facilitate anatomic localization. Functional imaging with positron emission tomography/computed tomography (PET-CT) is increasingly used for sarcoma staging and is a less invasive alternative to SLNB. To assess the utility of these 2 staging methods, this study prospectively compared SLNB plus SPECT-CT with PET-CT for the identification of nodal metastases in pediatric and AYA patients. Twenty-eight pediatric and AYA sarcoma patients underwent SLNB with SPECT-CT. The histological findings of the excised lymph nodes were then correlated with preoperative PET-CT imaging. A median of 2.4 sentinel nodes were sampled per patient. No wound infections or chronic lymphedema occurred. SLNB identified tumors in 7 of the 28 patients (25%), including 3 patients who had normal PET-CT imaging of the nodal basin. In contrast, PET-CT demonstrated hypermetabolic regional nodes in 14 patients, and this resulted in a positive predictive value of only 29%. The sensitivity and specificity of PET-CT for detecting histologically confirmed nodal metastases were only 57% and 52%, respectively. SLNB can safely guide the rational selection of nodes for biopsy in pediatric and AYA sarcoma patients and can identify therapy-changing nodal disease not appreciated with PET-CT. Cancer 2017;155-160. © 2016 American Cancer Society.

  9. Potential advantage of studying the lymphatic drainage by sentinel node technique and SPECT-CT image fusion for pelvic irradiation of prostate cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krengli, Marco; Ballare, Andrea; Cannillo, Barbara

    2006-11-15

    Purpose: This study aims to investigate the in vivo drainage of lymphatic spread by using the sentinel node (SN) technique and single-photon emission computed tomography (SPECT)-computed tomography (CT) image fusion, and to analyze the impact of such information on conformal pelvic irradiation. Methods and Materials: Twenty-three prostate cancer patients, candidates for radical prostatectomy already included in a trial studying the SN technique, were enrolled. CT and SPECT images were obtained after intraprostate injection of 115 MBq of 99mTc-nanocolloid, allowing identification of SN and other pelvic lymph nodes. Target and nontarget structures, including lymph nodes identified by SPECT, were drawn on SPECT-CT fusion images. A three-dimensional conformal treatment plan was performed for each patient. Results: Single-photon emission computed tomography lymph nodal uptake was detected in 20 of 23 cases (87%). The SN was inside the pelvic clinical target volume (CTV2) in 16 of 20 cases (80%) and received no less than the prescribed dose in 17 of 20 cases (85%). The most frequent locations of SN outside the CTV2 were the common iliac and presacral lymph nodes. Sixteen of the 32 other lymph nodes (50%) identified by SPECT were found outside the CTV2. Overall, the SN and other intrapelvic lymph nodes identified by SPECT were not included in the CTV2 in 5 of 20 (25%) patients. Conclusions: The study of lymphatic drainage can contribute to a better knowledge of the in vivo potential pattern of lymph node metastasis in prostate cancer and can lead to a modification of treatment volume with consequent optimization of pelvic irradiation.

  10. Predicting axillary lymph node metastasis from kinetic statistics of DCE-MRI breast images

    NASA Astrophysics Data System (ADS)

    Ashraf, Ahmed B.; Lin, Lilie; Gavenonis, Sara C.; Mies, Carolyn; Xanthopoulos, Eric; Kontos, Despina

    2012-03-01

    The presence of axillary lymph node metastases is the most important prognostic factor in breast cancer and can influence the selection of adjuvant therapy, both chemotherapy and radiotherapy. In this work we present a set of kinetic statistics derived from DCE-MRI for predicting axillary node status. Breast DCE-MRI images from 69 women with known nodal status were analyzed retrospectively under HIPAA and IRB approval. Axillary lymph nodes were positive in 12 patients while 57 patients had no axillary lymph node involvement. Kinetic curves for each pixel were computed and a pixel-wise map of time-to-peak (TTP) was obtained. Pixels were first partitioned according to the similarity of their kinetic behavior, based on TTP values. For every kinetic curve, the following pixel-wise features were computed: peak enhancement (PE), wash-in slope (WIS), and wash-out slope (WOS). Partition-wise statistics for every feature map were calculated, resulting in a total of 21 kinetic statistic features. ANOVA was used to select features that differ significantly between node-positive and node-negative women. Using the computed kinetic statistic features, a leave-one-out SVM classifier was trained that achieves an area under the ROC curve (AUC) of 0.77, outperforming the conventional kinetic measures of maximum peak enhancement (MPE) and signal enhancement ratio (SER) (AUCs of 0.61 and 0.57, respectively). These findings suggest that our DCE-MRI kinetic statistic features can be used to improve the prediction of axillary node status in breast cancer patients. Such features could ultimately be used as imaging biomarkers to guide personalized treatment choices for women diagnosed with breast cancer.
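
    To make the pixel-wise features concrete, here is a minimal numpy sketch of TTP, PE, WIS, and WOS for one kinetic curve; the sampling grid, baseline handling, and slope definitions are illustrative assumptions, not the authors' implementation:

        import numpy as np

        def kinetic_features(curve, t):
            """Per-pixel kinetic features from a DCE-MRI signal-time curve.

            curve : 1-D array of signal intensities over time
            t     : matching acquisition times
            """
            baseline = curve[0]                  # pre-contrast signal
            peak_idx = int(np.argmax(curve))
            ttp = t[peak_idx] - t[0]             # time-to-peak (TTP)
            pe = curve[peak_idx] - baseline      # peak enhancement (PE)
            wis = pe / ttp if ttp > 0 else 0.0   # wash-in slope (WIS)
            # wash-out slope (WOS): signal change per unit time after the peak
            wos = ((curve[-1] - curve[peak_idx]) / (t[-1] - t[peak_idx])
                   if peak_idx < len(curve) - 1 else 0.0)
            return ttp, pe, wis, wos

        # Example: one pixel's curve sampled at 5 time points (arbitrary units)
        t = np.array([0.0, 60.0, 120.0, 180.0, 240.0])
        curve = np.array([100.0, 180.0, 220.0, 210.0, 195.0])
        print(kinetic_features(curve, t))

    Partition-wise statistics would then be summary measures (e.g., mean or variance) of these per-pixel values within each TTP-based partition.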

  11. Cross-ontological analytics for alignment of different classification schemes

    DOEpatents

    Posse, Christian; Sanfilippo, Antonio P; Gopalan, Banu; Riensche, Roderick M; Baddeley, Robert L

    2010-09-28

    Quantification of the similarity between nodes in multiple electronic classification schemes is provided by automatically identifying relationships and similarities between nodes within and across the electronic classification schemes. Quantifying the similarity between a first node in a first electronic classification scheme and a second node in a second electronic classification scheme involves finding a third node in the first electronic classification scheme, wherein a first product value of an inter-scheme similarity value between the second and third nodes and an intra-scheme similarity value between the first and third nodes is a maximum. A fourth node in the second electronic classification scheme can be found, wherein a second product value of an inter-scheme similarity value between the first and fourth nodes and an intra-scheme similarity value between the second and fourth nodes is a maximum. The maximum between the first and second product values represents a measure of similarity between the first and second nodes.
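
    Read literally, the quantification reduces to a max-product search over bridging nodes. A minimal sketch follows, under the assumption that precomputed similarity tables are available as dictionaries; the function name and data layout are illustrative:

        def cross_scheme_similarity(n1, n2, scheme1_nodes, scheme2_nodes,
                                    intra1, intra2, inter):
            """Similarity of node n1 (scheme 1) and node n2 (scheme 2).

            intra1[(a, b)] : intra-scheme similarity within scheme 1
            intra2[(a, b)] : intra-scheme similarity within scheme 2
            inter[(a, b)]  : inter-scheme similarity, a in scheme 1, b in scheme 2
            """
            # First product value: bridge through a third node in scheme 1
            p1 = max(inter[(n3, n2)] * intra1[(n1, n3)] for n3 in scheme1_nodes)
            # Second product value: bridge through a fourth node in scheme 2
            p2 = max(inter[(n1, n4)] * intra2[(n2, n4)] for n4 in scheme2_nodes)
            # The larger product value is the similarity measure
            return max(p1, p2)

    In practice the intra-scheme similarities might come from, e.g., path distances within each taxonomy, and the inter-scheme similarities from lexical matching of node labels.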

  12. Feasibility of optically interconnected parallel processors using wavelength division multiplexing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deri, R.J.; De Groot, A.J.; Haigh, R.E.

    1996-03-01

    New national security demands require enhanced computing systems for nearly ab initio simulations of extremely complex systems and analyzing unprecedented quantities of remote sensing data. This computational performance is being sought using parallel processing systems, in which many less powerful processors are ganged together to achieve high aggregate performance. Such systems require increased capability to communicate information between individual processor and memory elements. As it is likely that the limited performance of today's electronic interconnects will prevent the system from achieving its ultimate performance, there is great interest in using fiber optic technology to improve interconnect communication. However, little information is available to quantify the requirements on fiber optical hardware technology for this application. Furthermore, we have sought to explore interconnect architectures that use the complete communication richness of the optical domain rather than using optics as a simple replacement for electronic interconnects. These considerations have led us to study the performance of a moderate size parallel processor with optical interconnects using multiple optical wavelengths. We quantify the bandwidth, latency, and concurrency requirements which allow a bus-type interconnect to achieve scalable computing performance using up to 256 nodes, each operating at GFLOP performance. Our key conclusion is that scalable performance, to ~150 GFLOPS, is achievable for several scientific codes using an optical bus with a small number of WDM channels (8 to 32), only one WDM channel received per node, and achievable optoelectronic bandwidth and latency requirements. 21 refs., 10 figs.
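
    The kind of requirement analysis described can be illustrated with a back-of-the-envelope shared-bus cost model; the sketch below is an illustrative assumption (uniform message sizes, evenly shared channels, no compute/communication overlap), not the authors' simulation:

        def step_time(flops_per_node, node_gflops, msgs, msg_bytes,
                      latency_s, channel_gbps, n_channels, n_nodes):
            """Rough per-timestep cost for a WDM bus-connected parallel code."""
            t_compute = flops_per_node / (node_gflops * 1e9)
            bus_bw = n_channels * channel_gbps * 1e9 / 8   # aggregate bytes/s
            # All nodes' traffic shares the bus; latency paid per message
            t_comm = msgs * latency_s + (n_nodes * msgs * msg_bytes) / bus_bw
            return t_compute + t_comm

        # 256 nodes at 1 GFLOP each, 8 WDM channels of 10 Gb/s (assumed figures)
        t = step_time(flops_per_node=1e7, node_gflops=1.0, msgs=4,
                      msg_bytes=64_000, latency_s=1e-6,
                      channel_gbps=10.0, n_channels=8, n_nodes=256)
        print(f"step time ~ {t*1e3:.2f} ms")

    Sweeping the channel count in such a model shows when communication, rather than computation, dominates the step time, which is the essence of the scalability question the paper addresses.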

  13. Locality Aware Concurrent Start for Stencil Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.

    Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex groupings of nodes, cores, threads, caches and memory connected by an ever-evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e., threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploits the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvements ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
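
    For contrast with the paper's hierarchical scheme, the sketch below shows only plain spatial tiling of a 2-D five-point Jacobi stencil, i.e., the kind of baseline such techniques improve upon (tile size and kernel are illustrative choices):

        import numpy as np

        def jacobi_tiled(a, steps, tile=64):
            """Five-point Jacobi stencil with simple spatial tiling.

            Tiles improve cache locality within a sweep; unlike concurrent-start
            tiling, time steps are not tiled, so every sweep is a global sync.
            """
            n, m = a.shape
            b = a.copy()
            for _ in range(steps):
                for i0 in range(1, n - 1, tile):          # tile rows
                    for j0 in range(1, m - 1, tile):      # tile columns
                        i1 = min(i0 + tile, n - 1)
                        j1 = min(j0 + tile, m - 1)
                        b[i0:i1, j0:j1] = 0.25 * (a[i0-1:i1-1, j0:j1]
                                                  + a[i0+1:i1+1, j0:j1]
                                                  + a[i0:i1, j0-1:j1-1]
                                                  + a[i0:i1, j0+1:j1+1])
                a, b = b, a                               # swap buffers
            return a

        grid = np.zeros((512, 512)); grid[0, :] = 1.0     # hot top boundary
        result = jacobi_tiled(grid, steps=10)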

  14. Intricacies of modern supercomputing illustrated with recent advances in simulations of strongly correlated electron systems

    NASA Astrophysics Data System (ADS)

    Schulthess, Thomas C.

    2013-03-01

    The continued thousand-fold improvement in sustained application performance per decade on modern supercomputers keeps opening new opportunities for scientific simulations. But supercomputers have become very complex machines, built with thousands or tens of thousands of complex nodes consisting of multiple CPU cores or, most recently, a combination of CPU and GPU processors. Efficient simulations on such high-end computing systems require tailored algorithms that optimally map numerical methods to particular architectures. These intricacies will be illustrated with simulations of strongly correlated electron systems, where the development of quantum cluster methods, Monte Carlo techniques, as well as their optimal implementation by means of algorithms with improved data locality and high arithmetic density have gone hand in hand with evolving computer architectures. The present work would not have been possible without continued access to computing resources at the National Center for Computational Science of Oak Ridge National Laboratory, which is funded by the Facilities Division of the Office of Advanced Scientific Computing Research, and the Swiss National Supercomputing Center (CSCS) that is funded by ETH Zurich.

  15. Towards a cyber-physical era: soft computing framework based multi-sensor array for water quality monitoring

    NASA Astrophysics Data System (ADS)

    Bhardwaj, Jyotirmoy; Gupta, Karunesh K.; Gupta, Rajiv

    2018-02-01

    New concepts and techniques are replacing traditional methods for measuring water quality parameters. This paper introduces a cyber-physical system (CPS) approach for water quality assessment in a distribution network. Cyber-physical systems with embedded sensors, processors and actuators can be designed to sense and interact with the water environment. The proposed CPS comprises a sensing framework, integrating five different water quality parameter sensor nodes, and a soft computing framework for computational modelling. The soft computing framework uses Python for the user interface and fuzzy logic for decision making. Introducing multiple sensors in a water distribution network generates a huge number of data matrices, which can be highly complex and difficult to interpret for effective decision making. The proposed system framework therefore also aims to simplify the obtained sensor data matrices and to support decision making by water engineers through the soft computing framework. The goal of this research is to provide a simple and efficient method to identify and detect the presence of contamination in a water distribution network using CPS applications.
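
    As an illustration of a fuzzy decision-making layer of this kind, a minimal sketch follows; the membership functions, parameter names, and rule base are hypothetical, not those of the proposed framework:

        def tri(x, a, b, c):
            """Triangular membership function rising from a, peaking at b, falling to c."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def contamination_risk(turbidity_ntu, chlorine_mg_l):
            """Toy fuzzy rule base: high turbidity and low residual chlorine
            raise the contamination risk score in [0, 1]."""
            high_turbidity = tri(turbidity_ntu, 1.0, 5.0, 9.0)
            low_chlorine = tri(chlorine_mg_l, -0.2, 0.0, 0.5)
            # Rule 1: IF turbidity is high AND chlorine is low THEN risk is high
            risk_high = min(high_turbidity, low_chlorine)
            # Rule 2: IF turbidity is high OR chlorine is low THEN risk is elevated
            risk_elevated = max(high_turbidity, low_chlorine)
            # Crisp score: weighted blend of the fired rules
            return 0.7 * risk_high + 0.3 * risk_elevated

        print(contamination_risk(turbidity_ntu=4.0, chlorine_mg_l=0.1))

    A fuzzy layer of this shape condenses a multi-sensor data matrix into a single interpretable score, which matches the paper's stated goal of simplifying decision making for water engineers.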

  16. On the Performance Evaluation of a MIMO–WCDMA Transmission Architecture for Building Management Systems

    PubMed Central

    Tsampasis, Eleftherios; Gkonis, Panagiotis K.; Trakadas, Panagiotis; Zahariadis, Theodore

    2018-01-01

    The goal of this study was to investigate the performance of a realistic wireless sensor node deployment supporting modern building management systems (BMSs). A three-floor building layout is considered, where each node is equipped with a multi-antenna system while a central base station (BS) collects and processes all received information. The BS is also equipped with multiple antennas; hence, a multiple input–multiple output (MIMO) system is formulated. Because of the multiple reflections during transmission inside the building, a wideband code division multiple access (WCDMA) physical layer protocol has been considered, which has already been adopted for third-generation (3G) mobile networks. Results are presented for various MIMO configurations, where the mean transmission power per node is taken as the output metric for a specific signal-to-noise ratio (SNR) requirement and number of resolvable multipath components. The first set of results highlights the effects of multiple access interference on overall transmission power: as the number of mobile nodes per floor or the requested transmission rate increases, higher-order MIMO systems should be deployed in order to maintain transmission power at adequate levels. The second set of results compares transmission in diversity combining mode and spatial multiplexing mode, clearly indicating that diversity combining is the more appropriate solution for indoor communications. PMID:29316720
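
    The diversity-combining advantage can be illustrated with a small simulation; the sketch below uses maximal ratio combining (MRC) over Rayleigh fading branches, a common textbook combiner chosen here as an assumption, since the abstract does not name a specific scheme:

        import numpy as np

        rng = np.random.default_rng(0)

        def mrc_snr(n_rx, snr_branch_db, trials=100_000):
            """Average post-combining SNR for MRC over i.i.d. Rayleigh branches."""
            snr_branch = 10 ** (snr_branch_db / 10)
            # Complex Gaussian gains -> |h|^2 is exponentially distributed, mean 1
            h = (rng.standard_normal((trials, n_rx))
                 + 1j * rng.standard_normal((trials, n_rx))) / np.sqrt(2)
            combined = snr_branch * np.sum(np.abs(h) ** 2, axis=1)  # branch SNRs add
            return 10 * np.log10(combined.mean())

        for n in (1, 2, 4):
            print(f"{n} rx antennas: mean combined SNR = {mrc_snr(n, 10.0):.1f} dB")

    Because branch SNRs add under MRC, each doubling of receive antennas buys roughly 3 dB, which in a power-limited sensor node translates directly into a lower required transmission power.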

  17. GATE Monte Carlo simulation of dose distribution using MapReduce in a cloud computing environment.

    PubMed

    Liu, Yangchuan; Tang, Yuguo; Gao, Xin

    2017-12-01

    The GATE Monte Carlo simulation platform has promising applications in treatment planning and quality assurance. However, accurate dose calculation using GATE is time consuming. The purpose of this study is to implement a novel cloud computing method for accurate GATE Monte Carlo simulation of dose distribution using MapReduce. An Amazon Machine Image installed with Hadoop and GATE is created to set up Hadoop clusters on Amazon Elastic Compute Cloud (EC2). Macros, the input files for GATE, are split into a number of self-contained sub-macros. Through Hadoop Streaming, the sub-macros are executed by GATE in Map tasks and the sub-results are aggregated into final outputs in Reduce tasks. As an evaluation, GATE simulations were performed in a cubical water phantom for X-ray photons of 6 and 18 MeV. The parallel simulation on the cloud computing platform is as accurate as the single-threaded simulation on a local server. The cloud-based simulation time is approximately inversely proportional to the number of worker nodes. For the simulation of 10 million photons on a cluster with 64 worker nodes, the simulation time decreased by factors of 41 and 32 compared to the single-worker-node case and the single-threaded case, respectively. A test of Hadoop's fault tolerance showed that simulation correctness was not affected by the failure of some worker nodes. The results verify that the proposed method provides a feasible cloud computing solution for GATE.
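
    The Map/Reduce decomposition can be pictured with a toy stand-in for GATE; the splitting rule, dose grid, and merge step below are illustrative assumptions and use Python's process pool in place of Hadoop Streaming, not GATE's actual macro syntax:

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def split_photons(total, n_jobs):
            """Split the photon budget into near-equal self-contained sub-jobs,
            playing the role of GATE sub-macros."""
            base, extra = divmod(total, n_jobs)
            return [base + (1 if i < extra else 0) for i in range(n_jobs)]

        def simulate(n_photons, seed):
            """Stand-in for one GATE run; returns a toy 3-D dose grid."""
            rng = np.random.default_rng(seed)
            return rng.poisson(lam=n_photons / 1e6, size=(8, 8, 8)).astype(float)

        def map_reduce_dose(total_photons, n_workers):
            chunks = split_photons(total_photons, n_workers)
            with ProcessPoolExecutor(max_workers=n_workers) as pool:
                doses = pool.map(simulate, chunks, range(n_workers))  # Map phase
            return sum(doses)  # Reduce phase: dose deposition is additive

        if __name__ == "__main__":
            dose = map_reduce_dose(total_photons=10_000_000, n_workers=8)
            print(dose.sum())

    The key property exploited is that dose deposition from independent photon histories is additive, so sub-results can be summed in any order by the Reduce step, and a failed sub-job can simply be re-run.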

  18. Scalable cloud without dedicated storage

    NASA Astrophysics Data System (ADS)

    Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.

    2015-05-01

    We present a prototype of a scalable computing cloud. It is intended to be deployed on a cluster without separate dedicated storage: the dedicated storage is replaced by distributed software-defined storage, and all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improving the fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with relatively low initial and maintenance costs. The solution is built from open-source components such as OpenStack and Ceph.

  19. Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubois, David H; Dubois, Andrew J; Boorman, Thomas M

    2009-01-01

    This work presents a detailed implementation of a double precision, non-preconditioned Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm on a variety of systems to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, the SRC Computers, Inc. MAPStation SRC-6 FPGA-enhanced hybrid supercomputer, and an AMD Opteron-only system. In all hybrid implementations, wall clock time is measured, including all transfer overhead and compute timings.
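
    For reference, a minimal numpy sketch of the standard non-preconditioned Conjugate Gradient iteration (the textbook algorithm, not the Roadrunner implementation):

        import numpy as np

        def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
            """Non-preconditioned CG for a symmetric positive-definite matrix A."""
            x = np.zeros_like(b)
            r = b - A @ x                # residual
            p = r.copy()                 # search direction
            rs = r @ r
            for _ in range(max_iter):
                Ap = A @ p
                alpha = rs / (p @ Ap)    # step length along p
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol:
                    break
                p = r + (rs_new / rs) * p   # conjugate update of direction
                rs = rs_new
            return x

        # Small SPD test system
        A = np.array([[4.0, 1.0], [1.0, 3.0]])
        b = np.array([1.0, 2.0])
        print(conjugate_gradient(A, b))   # ~[0.0909, 0.6364]

    The dominant costs per iteration are one sparse matrix-vector product and a few dot products, which is why the implementations above focus on data transfer overhead between the host and the accelerator.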

Top