Distributed Monte Carlo production for DZero
NASA Astrophysics Data System (ADS)
Snow, Joel; DØ Collaboration
2010-04-01
The DZero collaboration uses a variety of resources on four continents to pursue a strategy of flexibility and automation in the generation of simulation data. This strategy provides a resilient and opportunistic system which ensures an adequate and timely supply of simulation data to support DZero's physics analyses. A mixture of facilities, dedicated and opportunistic, specialized and generic, large and small, grid job enabled and not, are used to provide a production system that has adapted to newly developing technologies. This strategy has increased the event production rate by a factor of seven and the data production rate by a factor of ten in the last three years despite diminishing manpower. Common to all production facilities is the SAM (Sequential Access to Metadata) data-grid. Job submission to the grid uses SAMGrid middleware which may forward jobs to the OSG, the WLCG, or native SAMGrid sites. The distributed computing and data handling system used by DZero will be described and the results of MC production since the deployment of grid technologies will be presented.
The DZERO Level 3 Data Acquisition System
NASA Astrophysics Data System (ADS)
Angstadt, R.; Brooijmans, G.; Chapin, D.; Clements, M.; Cutts, D.; Haas, A.; Hauser, R.; Johnson, M.; Kulyavtsev, A.; Mattingly, S. E. K.; Mulders, M.; Padley, P.; Petravick, D.; Rechenmacher, R.; Snyder, S.; Watts, G.
2004-06-01
The DZERO experiment began Run II data-taking operation at Fermilab in spring 2001. The physics program of the experiment requires the Level 3 data acquisition (DAQ) system to handle average event sizes of 250 kilobytes at a rate of 1 kHz. The system routes and transfers event fragments of approximately 1-20 kilobytes from 63 VME crate sources to any of approximately 100 processing nodes. It is built upon a Cisco 6509 Ethernet switch, standard PCs, and commodity VME single board computers (SBCs). The system has been in full operation since spring 2002.
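For orientation, the figures quoted above imply an aggregate Level 3 throughput of roughly
$$ 250\ \mathrm{kB/event} \times 1000\ \mathrm{events/s} \approx 250\ \mathrm{MB/s} $$
through the event-building network; this is a back-of-the-envelope product of the abstract's own numbers, not a figure stated in it.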
Workshop on data acquisition and trigger system simulations for high energy physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1992-12-31
This report discusses the following topics: DAQSIM: A data acquisition system simulation tool; Front end and DCC Simulations for the SDC Straw Tube System; Simulation of Non-Blocking Data Acquisition Architectures; Simulation Studies of the SDC Data Collection Chip; Correlation Studies of the Data Collection Circuit & The Design of a Queue for this Circuit; Fast Data Compression & Transmission from a Silicon Strip Wafer; Simulation of SCI Protocols in Modsim; Visual Design with vVHDL; Stochastic Simulation of Asynchronous Buffers; SDC Trigger Simulations; Trigger Rates, DAQ & Online Processing at the SSC; Planned Enhancements to MODSIM II & SIMOBJECT -- an Overview; DAGAR -- A Synthesis System; Proposed Silicon Compiler for Physics Applications; Timed-LOTOS in a PROLOG Environment: an Algebraic Language for Simulation; Modeling and Simulation of an Event Builder for High Energy Physics Data Acquisition Systems; A Verilog Simulation for the CDF DAQ; Simulation to Design with Verilog; The DZero Data Acquisition System: Model and Measurements; DZero Trigger Level 1.5 Modeling; Strategies Optimizing Data Load in the DZero Triggers; Simulation of the DZero Level 2 Data Acquisition System; A Fast Method for Calculating DZero Level 1 Jet Trigger Properties and Physics Input to DAQ Studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snow, Dr., Joel
This final report is presented by Langston University (LU) for the project entitled "Langston University High Energy Physics" (LUHEP) under the direction of principal investigator (PI) and project director Professor Joel Snow. The project encompassed high energy physics research performed at hadron colliders. The PI is a collaborator on the DZero experiment at Fermi National Accelerator Laboratory in Batavia, IL, USA and the ATLAS experiment at CERN in Geneva, Switzerland, and was so during the entire project period, from April 1, 1999 until May 14, 2012. Both experiments seek to understand the fundamental constituents of the physical universe and the forces that govern their interactions. In 1999, as a member of the Online Systems group for Run 2, the PI developed a cross-platform, Python-based graphical user interface (GUI) application for monitoring and control of EPICS-based devices for control room use. This served as a model for other developers to enhance and build on for further monitoring and control tasks written in Python. Subsequently the PI created and developed a cross-platform C++ GUI utilizing a networked client-server paradigm and based on ROOT, the object-oriented analysis framework from CERN. The GUI served as a user interface to the Examine tasks running in the DØ control room, which monitored the status and integrity of data taking for Run 2. The PI developed the histogram server/control interface to the GUI client for the Examine processes. The histogram server was built from the ROOT framework and was integrated into the DØ framework used for online monitoring programs and offline analysis. The PI developed the first implementation of displaying histograms dynamically generated by ROOT in a web browser. The PI's work resulted in several talks and papers at international conferences and workshops. The PI established computing software infrastructure at LU and the University of Oklahoma (OU) to analyze DZero production data and produce simulation data for the experiment. Eventually this included the FNAL SAM data grid system, the SAMGrid (SG) infrastructure, and the Open Science Grid software stacks for computing and storage elements. At the end of 2003 Snow took on the role of global Monte Carlo production coordinator for the DØ experiment, a role which continues to this day. In January of 2004 Snow started working with the SAMGrid development team to help debug, deploy, and integrate SAMGrid with DØ Monte Carlo production. Snow installed and configured SG execution and client sites at LUHEP and OUHEP, and a SG scheduler site at LUHEP. The PI developed a Python-based GUI (DAJ) that acts as a front end for job submission to SAMGrid. The GUI interfaces to the DZero Monte Carlo (MC) request system that uses SAM to manage MC requests by the physics analysis groups. DAJ significantly simplified SG job submission and was deployed in DZero in an effort to increase the user base of SG. The following year saw the advent of SAMGrid job submission to the Open Science Grid (OSG) and the LHC Computing Grid (LCG) through a forwarding mechanism. The PI oversaw the integration of these grids into the existing production infrastructure. The PI developed an automatic MC (Automc) request processing system capable of operating without user intervention (other than obtaining grid credentials) and able to submit to any number of sites on various grids. The system manages production at all but two sites. The system was deployed at Fermilab and remains operating there today.
The PI's work in distributed computing resulted in several talks at international conferences. UTA, OU, and LU were chosen as the collaborating institutions that form the Southwest Tier 2 Center (SWT2) for ATLAS. During the project period the PI contributed to the online and offline software infrastructure through his work with the Run 2 online group, and played a major role in Monte Carlo production for DZero. During the part of the project period in which the PI served as MC production coordinator, MC production increased very significantly. In the first year of the PI's tenure as production coordinator, production was 159 million events and 6.7 TB of data. During the last year of the project period, production was 2,342 million events and 262 TB of data, a factor of 15 increase in events and 39 in data volume. The increase occurred with improvements in computer hardware and networks, through the use of grid technology on diverse resources, and through increased automation and efficiency of the production process. LU HEP developed and deployed the automatic MC request processing system in use at FNAL. The complementary strategies of automation and grid production served DZero well. Fermilab has recognized LU HEP's contribution to DZero by allowing the PI to devote full time to research activities, appointing him a guest scientist for the last six years of the project period.
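As a quick arithmetic check of the factors quoted above (worked from the numbers in the report, with rounding):
$$ \frac{2342\ \mathrm{M\ events}}{159\ \mathrm{M\ events}} \approx 14.7 \approx 15, \qquad \frac{262\ \mathrm{TB}}{6.7\ \mathrm{TB}} \approx 39. $$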
The Luminosity Measurement for the DZERO Experiment at Fermilab
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snow, Gregory R.
Primary project objective: The addition of University of Nebraska-Lincoln (UNL) human resources supported by this grant helped ensure that Fermilab's DZERO experiment had a reliable luminosity measurement through the end of Run II data taking and an easily-accessible repository of luminosity information for all collaborators performing physics analyses through the publication of its final physics results. Secondary project objective: The collaboration between the UNL Instrument Shop and Fermilab's Scintillation Detector Development Center enhanced the University of Nebraska's future role as a particle detector R&D and production facility for future high energy physics experiments. Overall project objective: This targeted project enhanced the University of Nebraska's presence in both frontier high energy physics research in DZERO and particle detector development, and it thereby served the goals of the DOE Office of Science and the Experimental Program to Stimulate Competitive Research (EPSCoR) for the state of Nebraska.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, B.; /Fermilab
1999-10-08
A user interface is created to monitor and operate the heating, ventilation, and air conditioning system. The interface is networked to the system's programmable logic controller. The controller maintains automated control of the system. The user, through the interface, is able to see the status of the system and override or adjust the automatic control features. The interface is programmed to show digital readouts of system equipment as well as visual cues of system operational statuses. It also provides information for system design and component interaction. The interface is made easier to read by simple designs, color coordination, and graphics. Fermi National Accelerator Laboratory (Fermilab) conducts high energy particle physics research. Part of this research involves collision experiments with protons and antiprotons. These interactions are contained within one of two massive detectors along Fermilab's largest particle accelerator, the Tevatron. The D-Zero Assembly Building houses one of these detectors. At this time detector systems are being upgraded for a second experiment run, titled Run II. Unlike the previous run, systems at D-Zero must be computer automated so operators do not have to continually monitor and adjust them during the run. Human intervention should only be necessary for system start up and shut down, and equipment failure. Part of this upgrade includes the heating, ventilation, and air conditioning (HVAC) system. The HVAC system is responsible for controlling two subsystems: the air temperatures of the D-Zero Assembly Building and associated collision hall, as well as six separate water systems used in the heating and cooling of the air and detector components. The HVAC system is automated by a programmable logic controller. In order to provide system monitoring and operator control, a user interface is required. This paper will address methods and strategies used to design and implement an effective user interface. Background material pertinent to the HVAC system will cover the separate water and air subsystems and their purposes. In addition, programming and system automation will also be covered.
NASA Astrophysics Data System (ADS)
Hegab, Hatim H.
In this dissertation, results from a search for the Standard Model (SM) Higgs boson at the DZERO experiment are presented. The SM is the theoretical framework which describes the particles of matter and the force-carrying gauge bosons. To solve the mass problem in the SM, the Higgs mechanism was introduced. The Higgs mechanism causes electroweak symmetry breaking, and a new massive scalar boson was postulated. This particle is the Higgs boson. A search for the Higgs boson has been ongoing at the Tevatron, where protons and antiprotons collide at a center-of-mass energy of 1.96 TeV. For a low-mass Higgs boson, below 135 GeV, the dominant decay mode is Higgs to a pair of b-quarks. Work in this dissertation concentrated on a Higgs boson in the mass range of 100-150 GeV, where a W vector boson is produced in association with the Higgs boson. The final state chosen is one which contains a lepton (electron or muon), a neutrino, and a pair of b-quarks. This study used data provided by the DZERO experiment and computing resources provided by Fermilab. The results presented here are the outcome of analyzing 5.3 fb^-1 of data from the Run II period. The analysis used different techniques to increase the sensitivity of the study. Data were subdivided based on lepton flavor, number of jets in the sample, jets identified as b-jets, and dates of collected data. A multivariate analysis technique based on boosted decision trees was used to separate signal from background processes, both physical and instrumental. Good agreement between data and simulated events was observed. An observed (expected) upper limit of 4.5 (4.8) times the SM prediction was set on the Higgs boson production cross section times decay branching ratio for a Higgs mass of 115 GeV, at the 95% confidence level.
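A boosted decision tree of the kind mentioned above can be illustrated with a short, self-contained sketch; this is generic toy code using scikit-learn on synthetic "events", not the DZERO analysis code, and all feature values and settings are made up.

```python
# Illustrative only: a boosted-decision-tree signal/background separation,
# loosely analogous to the multivariate technique described in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy "events": 5 kinematic features; label 1 = signal, 0 = background.
n = 10_000
background = rng.normal(0.0, 1.0, size=(n, 5))
signal = rng.normal(0.5, 1.0, size=(n, 5))          # shifted means mimic a signal
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(n), np.ones(n)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
bdt.fit(X_train, y_train)

# The BDT output (signal probability) is the discriminant whose distribution
# would then be fed into the limit-setting machinery.
scores = bdt.predict_proba(X_test)[:, 1]
print("mean score, signal vs background:",
      scores[y_test == 1].mean(), scores[y_test == 0].mean())
```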
Measuring the Mass of the W Boson with the Last 3.7 fb^-1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brochmann, Michelle
This thesis presents the results of an analysis of the 3.7 fb^-1 of Tevatron proton-antiproton data collected with the DZero (D0) detector at Fermilab during the Run IIb period, with the goal of extracting an improved measurement of the W boson mass, which is currently measured to a precision of ≈ 20 MeV.
The OSG open facility: A sharing ecosystem
Jayatilaka, B.; Levshina, T.; Rynge, M.; ...
2015-12-23
The Open Science Grid (OSG) ties together individual experiments' computing power, connecting their resources to create a large, robust computing grid. This computing infrastructure started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero. In the years since, the OSG has broadened its focus to also address the needs of other US researchers and has increased delivery of Distributed High-Throughput Computing (DHTC) to users from a wide variety of disciplines via the OSG Open Facility. Presently, the Open Facility delivers about 100 million computing wall hours per year to researchers who are not already associated with the owners of the computing sites. This is primarily accomplished by harvesting and organizing the temporarily unused capacity (i.e. opportunistic cycles) from the sites in the OSG. Using these methods, OSG resource providers and scientists share computing hours with researchers in many other fields to enable their science, striving to ensure that this computing power is used with maximal efficiency. Furthermore, we believe that expanded access to DHTC is an essential tool for scientific innovation, and work continues on expanding this service.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anzelc, Meghan
2008-06-01
B_s^0 mixing studies provide a precision test of Charge-Parity violation in the Standard Model. A measurement of Δm_s constrains elements of the CKM quark rotation matrix [1], providing a probe of Standard Model Charge-Parity violation. This thesis describes a study of B_s^0 mixing in the semileptonic decay B_s^0 → D_s^- μ^+ ν X, where D_s^- → φπ^-, using data collected at the D-Zero detector at Fermi National Accelerator Laboratory in Batavia, Illinois. Approximately 2.8 fb^-1 of data collected between April 2002 and August 2007 was used, covering the entirety of the Tevatron's Run IIa (April 2002 to March 2006) and part of Run IIb (March 2006 to August 2007). Taggers using both opposite-side and same-side information were used to obtain the flavor information of the B_s^0 meson at production. The charge of the muon in the decay B_s^0 → D_s^- μ^+ ν X was used to determine the flavor of the B_s^0 at decay. The B_d^0 mixing frequency, Δm_d, was measured to verify the analysis procedure. A log-likelihood calculation was performed, and a measurement of Δm_s was obtained. The final result was Δm_s = 18.86 ± 0.80 (stat.) ± 0.37 (sys.) with a significance of 2.6σ.
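For context, the standard mixing formalism behind such a measurement (textbook background, not material taken from the thesis) writes the decay-time distributions of unmixed and mixed B_s^0 candidates as
$$ P_{\mathrm{unmixed}}(t) \propto e^{-t/\tau_{B_s}}\,\frac{1 + \cos(\Delta m_s\, t)}{2}, \qquad P_{\mathrm{mixed}}(t) \propto e^{-t/\tau_{B_s}}\,\frac{1 - \cos(\Delta m_s\, t)}{2}, $$
so a log-likelihood fit to the flavor-tagged decay-time spectra is sensitive to the oscillation frequency Δm_s.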
D0 Solenoid Upgrade Project: D0 Solenoid Current Leads
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rucinski, R.; /Fermilab
This engineering note documents information gathered and design decisions made regarding the vapor cooled current leads for the D-Zero Solenoid. The decision was made during design group meetings that the D-Zero Solenoid, rated at 4825 amps, should use vapor cooled current leads rated at 6000 amps. CDF uses 6000 amp leads from American Magnetics Inc. (AMI) and has two spares in their storage lockers. Because of the spares situation and AMI's reputation, AMI would be the natural choice of vendor. The manufacturer's listed helium consumption is 19.2 liters/hr. From experience with these types of leads, more stable operation is achieved at an increased gas flow. See the attached e-mail message from RLS. We have decided to list the design flow rate at 28.8 liquid liters/hr in the design report. This corresponds to CDF's operating point. A question was raised regarding how long the current leads could last at full current should the vapor cooling flow be stopped. This issue was discussed with Scott Smith from AMI. We do not feel that there is a problem for this failure scenario.
Applications in Data-Intensive Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shah, Anuj R.; Adkins, Joshua N.; Baxter, Douglas J.
2010-04-01
This book chapter, to be published in Advances in Computers, Volume 78, in 2010, describes applications of data intensive computing (DIC). This is an invited chapter resulting from a previous publication on DIC. This work summarizes efforts coming out of PNNL's Data Intensive Computing Initiative. Advances in technology have empowered individuals with the ability to generate digital content with mouse clicks and voice commands. Digital pictures, emails, text messages, home videos, audio, and webpages are common examples of digital content that are generated on a regular basis. Data intensive computing facilitates human understanding of complex problems. Data-intensive applications provide timely and meaningful analytical results in response to exponentially growing data complexity and associated analysis requirements through the development of new classes of software, algorithms, and hardware.
Data intensive computing at Sandia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew T.
2010-09-01
Data-intensive computing is parallel computing in which algorithms and software are designed around efficient access and traversal of a data set, and in which hardware requirements are dictated by data size as much as by desired run times, usually distilling compact results from massive data.
From cosmos to connectomes: the evolution of data-intensive science.
Burns, Randal; Vogelstein, Joshua T; Szalay, Alexander S
2014-09-17
The analysis of data requires computation: originally by hand and more recently by computers. Different models of computing are designed and optimized for different kinds of data. In data-intensive science, the scale and complexity of data exceeds the comfort zone of local data stores on scientific workstations. Thus, cloud computing emerges as the preeminent model, utilizing data centers and high-performance clusters, enabling remote users to access and query subsets of the data efficiently. We examine how data-intensive computational systems originally built for cosmology, the Sloan Digital Sky Survey (SDSS), are now being used in connectomics, at the Open Connectome Project. We list lessons learned and outline the top challenges we expect to face. Success in computational connectomics would drastically reduce the time between idea and discovery, as SDSS did in cosmology. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Guoliang
1997-12-01
This dissertation describes the searches for first generation scalar leptoquarks in the eejj and eνjj channels in proton-antiproton collisions at a center of mass energy of 1.8 TeV using the D0 detector at the Fermi National Accelerator Laboratory. Data corresponding to an integrated luminosity of about 100 pb^-1 were studied. The number of candidate events in both channels is consistent with the expected yield from Standard Model processes. First generation scalar leptoquarks with mass less than 204 (168) GeV/c^2 are excluded for a branching fraction of leptoquarks decaying into electron and quark of β = 1.0 (0.5) at the 95% confidence level.
Manufacturing and testing VLPC hybrids
NASA Astrophysics Data System (ADS)
Adkins, L. R.; Ingram, C. M.; Anderson, E. J.
1998-11-01
To ensure that the manufacture of VLPC devices is a reliable, cost-effective technology, hybrid assembly procedures and testing methods suitable for large scale production have been developed. This technology has been developed under a contract from Fermilab as part of the D-Zero upgrade program. Each assembled hybrid consists of a VLPC chip mounted on an AlN substrate. The VLPC chip is provided with bonding pads (one connected to each pixel) which are wire bonded to gold traces on the substrate. The VLPC/AlN hybrids are mated in a vacuum sealer using solder preforms and a specially designed carbon boat. After mating, the VLPC pads are bonded to the substrate with an automatic wire bonder. Using this equipment we have achieved a thickness tolerance of ±0.0007 inches and a production rate of 100 parts per hour. After assembly the VLPCs are tested for optical response at an operating temperature of 7 K. The parts are tested in a custom designed continuous-flow dewar with a capacity of 15 hybrids and one Lake Shore DT470-SD-11 calibrated temperature sensor mounted to an AlN substrate. Our facility includes five of these dewars with an ultimate test capacity of 75 parts per day. During the course of the D-Zero program we have assembled more than 4,000 VLPC hybrids and have tested more than 2,500 with a high yield.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
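The map/shuffle/reduce accumulation pattern described in the abstract can be sketched in plain Python; this is an illustrative toy simulation, not the authors' Hadoop/HDFS implementation, and the pulse and sample quantities are invented.

```python
# Illustrative sketch of a MapReduce-style accumulation: each map task emits
# partial contributions for a range of simulated radar pulses, and the reduce
# step sums contributions that target the same output sample.
# Plain-Python simulation, not the paper's Hadoop implementation.
from collections import defaultdict

def map_pulse(pulse_id, n_samples=8):
    """Emit (sample_index, partial_value) pairs for one simulated pulse."""
    for k in range(n_samples):
        yield k, (pulse_id + 1) * 0.001 * k   # toy partial contribution

def shuffle(pairs):
    """Group map output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_sample(key, values):
    """Accumulate all partial contributions for one raw-data sample."""
    return key, sum(values)

if __name__ == "__main__":
    mapped = (pair for pulse in range(100) for pair in map_pulse(pulse))
    raw_data = dict(reduce_sample(k, v) for k, v in shuffle(mapped).items())
    print(raw_data[0], raw_data[7])
```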
The OSG Open Facility: an on-ramp for opportunistic scientific computing
NASA Astrophysics Data System (ADS)
Jayatilaka, B.; Levshina, T.; Sehgal, C.; Gardner, R.; Rynge, M.; Würthwein, F.
2017-10-01
The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these resources are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, math, economics, and many others. Over 10% of these individual users utilized in excess of 1 million computational hours each in the past year. The largest source of these cycles is temporary unused capacity at institutions affiliated with US LHC computational sites. An increasing fraction, however, comes from university HPC clusters and large national infrastructure supercomputers offering unused capacity. Such expansions have allowed the OSG to provide ample computational resources to both individual researchers and small groups as well as sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.
The OSG Open Facility: An On-Ramp for Opportunistic Scientific Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jayatilaka, B.; Levshina, T.; Sehgal, C.
The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community's computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these resources are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, math, economics, and many others. Over 10% of these individual users utilized in excess of 1 million computational hours each in the past year. The largest source of these cycles is temporary unused capacity at institutions affiliated with US LHC computational sites. An increasing fraction, however, comes from university HPC clusters and large national infrastructure supercomputers offering unused capacity. Such expansions have allowed the OSG to provide ample computational resources to both individual researchers and small groups as well as sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce
2015-01-01
Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement. PMID:26305223
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce.
Idris, Muhammad; Hussain, Shujaat; Siddiqi, Muhammad Hameed; Hassan, Waseem; Syed Muhammad Bilal, Hafiz; Lee, Sungyoung
2015-01-01
Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.
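The multi-key idea described above, tagging intermediate records with an algorithm identifier so several algorithms share one job, can be sketched as follows; this is a plain-Python toy with hypothetical algorithm names, not the MRPack code.

```python
# Illustrative sketch: several algorithms run inside one MapReduce-style job by
# tagging every intermediate record with an algorithm identifier, so their
# outputs stay separated through the shuffle. Not the MRPack implementation.
from collections import defaultdict

def mapper(line):
    for w in line.split():
        yield ("wordcount", w), 1             # algorithm 1: word count
    yield ("linelength", "chars"), len(line)  # algorithm 2: total characters

def run_job(lines):
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # One reducer handles both algorithms, dispatching on the algorithm tag.
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    data = ["the quick brown fox", "jumps over the lazy dog"]
    for (algo, key), value in sorted(run_job(data).items()):
        print(algo, key, value)
```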
Search for 1st Generation Leptoquarks in the eejj channel with the DZero experiment (in French)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barfuss, Anne-Fleur
2008-09-12
Evidence of the existence of leptoquarks (LQ) would prove the validity of various extensions of the Standard Model of particle physics (SM). The search for first generation leptoquarks presented in this dissertation has been performed by analyzing a 1.02 fb^-1 sample of data collected by the D0 detector, selecting events with a final state comprising two light jets and two electrons. The absence of an excess of events in comparison to SM expectations leads to the exclusion of scalar LQ masses up to 292 GeV and vector LQ masses from 350 to 458 GeV, depending on the LQ-l-q coupling type. The great importance of a good jet energy measurement motivated the study of the instrumental backgrounds correlated with the calorimeter, as well as studies of the hadronic shower energy resolution in γ + jets events.
PNNL Data-Intensive Computing for a Smarter Energy Grid
Carol Imhoff; Zhenyu (Henry) Huang; Daniel Chavarria
2017-12-09
The Middleware for Data-Intensive Computing (MeDICi) Integration Framework, an integrated platform to solve data analysis and processing needs, supports PNNL research on the U.S. electric power grid. MeDICi is enabling development of visualizations of grid operations and vulnerabilities, with the goal of near real-time analysis to aid operators in preventing and mitigating grid failures.
De Georgia, Michael A.; Kaffashi, Farhad; Jacono, Frank J.; Loparo, Kenneth A.
2015-01-01
There is a broad consensus that 21st century health care will require intensive use of information technology to acquire and analyze data and then manage and disseminate information extracted from the data. No area is more data intensive than the intensive care unit. While there have been major improvements in intensive care monitoring, the medical industry, for the most part, has not incorporated many of the advances in computer science, biomedical engineering, signal processing, and mathematics that many other industries have embraced. Acquiring, synchronizing, integrating, and analyzing patient data remain frustratingly difficult because of incompatibilities among monitoring equipment, proprietary limitations from industry, and the absence of standard data formatting. In this paper, we will review the history of computers in the intensive care unit along with commonly used monitoring and data acquisition systems, both those commercially available and those being developed for research purposes. PMID:25734185
De Georgia, Michael A; Kaffashi, Farhad; Jacono, Frank J; Loparo, Kenneth A
2015-01-01
There is a broad consensus that 21st century health care will require intensive use of information technology to acquire and analyze data and then manage and disseminate information extracted from the data. No area is more data intensive than the intensive care unit. While there have been major improvements in intensive care monitoring, the medical industry, for the most part, has not incorporated many of the advances in computer science, biomedical engineering, signal processing, and mathematics that many other industries have embraced. Acquiring, synchronizing, integrating, and analyzing patient data remain frustratingly difficult because of incompatibilities among monitoring equipment, proprietary limitations from industry, and the absence of standard data formatting. In this paper, we will review the history of computers in the intensive care unit along with commonly used monitoring and data acquisition systems, both those commercially available and those being developed for research purposes.
PNNL's Data Intensive Computing research battles Homeland Security threats
David Thurman; Joe Kielman; Katherine Wolf; David Atkinson
2018-05-11
The Pacific Northwest National Laboratory's (PNNL's) approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architecture, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.
PNNL pushing scientific discovery through data intensive computing breakthroughs
Deborah Gracio; David Koppenaal; Ruby Leung
2018-05-18
The Pacific Northwest National Laboratory's approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architectures, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.
D0 Superconducting Solenoid Quench Data and Slow Dump Data Acquisition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Markley, D.; /Fermilab
1998-06-09
This DZero engineering note describes the method by which the 2 Tesla superconducting solenoid fast dump and slow dump data are accumulated, tracked, and stored. The 2 Tesla solenoid has eleven data points that need to be tracked and then stored when a fast dump or a slow dump occurs. The TI555 (Texas Instruments) PLC (programmable logic controller), which controls the DC power circuit that powers the solenoid, also has access to all the voltage taps and other equipment in the circuit. The TI555 constantly logs these eleven points in a rotating memory buffer. When either a fast dump (dump switch opens) or a slow dump (power supply turns off) occurs, the TI555 organizes the respective data and downloads the data to a file on DO-CCRS2. The data in this file are moved over Ethernet and stored in a CSV (comma-separated values) file which can easily be examined by Microsoft Excel or any other spreadsheet. The 2 Tesla solenoid control system also locks in first fault information. The TI555 decodes the first fault and passes it along to the program collecting the data and storing it on DO-CCRS2. This first fault information is then part of the file.
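The rotating-buffer-plus-CSV-dump pattern described in the note can be sketched as follows; the buffer depth, channel names, and file name are illustrative placeholders, not values from the D0 engineering note or the TI555 PLC logic.

```python
# Sketch of a rotating-buffer / dump-to-CSV pattern like the one described.
# The eleven monitored points, buffer depth, and file name are placeholders.
import csv
from collections import deque
from datetime import datetime

N_POINTS = 11          # eleven monitored voltage-tap / circuit points
BUFFER_DEPTH = 600     # how many samples the rotating buffer retains

buffer = deque(maxlen=BUFFER_DEPTH)   # old samples fall off automatically

def log_sample(values):
    """Append one timestamped sample of all monitored points to the buffer."""
    assert len(values) == N_POINTS
    buffer.append([datetime.now().isoformat()] + list(values))

def dump_to_csv(path, fault="fast_dump"):
    """On a fast or slow dump, write the buffered history to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp"] + [f"point_{i}" for i in range(N_POINTS)])
        writer.writerow(["first_fault", fault] + [""] * (N_POINTS - 1))
        writer.writerows(buffer)

if __name__ == "__main__":
    for t in range(5):
        log_sample([0.1 * t] * N_POINTS)
    dump_to_csv("solenoid_dump.csv", fault="slow_dump")
```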
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning
Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has received momentum from both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of the classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation. PMID:26681933
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.
Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has received momentum from both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model to facilitate data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of the classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
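One common way to parallelize training in a MapReduce setting is data-parallel training with weight averaging; the sketch below is a generic illustration under that assumption, not the scheme or code used in the paper, and the single-neuron model and partition count are made up.

```python
# Illustrative data-parallel sketch: each "map" partition trains a copy of a
# tiny one-layer network on its slice of the data, and the "reduce" step
# averages the learned weights. Not the paper's implementation.
import numpy as np

def train_partition(X, y, epochs=200, lr=0.1):
    """Logistic-regression-style single neuron trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid activation
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

def reduce_weights(weight_list):
    """Average the per-partition weight vectors (the 'reduce' step)."""
    return np.mean(weight_list, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(4000, 3))
    y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
    partitions = np.array_split(np.arange(len(y)), 4)   # 4 "map" tasks
    weights = [train_partition(X[idx], y[idx]) for idx in partitions]
    print("averaged weights:", reduce_weights(weights))
```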
Search for Signals of Universal Extra Dimensions in Proton-Antiproton Collisions (in Portuguese)
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Souza Santos, Angelo
Models that predict the existence of extra spatial dimensions have been studied since the beginning of the 20th century. These models can incorporate gravity in the framework that describes the other interactions, and they can present a number of interesting features, such as a dark matter candidate. In this work, we explore the consequences of the Universal Extra Dimensions (UED) model by searching for the production of Kaluza-Klein particles whose decay chain leads to the signature μ±μ± + jets + missing transverse energy. We employ a data set corresponding to an integrated luminosity of 7.3 fb^-1, collected by the D0 detector at a proton-antiproton collider at a center of mass energy of 1.96 TeV. Since no excess was observed in the data, we were able to set a lower limit on the compactification scale of R^-1 > 260 GeV in the model. This is the first study to impose a direct limit on the minimal UED model.
GLIDE: a grid-based light-weight infrastructure for data-intensive environments
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Malek, Sam; Beckman, Nels; Mikic-Rakic, Marija; Medvidovic, Nenad; Chrichton, Daniel J.
2005-01-01
The promise of the grid is that it will enable public access and sharing of immense amounts of computational and data resources among dynamic coalitions of individuals and institutions. However, the current grid solutions make several limiting assumptions that curtail their widespread adoption. To address these limitations, we present GLIDE, a prototype light-weight, data-intensive middleware infrastructure that enables access to the robust data and computational power of the grid on DREAM platforms.
Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I
2015-11-03
We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on a standardized Z-statistic computed from the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the Clinical Proteomic Technology Assessment for Cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework with a computational software tool for comparative quantitative proteomics analysis. It features various extensions of the QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
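A minimal sketch of the statistic described above, assuming λ denotes the log fold change parameter and that its posterior mean and variance come out of the model fit:
$$ Z = \frac{\mathbb{E}[\lambda \mid \mathrm{data}]}{\sqrt{\mathrm{Var}(\lambda \mid \mathrm{data})}}, $$
with proteins then ranked by |Z| and thresholded according to the empirical-Bayes estimate of the false discovery rate.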
Federated data storage system prototype for LHC experiments and data intensive science
NASA Astrophysics Data System (ADS)
Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Ryabinkin, E.; Zarochentsev, A.
2017-10-01
Rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and university clusters scattered over a large area are working to unite their resources for future productive work, at the same time providing an opportunity to support large physics collaborations. In our project we address the fundamental problem of designing a computing architecture to integrate distributed storage resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. Studies include development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one National Cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. This project intends to implement a federated distributed storage for all kinds of operations such as read/write/transfer and access via WAN from Grid centres, university clusters, supercomputers, academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns, and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and reformations of computing style, for instance how bioinformatics programs running on supercomputers can read/write data from the federated storage.
Data Intensive Computing on Amazon Web Services
DOE Office of Scientific and Technical Information (OSTI.GOV)
Magana-Zook, S. A.
The Geophysical Monitoring Program (GMP) has spent the past few years building up the capability to perform data intensive computing using what have been referred to as “big data” tools. These big data tools would be used against massive archives of seismic signals (>300 TB) to conduct research not previously possible. Examples of such tools include Hadoop (HDFS, MapReduce), HBase, Hive, Storm, Spark, Solr, and many more by the day. These tools are useful for performing data analytics on datasets that exceed the resources of traditional analytic approaches. To this end, a research big data cluster (“Cluster A”) was set up as a collaboration between GMP and Livermore Computing (LC).
Template Interfaces for Agile Parallel Data-Intensive Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramakrishnan, Lavanya; Gunter, Daniel; Pastorello, Gilberto Z.
Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations are increasingly generating data sets large enough that it is no longer practical to download them to a desktop to operate on them. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analysis of this data requires an ability to run on these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres is addressing the challenge of enabling collaborative analysis of DOE science data through a new concept of reusable "templates" that enable scientists to easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.
Extracting the Data From the LCM vk4 Formatted Output File
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wendelberger, James G.
These are slides about extracting the data from the LCM vk4 formatted output file. The following is covered: the vk4 file produced by Keyence VK software, custom analysis, no off-the-shelf way to read the file, reading the binary data in a vk4 file, various offsets in decimal lines, finding the height image data directly in MATLAB, binary output at the beginning of the height image data, color image information, color image binary data, color image decimal and binary data, MATLAB code to read a vk4 file (choose a file, read the file, compute offsets, read optical image, laser optical image, read and compute laser intensity image, read height image, timing, display height image, display laser intensity image, display RGB laser optical images, display RGB optical images, display beginning data and save images to workspace, gamma correction subroutine), reading intensity from the vk4 file, linear in the low range, linear in the high range, gamma correction for vk4 files, computing the gamma intensity correction, observations.
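The offset-table approach described in the slides can be sketched in Python rather than MATLAB; every offset and field layout below is a hypothetical placeholder chosen purely for illustration, since the actual vk4 layout is documented only in the slides and format references themselves.

```python
# Sketch of reading binary data via an offset table, in the spirit of the
# slides. All offsets and field layouts here are HYPOTHETICAL placeholders,
# not the real vk4 format specification.
import struct

def read_u32(buf, offset):
    """Read one little-endian unsigned 32-bit integer at a byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]

def read_height_image(path, offset_table_pos=12):
    with open(path, "rb") as f:
        buf = f.read()
    # Hypothetical: an offset table whose third entry points at the height image.
    height_offset = read_u32(buf, offset_table_pos + 2 * 4)
    width = read_u32(buf, height_offset)
    height = read_u32(buf, height_offset + 4)
    pixels = struct.unpack_from(f"<{width * height}I", buf, height_offset + 8)
    return width, height, pixels
```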
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tong, Dudu; Yang, Sichun; Lu, Lanyuan
2016-06-20
Structure modelling via small-angle X-ray scattering (SAXS) data generally requires intensive computations of scattering intensity from any given biomolecular structure, where the accurate evaluation of SAXS profiles using coarse-grained (CG) methods is vital to improve computational efficiency. To date, most CG SAXS computing methods have been based on a single-bead-per-residue approximation but have neglected structural correlations between amino acids. To improve the accuracy of scattering calculations, accurate CG form factors of amino acids are now derived using a rigorous optimization strategy, termed electron-density matching (EDM), to best fit the electron-density distributions of protein structures. This EDM method is compared with and tested against other CG SAXS computing methods, and the resulting CG SAXS profiles from EDM agree better with all-atom theoretical SAXS data. By including the protein hydration shell represented by explicit CG water molecules and the correction of protein excluded volume, the developed CG form factors also reproduce the selected experimental SAXS profiles with very small deviations. Taken together, these EDM-derived CG form factors present an accurate and efficient computational approach for SAXS computing, especially when higher molecular details (represented by the q range of the SAXS data) become necessary for effective structure modelling.
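For reference, scattering-intensity evaluations of the kind discussed above are commonly based on the Debye formula (standard background, not a formula quoted from this paper):
$$ I(q) = \sum_{i}\sum_{j} f_i(q)\, f_j(q)\, \frac{\sin(q\, r_{ij})}{q\, r_{ij}}, $$
where the f_i(q) are the (coarse-grained) form factors and r_ij the pairwise distances between beads; accurate CG form factors reduce the number of terms in the double sum while preserving the profile.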
Yang, Shuai; Zhang, Xinlei; Diao, Lihong; Guo, Feifei; Wang, Dan; Liu, Zhongyang; Li, Honglei; Zheng, Junjie; Pan, Jingshan; Nice, Edouard C; Li, Dong; He, Fuchu
2015-09-04
The Chromosome-centric Human Proteome Project (C-HPP) aims to catalog genome-encoded proteins using a chromosome-by-chromosome strategy. As the C-HPP proceeds, the increasing requirement for data-intensive analysis of the MS/MS data poses a challenge to the proteomic community, especially small laboratories lacking computational infrastructure. To address this challenge, we have updated the previous CAPER browser into a higher version, CAPER 3.0, which is a scalable cloud-based system for data-intensive analysis of C-HPP data sets. CAPER 3.0 uses cloud computing technology to facilitate MS/MS-based peptide identification. In particular, it can use both public and private cloud, facilitating the analysis of C-HPP data sets. CAPER 3.0 provides a graphical user interface (GUI) to help users transfer data, configure jobs, track progress, and visualize the results comprehensively. These features enable users without programming expertise to easily conduct data-intensive analysis using CAPER 3.0. Here, we illustrate the usage of CAPER 3.0 with four specific mass spectral data-intensive problems: detecting novel peptides, identifying single amino acid variants (SAVs) derived from known missense mutations, identifying sample-specific SAVs, and identifying exon-skipping events. CAPER 3.0 is available at http://prodigy.bprc.ac.cn/caper3.
Calibration of Clinical Audio Recording and Analysis Systems for Sound Intensity Measurement.
Maryn, Youri; Zarowski, Andrzej
2015-11-01
Sound intensity is an important acoustic feature of voice/speech signals. Yet recordings are performed with different microphone, amplifier, and computer configurations, and it is therefore crucial to calibrate sound intensity measures of clinical audio recording and analysis systems on the basis of output of a sound-level meter. This study was designed to evaluate feasibility, validity, and accuracy of calibration methods, including audiometric speech noise signals and human voice signals under typical speech conditions. Calibration consisted of 3 comparisons between data from 29 measurement microphone-and-computer systems and data from the sound-level meter: signal-specific comparison with audiometric speech noise at 5 levels, signal-specific comparison with natural voice at 3 levels, and cross-signal comparison with natural voice at 3 levels. Intensity measures from recording systems were then linearly converted into calibrated data on the basis of these comparisons, and validity and accuracy of calibrated sound intensity were investigated. Very strong correlations and quasisimilarity were found between calibrated data and sound-level meter data across calibration methods and recording systems. Calibration of clinical sound intensity measures according to this method is feasible, valid, accurate, and representative for a heterogeneous set of microphones and data acquisition systems in real-life circumstances with distinct noise contexts.
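The linear conversion step described above can be sketched with a least-squares fit; the paired readings below are made-up numbers, not data from the study, and no specific microphone or acquisition system is implied.

```python
# Minimal sketch of a linear calibration step: fit a line mapping a recording
# system's uncalibrated intensity readings onto the sound-level meter's dB
# values, then apply it to new measurements. The readings are invented.
import numpy as np

system_db = np.array([52.1, 61.8, 71.5, 81.0, 90.7])   # recording-system readout
meter_db  = np.array([55.0, 65.0, 75.0, 85.0, 95.0])   # sound-level meter reference

slope, intercept = np.polyfit(system_db, meter_db, deg=1)   # least-squares line

def calibrate(reading_db):
    """Convert an uncalibrated reading into a calibrated SPL estimate."""
    return slope * reading_db + intercept

print(f"calibrated value for a 68.0 dB readout: {calibrate(68.0):.1f} dB")
```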
Measurement of the ZZ -> l+l-l+l- cross-section at √(s) = 1.96 TeV with the D0 detector
NASA Astrophysics Data System (ADS)
Feng, Lei
This thesis describes work carried out on the DZero experiment, a particle detector located at the Fermilab Tevatron proton-antiproton collider operating at √(s) = 1.96 TeV. After a thorough study of the acceptance and efficiencies for each channel, 15.46 +/- 0.05 (stat.) +/- 1.83 (syst.) events are expected in all three channels with a background of 1.47 +/- 0.05 (stat.) +0.15-0.26 (syst.) events. A correction factor obtained from simulation allows us to convert this into a high-mass cross-section measurement for pure on-shell ZZ production. The pure ZZ cross section is measured to be sigma.
NASA Astrophysics Data System (ADS)
Vilotte, J.-P.; Atkinson, M.; Michelini, A.; Igel, H.; van Eck, T.
2012-04-01
Increasingly dense seismic and geodetic networks are continuously transmitting a growing wealth of data from around the world. The multiple uses of these data have led the seismological community to pioneer globally distributed open-access data infrastructures, standard services and formats, e.g., the Federation of Digital Seismic Networks (FDSN) and the European Integrated Data Archives (EIDA). Our ability to acquire observational data outpaces our ability to manage, analyze and model them. Research in seismology is today facing a fundamental paradigm shift. Enabling advanced data-intensive analysis and modeling applications challenges conventional storage, computation and communication models and requires a new holistic approach. Such an approach is essential to exploit this cornucopia of data and to guarantee optimal operation and design of the high-cost monitoring facilities. The strategy of VERCE is driven by the needs of seismological data-intensive applications in data analysis and modeling. It aims to provide a comprehensive architecture and framework adapted to the scale and the diversity of those applications, integrating the data infrastructures with Grid, Cloud and HPC infrastructures. It will allow prototyping solutions for new use cases as they emerge within the European Plate Observatory Systems (EPOS), the ESFRI initiative of the solid Earth community. Computational seismology, and its information management, increasingly revolves around massive amounts of data that stem from: (1) the flood of data from the observational systems; (2) the flood of data from large-scale simulations and inversions; (3) the ability to economically store petabytes of data online; (4) the evolving Internet and data-aware computing capabilities. As data-intensive applications rapidly increase in scale and complexity, they require additional service-oriented architectures offering virtualization-based flexibility for complex and re-usable workflows. Scientific information management poses computer science challenges: acquisition, organization, query and visualization tasks scale almost linearly with the data volumes. The commonly used FTP-GREP metaphor allows gigabyte-sized datasets to be scanned today, but it will not work for scanning terabyte-sized continuous waveform datasets. New data analysis and modeling methods, exploiting the signal coherence within dense network arrays, are nonlinear; pair-algorithms on N points scale as N^2. Waveform inversion and stochastic simulations raise computing and data handling challenges. These applications are unfeasible for tera-scale datasets without new parallel algorithms that use near-linear processing, storage and bandwidth, and that can exploit new computing paradigms enabled by the intersection of several technologies (HPC, parallel scalable database crawlers, data-aware HPC). These issues will be discussed on the basis of a number of core pilot data-intensive applications and use cases retained in VERCE. These core applications are related to: (1) data processing and data analysis methods based on correlation techniques; (2) CPU-intensive applications such as large-scale simulation of synthetic waveforms in complex earth systems, and full waveform inversion and tomography. We shall analyze their workflow and data flow, and their requirements for a new service-oriented architecture and a data-aware platform with services and tools.
Finally, we will outline the importance of a new collaborative environment between seismology and computer science, together with the need for the emergence and the recognition of 'research technologists' mastering the evolving data-aware technologies and the data-intensive research goals in seismology.
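As a purely conceptual illustration of the correlation techniques mentioned above (not VERCE code), the sketch below cross-correlates two synthetic, equally sampled waveform traces and reports the lag of the correlation peak, which is the basic building block of ambient-noise and array-coherence analyses; repeated over all station pairs, it is the N^2 pair computation the abstract refers to.

    import numpy as np

    def noise_cross_correlation(trace_a, trace_b, max_lag):
        """Cross-correlate two equally sampled, demeaned waveform traces and
        return the lags (in samples) and normalized correlation values."""
        a = (trace_a - trace_a.mean()) / (trace_a.std() * len(trace_a))
        b = (trace_b - trace_b.mean()) / trace_b.std()
        full = np.correlate(a, b, mode="full")     # lags from -(N-1) to +(N-1)
        mid = len(full) // 2                       # index of zero lag
        lags = np.arange(-max_lag, max_lag + 1)
        return lags, full[mid - max_lag: mid + max_lag + 1]

    # Synthetic example: trace_b is trace_a advanced by 25 samples plus noise,
    # so the correlation peak should appear near a lag of +25 samples.
    rng = np.random.default_rng(0)
    trace_a = rng.standard_normal(5000)
    trace_b = np.roll(trace_a, -25) + 0.1 * rng.standard_normal(5000)
    lags, cc = noise_cross_correlation(trace_a, trace_b, max_lag=100)
    print("lag of maximum correlation (samples):", lags[np.argmax(cc)])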
Efficient Memory Access with NumPy Global Arrays using Local Memory Access
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Berghofer, Dan C.
This paper discusses work on Global Arrays of data on distributed multi-computer systems and on improving their performance. The tasks were completed at Pacific Northwest National Laboratory in the Science Undergraduate Laboratory Internship program in the summer of 2013 for the Data Intensive Computing Group in the Fundamental and Computational Sciences Directorate, and were carried out on the Global Arrays Toolkit developed by this group. The toolkit is an interface that lets programmers more easily create arrays of data spread across networks of computers. This is useful because scientific computation is often done on amounts of data so large that no individual computer can hold all of it; the data is held in array form and is best processed on supercomputers, which often consist of a network of individual computers doing their computation in parallel. One major challenge for this sort of programming is that operations on arrays distributed over multiple computers are very complex, so an interface is needed that makes these arrays appear to reside on a single computer, which is what Global Arrays provides. The work described here introduces more efficient operations on that data that require less copying, which saves time because copying data between many different computers is time intensive. When the operands of a binary operation reside on the same computer, they are not copied when accessed; when they reside on separate computers, only one operand is copied. This saves time through reduced copying, at the cost of additional data access operations.
CT to Cone-beam CT Deformable Registration With Simultaneous Intensity Correction
Zhen, Xin; Gu, Xuejun; Yan, Hao; Zhou, Linghong; Jia, Xun; Jiang, Steve B.
2012-01-01
Computed tomography (CT) to cone-beam computed tomography (CBCT) deformable image registration (DIR) is a crucial step in adaptive radiation therapy. Current intensity-based registration algorithms, such as demons, may fail in the context of CT-CBCT DIR because of inconsistent intensities between the two modalities. In this paper, we propose a variant of demons, called Deformation with Intensity Simultaneously Corrected (DISC), to deal with CT-CBCT DIR. DISC distinguishes itself from the original demons algorithm by performing an adaptive intensity correction step on the CBCT image at every iteration step of the demons registration. Specifically, the intensity correction of a voxel in CBCT is achieved by matching the first and the second moments of the voxel intensities inside a patch around the voxel with those on the CT image. It is expected that such a strategy can remove artifacts in the CBCT image as well as ensure intensity consistency between the two modalities. DISC is implemented on computer graphics processing units (GPUs) in the compute unified device architecture (CUDA) programming environment. The performance of DISC is evaluated on a simulated patient case and six clinical head-and-neck cancer patient data sets. It is found that DISC is robust against CBCT artifacts and intensity inconsistency and significantly improves the registration accuracy when compared with the original demons. PMID:23032638
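A minimal, illustrative sketch of the per-voxel intensity-correction idea described above: for every voxel, the local patch mean and standard deviation of the CBCT are mapped onto the corresponding CT patch moments. It is a plain NumPy approximation, not the authors' GPU/CUDA implementation; the patch size is an assumed parameter, and in the actual DISC algorithm this step is repeated at every demons iteration on the deforming images.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def moment_match_correct(cbct, ct, patch=5, eps=1e-6):
        """Return an intensity-corrected copy of `cbct` in which, for every voxel,
        the local patch mean/std of the CBCT is mapped onto the CT patch moments."""
        cbct = cbct.astype(np.float64)
        ct = ct.astype(np.float64)

        # Local first moments (means) over a cubic patch.
        mu_cbct = uniform_filter(cbct, size=patch)
        mu_ct = uniform_filter(ct, size=patch)

        # Local second moments -> local standard deviations.
        var_cbct = uniform_filter(cbct ** 2, size=patch) - mu_cbct ** 2
        var_ct = uniform_filter(ct ** 2, size=patch) - mu_ct ** 2
        sd_cbct = np.sqrt(np.clip(var_cbct, 0, None)) + eps
        sd_ct = np.sqrt(np.clip(var_ct, 0, None))

        # Match mean and standard deviation patch-wise.
        return (cbct - mu_cbct) / sd_cbct * sd_ct + mu_ct

    # Toy volumes: the "CBCT" is the "CT" with a bias, a scaling, and noise.
    rng = np.random.default_rng(1)
    ct = rng.normal(0.0, 50.0, size=(32, 32, 32))
    cbct = 0.8 * ct + 100.0 + rng.normal(0.0, 5.0, size=ct.shape)
    corrected = moment_match_correct(cbct, ct)
    print(float(np.abs(corrected - ct).mean()))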
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pike, Bill
Data—lots of data—generated in seconds and piling up on the internet, streaming and stored in countless databases. Big data is important for commerce, society and our nation’s security. Yet the volume, velocity, variety and veracity of data is simply too great for any single analyst to make sense of alone. It requires advanced, data-intensive computing. Simply put, data-intensive computing is the use of sophisticated computers to sort through mounds of information and present analysts with solutions in the form of graphics, scenarios, formulas, new hypotheses and more. This scientific capability is foundational to PNNL’s energy, environment and security missions. Senior Scientist and Division Director Bill Pike and his team are developing analytic tools that are used to solve important national challenges, including cyber systems defense, power grid control systems, intelligence analysis, climate change and scientific exploration.
Challenges in reusing transactional data for daily documentation in neonatal intensive care.
Kim, G R; Lawson, E E; Lehmann, C U
2008-11-06
The reuse of transactional data for clinical documentation requires navigation of computational, institutional and adaptive barriers. We describe organizational and technical issues in developing and deploying a daily progress note tool in a tertiary neonatal intensive care unit that reuses and aggregates data from a commercial integrated clinical information system.
Pazhur, R J; Kutter, B; Georgieff, M; Schraag, S
2003-06-01
Portable digital assistants (PDAs) may be of value to the anaesthesiologist as development in medical care is moving towards "bedside computing". Many different portable computers are currently available, and it is now possible for the physician to carry a mobile computer at all times. It is a database, reference book, patient tracking aid, date planner, computer, book, magazine, calculator and much more in one mobile device. With the help of a PDA, information that is required for our work may be available at all times and everywhere at the point of care within seconds. In this overview the possibilities for the use of PDAs in anaesthesia and intensive care medicine are discussed. Developments in other countries, possibilities in use, but also problems such as data security and network technology are evaluated.
NASA Astrophysics Data System (ADS)
Zavaletta, Vanessa A.; Bartholmai, Brian J.; Robb, Richard A.
2007-03-01
Diffuse lung diseases, such as idiopathic pulmonary fibrosis (IPF), can be characterized and quantified by analysis of volumetric high resolution CT scans of the lungs. These data sets typically have dimensions of 512 x 512 x 400. It is too subjective and labor intensive for a radiologist to analyze each slice and quantify regional abnormalities manually. Thus, computer aided techniques are necessary, particularly texture analysis techniques which classify various lung tissue types. Second and higher order statistics which relate the spatial variation of the intensity values are good discriminatory features for various textures. The intensity values in lung CT scans lie in the range [-1024, 1024]. Calculation of second order statistics over this full range is too computationally intensive, so the data is typically binned into 16 or 32 gray levels. There are more effective ways of binning the gray-level range to improve classification. An optimal and very efficient way to nonlinearly bin the histogram is to use a dynamic programming algorithm. The objective of this paper is to show that nonlinear binning using dynamic programming is computationally efficient and improves the discriminatory power of the second and higher order statistics for more accurate quantification of diffuse lung disease.
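A small sketch of what nonlinear binning of a gray-level histogram via dynamic programming can look like: it partitions the histogram into k contiguous bins so that the total within-bin weighted variance is minimized. The cost function, the value of k, and the toy histogram are assumptions for illustration; the abstract does not spell out the paper's exact objective.

    import numpy as np

    def optimal_bins(counts, k):
        """Partition gray levels 0..G-1 into k contiguous bins minimizing the
        total within-bin weighted variance, via dynamic programming.
        Returns the list of bin boundary indices (right-open)."""
        G = len(counts)
        g = np.arange(G, dtype=np.float64)
        w   = np.concatenate(([0.0], np.cumsum(counts)))           # prefix sums of weights
        wg  = np.concatenate(([0.0], np.cumsum(counts * g)))        # ... of weight*level
        wg2 = np.concatenate(([0.0], np.cumsum(counts * g * g)))    # ... of weight*level^2

        def cost(i, j):  # weighted variance of levels i..j-1
            n = w[j] - w[i]
            if n <= 0:
                return 0.0
            s, s2 = wg[j] - wg[i], wg2[j] - wg2[i]
            return s2 - s * s / n

        INF = float("inf")
        D = np.full((G + 1, k + 1), INF)        # D[j][b]: best cost of first j levels in b bins
        back = np.zeros((G + 1, k + 1), dtype=int)
        D[0][0] = 0.0
        for b in range(1, k + 1):
            for j in range(1, G + 1):
                for i in range(b - 1, j):
                    c = D[i][b - 1] + cost(i, j)
                    if c < D[j][b]:
                        D[j][b], back[j][b] = c, i
        # Recover bin boundaries by backtracking.
        bounds, j = [G], G
        for b in range(k, 0, -1):
            j = back[j][b]
            bounds.append(j)
        return bounds[::-1]

    # Toy bimodal histogram over 64 gray levels, binned into 8 nonuniform bins.
    rng = np.random.default_rng(2)
    hist = np.histogram(np.concatenate([rng.normal(15, 3, 5000),
                                        rng.normal(45, 6, 5000)]),
                        bins=64, range=(0, 64))[0].astype(float)
    print(optimal_bins(hist, k=8))

The bin boundaries come out narrow where the histogram mass is concentrated and wide over sparsely populated gray levels, which is the behavior the nonlinear binning aims for.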
NASA Astrophysics Data System (ADS)
Perrin, A.; Ndao, M.; Manceron, L.
2017-10-01
A recent paper [1] presents a high-resolution, high-temperature version of the Nitrogen Dioxide Spectroscopic Databank called NDSD-1000. The NDSD-1000 database contains line parameters (positions, intensities, self- and air-broadening coefficients, exponents of the temperature dependence of self- and air-broadening coefficients) for numerous cold and hot bands of the 14N16O2 isotopomer of nitrogen dioxide. The parameters used for the line position and intensity calculations were generated through a global modeling of experimental data collected in the literature within the framework of the method of effective operators. However, the form of the effective dipole moment operator used to compute the NO2 line intensities in the NDSD-1000 database differs from the classical one used for line intensity calculations in the NO2 infrared literature [12]. Using Fourier transform spectra recorded at high resolution in the 6.3 μm region, it is shown here that the NDSD-1000 formulation is incorrect, since the computed intensities do not account properly for the (Int(+)/Int(-)) intensity ratio between the (+) (J = N + 1/2) and (-) (J = N - 1/2) electron spin-rotation subcomponents of the computed vibration-rotation transitions. On the other hand, in the HITRAN and GEISA spectroscopic databases, the NO2 line intensities were computed using the classical theoretical approach, and it is shown here that these data lead to a significantly better agreement between the observed and calculated spectra.
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
NASA Astrophysics Data System (ADS)
Cheng, Tian-Le; Ma, Fengde D.; Zhou, Jie E.; Jennings, Guy; Ren, Yang; Jin, Yongmei M.; Wang, Yu U.
2012-01-01
Diffuse scattering contains rich information on various structural disorders, thus providing a useful means to study the nanoscale structural deviations from the average crystal structures determined by Bragg peak analysis. Extraction of maximal information from diffuse scattering requires concerted efforts in high-quality three-dimensional (3D) data measurement, quantitative data analysis and visualization, theoretical interpretation, and computer simulations. Such an endeavor is undertaken to study the correlated dynamic atomic position fluctuations caused by thermal vibrations (phonons) in precursor state of shape-memory alloys. High-quality 3D diffuse scattering intensity data around representative Bragg peaks are collected by using in situ high-energy synchrotron x-ray diffraction and two-dimensional digital x-ray detector (image plate). Computational algorithms and codes are developed to construct the 3D reciprocal-space map of diffuse scattering intensity distribution from the measured data, which are further visualized and quantitatively analyzed to reveal in situ physical behaviors. Diffuse scattering intensity distribution is explicitly formulated in terms of atomic position fluctuations to interpret the experimental observations and identify the most relevant physical mechanisms, which help set up reduced structural models with minimal parameters to be efficiently determined by computer simulations. Such combined procedures are demonstrated by a study of phonon softening phenomenon in precursor state and premartensitic transformation of Ni-Mn-Ga shape-memory alloy.
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentation has led to an astonishing growth in both volume and complexity of biomedical data collected from various sources. These planet-size data bring serious challenges to storage and computing technologies. Cloud computing is an attractive alternative because it jointly addresses storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should facilitate biomedical research by making the vast amount of diverse data meaningful and usable. PMID:24288665
Web-based interactive visualization in a Grid-enabled neuroimaging application using HTML5.
Siewert, René; Specovius, Svenja; Wu, Jie; Krefting, Dagmar
2012-01-01
Interactive visualization and correction of intermediate results are required in many medical image analysis pipelines. To allow certain interaction in the remote execution of compute- and data-intensive applications, new features of HTML5 are used. They allow for transparent integration of user interaction into Grid- or Cloud-enabled scientific workflows. Both 2D and 3D visualization and data manipulation can be performed through a scientific gateway without the need to install specific software or web browser plugins. The possibilities of web-based visualization are presented along the FreeSurfer-pipeline, a popular compute- and data-intensive software tool for quantitative neuroimaging.
On the Modeling and Management of Cloud Data Analytics
NASA Astrophysics Data System (ADS)
Castillo, Claris; Tantawi, Asser; Steinder, Malgorzata; Pacifici, Giovanni
A new era is dawning in which vast amounts of data are subjected to intensive analysis in a cloud computing environment. Over the years, data about a myriad of things, ranging from user clicks to galaxies, have been accumulated, and continue to be collected, on storage media. The increasing availability of such data, along with the abundant supply of compute power and the urge to create useful knowledge, gave rise to a new data analytics paradigm in which data is subjected to intensive analysis, and additional data is created in the process. Meanwhile, a new cloud computing environment has emerged where seemingly limitless compute and storage resources are being provided to host computation and data for multiple users through virtualization technologies. Such a cloud environment is becoming the home for data analytics. Consequently, providing good performance at run-time to data analytics workloads is an important issue for cloud management. In this paper, we provide an overview of the data analytics and cloud environment landscapes, and investigate the performance management issues related to running data analytics in the cloud. In particular, we focus on topics such as workload characterization, profiling analytics applications and their patterns of data usage, cloud resource allocation, placement of computation and data and their dynamic migration in the cloud, and performance prediction. In solving such management problems one relies on various run-time analytic models. We discuss approaches for modeling and optimizing the dynamic data analytics workload in the cloud environment. All along, we use the Map-Reduce paradigm as an illustration of data analytics.
Benchmarking Memory Performance with the Data Cube Operator
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabanov, Leonid V.
2004-01-01
Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark the capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control the data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance for a number of computer architectures and for a small computational grid are presented.
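A toy illustration of the data-cube idea of materializing all 2^d group-by views of a set of d-tuples, with the "smallest parent" strategy approximated by computing each view from the already-computed superset view with the fewest rows rather than from the raw tuples. The attribute names, measure, and aggregate (a sum) are made up for the example and are not the ADC benchmark's actual parameters.

    from itertools import combinations

    # Each tuple: d describing attributes plus one numeric measure "x".
    d_attrs = ("a", "b", "c")
    tuples = [
        {"a": 1, "b": 0, "c": 2, "x": 5.0},
        {"a": 1, "b": 1, "c": 2, "x": 3.0},
        {"a": 2, "b": 0, "c": 1, "x": 7.0},
        {"a": 2, "b": 0, "c": 2, "x": 1.0},
    ]

    def group_by(rows, attrs):
        """Aggregate (sum of the measure 'x') over the given attribute subset."""
        view = {}
        for r in rows:
            key = tuple(r[a] for a in attrs)
            view[key] = view.get(key, 0.0) + r["x"]
        return view

    # Materialize all 2^d views, computing each one from its smallest
    # already-computed superset ("parent") instead of rescanning the raw tuples.
    views = {d_attrs: group_by(tuples, d_attrs)}
    for size in range(len(d_attrs) - 1, -1, -1):
        for attrs in combinations(d_attrs, size):
            parent_attrs = min((p for p in views if set(attrs) <= set(p)),
                               key=lambda p: len(views[p]))
            parent = views[parent_attrs]
            idx = [parent_attrs.index(a) for a in attrs]
            view = {}
            for key, val in parent.items():
                sub = tuple(key[i] for i in idx)
                view[sub] = view.get(sub, 0.0) + val
            views[attrs] = view

    for attrs, view in sorted(views.items(), key=lambda kv: len(kv[0]), reverse=True):
        print(attrs, view)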
Unsteady thermal blooming of intense laser beams
NASA Astrophysics Data System (ADS)
Ulrich, J. T.; Ulrich, P. B.
1980-01-01
A four dimensional (three space plus time) computer program has been written to compute the nonlinear heating of a gas by an intense laser beam. Unsteady, transient cases are capable of solution and no assumption of a steady state need be made. The transient results are shown to asymptotically approach the steady-state results calculated by the standard three dimensional thermal blooming computer codes. The report discusses the physics of the laser-absorber interaction, the numerical approximation used, and comparisons with experimental data. A flowchart is supplied in the appendix to the report.
MSFC crack growth analysis computer program, version 2 (users manual)
NASA Technical Reports Server (NTRS)
Creager, M.
1976-01-01
An updated version of the George C. Marshall Space Flight Center Crack Growth Analysis Program is described. The updated computer program has significantly expanded capabilities over the original one. This increased capability includes an extensive expansion of the library of stress intensity factors, plotting capability, increased design iteration capability, and the capability of performing proof test logic analysis. The technical approaches used within the computer program are presented, and the input and output formats and options are described. Details of the stress intensity equations, example data, and example problems are presented.
NASA Astrophysics Data System (ADS)
Kropivnitskaya, Y. Y.; Tiampo, K. F.; Qin, J.; Bauer, M.
2015-12-01
Intensity is one of the most useful measures of earthquake hazard, as it quantifies the strength of shaking produced at a given distance from the epicenter. Today, there are several data sources that could be used to determine intensity level, and they can be divided into two main categories. The first category is represented by social data sources, in which the intensity values are collected by interviewing people who experienced the earthquake-induced shaking. In this case, specially developed questionnaires can be used in addition to personal observations published on social networks such as Twitter. These observations are assigned to the appropriate intensity level by correlating specific details and descriptions to the Modified Mercalli Scale. The second category of data sources is represented by observations from different physical sensors installed with the specific purpose of obtaining an instrumentally derived intensity level. These are usually based on a regression of recorded peak acceleration and/or velocity amplitudes. This approach relates the recorded ground motions to the expected felt and damage distribution through empirical relationships. The goal of this work is to implement and evaluate streaming data processing separately and jointly from both social and physical sensors in order to produce near real-time intensity maps, and to compare and analyze their quality and evolution through 10-minute time intervals immediately following an earthquake. Results are shown for the case study of the M6.0 South Napa, CA earthquake that occurred on August 24, 2014. The use of innovative streaming and pipelining computing paradigms through the IBM InfoSphere Streams platform made it possible to read input data in real time for low-latency computation of a combined intensity level and production of combined intensity maps in near-real time. The results compare three types of intensity maps created from physical, social and combined data sources. Here we correlate the count and density of tweets with intensity level and show the importance of processing combined data sources at the earliest time stages after an earthquake happens. This method can supplement existing approaches to intensity level detection, especially in regions with a high number of Twitter users and a low density of seismic networks.
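For the physical-sensor side, an instrumentally derived intensity is typically obtained from an empirical regression on peak ground motion. The sketch below uses a Wald et al. (1999)-style peak-ground-acceleration relationship as one example; the abstract does not state which regression the study used, and the coefficients here are quoted from memory for illustration and should be checked against the original reference before any real use.

    import math

    def pga_to_mmi(pga_cm_s2):
        """Approximate Modified Mercalli intensity from peak ground acceleration
        (cm/s^2) using a Wald et al. (1999)-style regression; coefficients are
        illustrative and should be verified against the original paper."""
        if pga_cm_s2 <= 0:
            raise ValueError("PGA must be positive")
        mmi = 3.66 * math.log10(pga_cm_s2) - 1.66
        return min(max(mmi, 1.0), 10.0)   # clamp to a sensible MMI range

    # Example: a station recording ~92 cm/s^2 (about 0.09 g).
    print(round(pga_to_mmi(92.0), 1))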
Computer Series, 98. Electronics for Scientists: A Computer-Intensive Approach.
ERIC Educational Resources Information Center
Scheeline, Alexander; Mork, Brian J.
1988-01-01
Reports the design for a principles-before-details presentation of electronics for an instrumental analysis class. Uses computers for data collection and simulations. Requires one semester with two 2.5-hour periods and two lectures per week. Includes lab and lecture syllabi. (MVL)
In Vivo Validation of Numerical Prediction for Turbulence Intensity in an Aortic Coarctation
Arzani, Amirhossein; Dyverfeldt, Petter; Ebbers, Tino; Shadden, Shawn C.
2013-01-01
This paper compares numerical predictions of turbulence intensity with in vivo measurement. Magnetic resonance imaging (MRI) was carried out on a 60-year-old female with a restenosed aortic coarctation. Time-resolved three-directional phase-contrast (PC) MRI data was acquired to enable turbulence intensity estimation. A contrast-enhanced MR angiography (MRA) and a time-resolved 2D PCMRI measurement were also performed to acquire data needed to perform subsequent image-based computational fluid dynamics (CFD) modeling. A 3D model of the aortic coarctation and surrounding vasculature was constructed from the MRA data, and physiologic boundary conditions were modeled to match 2D PCMRI and pressure pulse measurements. Blood flow velocity data was subsequently obtained by numerical simulation. Turbulent kinetic energy (TKE) was computed from the resulting CFD data. Results indicate relative agreement (error ≈10%) between the in vivo measurements and the CFD predictions of TKE. The discrepancies in modeled vs. measured TKE values were within expectations due to modeling and measurement errors. PMID:22016327
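The turbulent kinetic energy referenced above has the standard definition TKE = 1/2 (⟨u'^2⟩ + ⟨v'^2⟩ + ⟨w'^2⟩) per unit mass; the NumPy sketch below computes it from velocity-component samples at a single point, with a blood-density factor included as one common convention for reporting TKE per unit volume and a plain ensemble mean standing in for the authors' phase averaging over cardiac cycles. All numbers are synthetic.

    import numpy as np

    def turbulent_kinetic_energy(u, v, w, rho=1060.0):
        """TKE per unit volume (J/m^3) from velocity component samples (m/s) at a
        point: 0.5 * rho * (var(u) + var(v) + var(w)). rho defaults to a typical
        blood density; drop it to get TKE per unit mass."""
        return 0.5 * rho * (np.var(u) + np.var(v) + np.var(w))

    # Synthetic velocity fluctuations around a mean flow of (1.0, 0.0, 0.0) m/s.
    rng = np.random.default_rng(3)
    u = 1.0 + 0.15 * rng.standard_normal(200)
    v = 0.10 * rng.standard_normal(200)
    w = 0.05 * rng.standard_normal(200)
    print(turbulent_kinetic_energy(u, v, w))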
Computing moment to moment BOLD activation for real-time neurofeedback
Hinds, Oliver; Ghosh, Satrajit; Thompson, Todd W.; Yoo, Julie J.; Whitfield-Gabrieli, Susan; Triantafyllou, Christina; Gabrieli, John D.E.
2013-01-01
Estimating moment to moment changes in blood oxygenation level dependent (BOLD) activation levels from functional magnetic resonance imaging (fMRI) data has applications for learned regulation of regional activation, brain state monitoring, and brain-machine interfaces. In each of these contexts, accurate estimation of the BOLD signal in as little time as possible is desired. This is a challenging problem due to the low signal-to-noise ratio of fMRI data. Previous methods for real-time fMRI analysis have either sacrificed the ability to compute moment to moment activation changes by averaging several acquisitions into a single activation estimate or have sacrificed accuracy by failing to account for prominent sources of noise in the fMRI signal. Here we present a new method for computing the amount of activation present in a single fMRI acquisition that separates moment to moment changes in the fMRI signal intensity attributable to neural sources from those due to noise, resulting in a feedback signal more reflective of neural activation. This method computes an incremental general linear model fit to the fMRI timeseries, which is used to calculate the expected signal intensity at each new acquisition. The difference between the measured intensity and the expected intensity is scaled by the variance of the estimator in order to transform this residual difference into a statistic. Both synthetic and real data were used to validate this method and compare it to the only other published real-time fMRI method. PMID:20682350
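A simplified sketch of the general idea described above: fit a GLM to the fMRI time series acquired so far, predict the expected intensity of the newest acquisition, and convert the residual into a statistic by scaling with the spread of the fit. For clarity this is an ordinary batch refit with a residual-standard-deviation scaling, not the authors' truly incremental update or their exact estimator-variance scaling; the design matrix and signal are invented.

    import numpy as np

    def realtime_stat(y, X):
        """Given an ROI time series y (length t) and a design matrix X (t x p) of
        nuisance/drift regressors up to the current acquisition, return a z-like
        statistic for the newest time point."""
        y_prev, X_prev = y[:-1], X[:-1]
        beta, *_ = np.linalg.lstsq(X_prev, y_prev, rcond=None)  # GLM fit on past data
        resid = y_prev - X_prev @ beta
        sigma = resid.std(ddof=X_prev.shape[1])                 # residual spread of the fit
        expected = X[-1] @ beta                                 # prediction for the new point
        return (y[-1] - expected) / sigma

    # Toy example: linear drift plus noise, with an "activation" bump at the end.
    rng = np.random.default_rng(4)
    t = 60
    X = np.column_stack([np.ones(t), np.arange(t)])             # intercept + drift
    y = 100 + 0.05 * np.arange(t) + rng.normal(0, 0.5, t)
    y[-1] += 3.0                                                # simulated signal change
    print(round(realtime_stat(y, X), 2))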
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research--GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications
NASA Astrophysics Data System (ADS)
Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi
2012-05-01
The amount of data processed annually over the Internet has crossed the zettabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute capability and the advances in cloud computing have brought the state-of-the-art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute and storage layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real time. In this paper, we describe the operation of MOCHA in battlefield applications, with the aforementioned mobile device and cloudlet housed within a soldier's vest and inside a military vehicle, respectively, and access to the cloud enabled through high-latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as utilizing MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
Integrating the Apache Big Data Stack with HPC for Big Data
NASA Astrophysics Data System (ADS)
Fox, G. C.; Qiu, J.; Jha, S.
2014-12-01
There is perhaps a broad consensus as to the important issues in practical parallel computing as applied to large-scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data-intensive computing, even though commercial clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data-intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC and ABDS, the Apache big data software stack that is widely used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL - Scalable Parallel Interoperable Data Analytics Library - built on system and data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.
Computational Process Modeling for Additive Manufacturing
NASA Technical Reports Server (NTRS)
Bagg, Stacey; Zhang, Wei
2014-01-01
Computational Process and Material Modeling of Powder Bed additive manufacturing of IN 718. Optimize material build parameters with reduced time and cost through modeling. Increase understanding of build properties. Increase reliability of builds. Decrease time to adoption of process for critical hardware. Potential to decrease post-build heat treatments. Conduct single-track and coupon builds at various build parameters. Record build parameter information and QM Meltpool data. Refine Applied Optimization powder bed AM process model using data. Report thermal modeling results. Conduct metallography of build samples. Calibrate STK models using metallography findings. Run STK models using AO thermal profiles and report STK modeling results. Validate modeling with additional build. Photodiode Intensity measurements highly linear with power input. Melt Pool Intensity highly correlated to Melt Pool Size. Melt Pool size and intensity increase with power. Applied Optimization will use data to develop powder bed additive manufacturing process model.
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing (NGS) is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Milles, J; van der Geest, R J; Jerosch-Herold, M; Reiber, J H C; Lelieveldt, B P F
2007-01-01
This paper presents a novel method for registration of cardiac perfusion MRI. The presented method successfully corrects for breathing motion without any manual interaction, using Independent Component Analysis (ICA) to extract physiologically relevant features together with their time-intensity behavior. A time-varying reference image mimicking intensity changes in the data of interest is computed based on the results of ICA and used to compute the displacement caused by breathing for each frame. Qualitative and quantitative validation of the method is carried out using 46 clinical-quality, short-axis, perfusion MR datasets comprising 100 images each. Validation experiments showed a reduction of the average LV motion from 1.26+/-0.87 to 0.64+/-0.46 pixels. Time-intensity curves are also improved after registration, with an average error reduced from 2.65+/-7.89% to 0.87+/-3.88% between registered data and the manual gold standard. We conclude that this fully automatic ICA-based method shows excellent accuracy, robustness and computation speed, adequate for use in a clinical environment.
A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Good, J. C.; Deelman, E.; Singh, G.; Livny, M.
2009-09-01
Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that can be used on demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost-performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data that have sizes of 1, 2 and 4 square degrees, and a 2MASS all-sky mosaic. These are examples of mosaics commonly generated by astronomers. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that, by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
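A back-of-the-envelope sketch of the kind of cost trade-off studied above: total cost as the sum of compute, storage, and data-transfer charges under a simple utility-cloud fee structure. All rates and workload numbers here are hypothetical placeholders, not Amazon's actual prices or the paper's measured values; only the structure of the comparison is meant to be illustrative.

    def mosaic_cost(cpu_hours, storage_gb_months, gb_transferred_out,
                    cpu_rate=0.10, storage_rate=0.15, transfer_rate=0.17):
        """Total cost (USD) of one mosaic run under a simple utility-cloud fee
        structure. All rates are hypothetical per-hour / per-GB-month / per-GB."""
        return (cpu_hours * cpu_rate
                + storage_gb_months * storage_rate
                + gb_transferred_out * transfer_rate)

    # Compare two hypothetical execution plans for the same 4-square-degree mosaic:
    # plan A provisions more nodes (more CPU-hours, less intermediate storage),
    # plan B runs longer on fewer nodes but stages more intermediate data.
    plan_a = mosaic_cost(cpu_hours=120, storage_gb_months=5,  gb_transferred_out=20)
    plan_b = mosaic_cost(cpu_hours=80,  storage_gb_months=25, gb_transferred_out=20)
    print(f"plan A: ${plan_a:.2f}  plan B: ${plan_b:.2f}")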
Resampling: A Marriage of Computers and Statistics. ERIC/TM Digest.
ERIC Educational Resources Information Center
Rudner, Lawrence M.; Shafer, Mary Morello
Advances in computer technology are making it possible for educational researchers to use simpler statistical methods to address a wide range of questions with smaller data sets and fewer, and less restrictive, assumptions. This digest introduces computationally intensive statistics, collectively called resampling techniques. Resampling is a…
USDA-ARS?s Scientific Manuscript database
With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...
Active Subspace Methods for Data-Intensive Inverse Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Qiqi
2017-04-27
The project has developed theory and computational tools to exploit active subspaces to reduce the dimension in statistical calibration problems. This dimension reduction enables MCMC methods to calibrate otherwise intractable models. The same theoretical and computational tools can also reduce the measurement dimension for calibration problems that use large stores of data.
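For context, the standard active-subspace construction estimates the matrix C = E[∇f ∇fᵀ] from sampled gradients of the model output and takes its leading eigenvectors as the reduced directions. The sketch below follows that textbook formulation, not the project's specific code; the toy model and sample sizes are assumptions.

    import numpy as np

    def active_subspace(grads, k):
        """Estimate a k-dimensional active subspace from an (n_samples x dim) array
        of gradient samples of the model output with respect to its parameters."""
        C = grads.T @ grads / grads.shape[0]      # Monte Carlo estimate of E[grad grad^T]
        eigvals, eigvecs = np.linalg.eigh(C)       # symmetric eigendecomposition
        order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
        return eigvals[order], eigvecs[:, order[:k]]

    # Toy model f(x) = sin(w . x): every gradient is parallel to w, so the
    # active subspace is one-dimensional and aligned with w.
    rng = np.random.default_rng(5)
    dim, n = 10, 500
    w = rng.standard_normal(dim)
    X = rng.standard_normal((n, dim))
    grads = np.cos(X @ w)[:, None] * w[None, :]
    eigvals, W1 = active_subspace(grads, k=1)
    print("dominant eigenvalue fraction:", eigvals[0] / eigvals.sum())
    print("alignment with w:", abs(W1[:, 0] @ w) / np.linalg.norm(w))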
Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis
NASA Technical Reports Server (NTRS)
Duffy, Daniel Q.; Schnase, John L.; Thompson, John H.; Freeman, Shawn M.; Clune, Thomas L.
2012-01-01
MapReduce is an approach to high-performance analytics that may be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective-Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data intensive analytic workflows.
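A toy, framework-free rendering of the canonical averaging operation described above: the map step emits (grid cell, (value, count)) pairs for records falling inside the requested spatial extent, and the reduce step combines them into per-cell means. The record layout, field names, 1-degree binning, and extents are invented for illustration and are not the MERRA data model.

    from collections import defaultdict

    # Each record: (latitude, longitude, time_index, value) for some MERRA-like variable.
    records = [
        (40.1, -75.2, 0, 281.3), (40.4, -75.9, 0, 280.7),
        (41.2, -74.8, 1, 279.9), (60.0, 10.0, 0, 265.0),
    ]

    def map_phase(record, lat_range=(35.0, 45.0), lon_range=(-80.0, -70.0)):
        """Emit ((lat_bin, lon_bin), (value, 1)) for records inside the extent."""
        lat, lon, _t, val = record
        if lat_range[0] <= lat <= lat_range[1] and lon_range[0] <= lon <= lon_range[1]:
            yield (int(lat), int(lon)), (val, 1)

    def reduce_phase(pairs):
        """Combine partial (sum, count) pairs per key into a mean per grid cell."""
        acc = defaultdict(lambda: [0.0, 0])
        for key, (val, cnt) in pairs:
            acc[key][0] += val
            acc[key][1] += cnt
        return {key: s / c for key, (s, c) in acc.items()}

    emitted = (kv for rec in records for kv in map_phase(rec))
    print(reduce_phase(emitted))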
Measuring and Estimating Normalized Contrast in Infrared Flash Thermography
NASA Technical Reports Server (NTRS)
Koshti, Ajay M.
2013-01-01
Infrared flash thermography (IRFT) is used to detect void-like flaws in a test object. The IRFT technique involves heating the part surface using a flash from flash lamps. The post-flash evolution of the part surface temperature is sensed by an IR camera in terms of the pixel intensity of image pixels. The technique involves recording the IR video image data and analyzing the data using the normalized pixel intensity and temperature contrast analysis method to characterize void-like flaws in depth and width. This work introduces a new definition of the normalized IR pixel intensity contrast and the normalized surface temperature contrast. A procedure is provided to compute the pixel intensity contrast from the camera pixel intensity evolution data. The pixel intensity contrast and the corresponding surface temperature contrast differ but are related. This work provides a method to estimate the temperature evolution and the normalized temperature contrast from the measured pixel intensity evolution data and some additional measurements during data acquisition.
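One common way to define a normalized pixel-intensity contrast in flash thermography is to reference each pixel's post-flash evolution to a flaw-free ("sound") region and to its own pre-flash level. The sketch below uses that generic definition only; it is not the paper's new definition, which the abstract does not reproduce, and the frame counts, regions, and synthetic cooling curves are assumptions.

    import numpy as np

    def normalized_contrast(frames, defect_px, sound_region, pre_flash_frames=5):
        """Normalized intensity contrast evolution for one pixel.

        frames: (n_frames, H, W) pixel-intensity video from the IR camera
        defect_px: (row, col) of the pixel of interest
        sound_region: (row_slice, col_slice) over a flaw-free reference area
        """
        pre = frames[:pre_flash_frames].mean(axis=0)        # pre-flash baseline image
        i_def = frames[:, defect_px[0], defect_px[1]] - pre[defect_px]
        i_ref = (frames[:, sound_region[0], sound_region[1]].mean(axis=(1, 2))
                 - pre[sound_region].mean())
        return (i_def - i_ref) / np.where(i_ref == 0, 1.0, i_ref)

    # Synthetic sequence: 5 pre-flash frames, then a post-flash cooling decay,
    # with the pixel over the flaw cooling more slowly than the sound area.
    t = np.arange(1, 96, dtype=float)
    ambient = np.full((5, 16, 16), 50.0)
    decay = np.full((95, 16, 16), 50.0) + 200.0 / np.sqrt(t)[:, None, None]
    decay[:, 8, 8] += 30.0 * np.exp(-t / 40.0)               # slower cooling over the flaw
    frames = np.concatenate([ambient, decay])
    c = normalized_contrast(frames, (8, 8), (np.s_[0:4], np.s_[0:4]))
    print(float(c.max()))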
Large-Scale Compute-Intensive Analysis via a Combined In-situ and Co-scheduling Workflow Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Sewell, Christopher; Heitmann, Katrin
2015-01-01
Large-scale simulations can produce tens of terabytes of data per analysis cycle, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in situ and co-scheduling approaches for handling Petabyte-size outputs. An initial in situ step is used to reduce the amount of data to be analyzed, and to separate out the data-intensive tasks handled off-line. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.
A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure
Xia, Yingjie; Hu, Jia; Fontaine, Michael D.
2013-01-01
Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690
Rasdaman for Big Spatial Raster Data
NASA Astrophysics Data System (ADS)
Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.
2015-12-01
Spatial raster data have grown exponentially over the past decade. Recent advancements in data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolutions and domain coverage. The volume, velocity, and variety of such spatial data, along with the computationally intensive nature of spatial queries, pose grand challenges to the storage technologies needed for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Science Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.
Raw data normalization for a multi source inverse geometry CT system
Baek, Jongduk; De Man, Bruno; Harrison, Daniel; Pelc, Norbert J.
2015-01-01
A multi-source inverse-geometry CT (MS-IGCT) system consists of a small 2D detector array and multiple x-ray sources. During data acquisition, each source is activated sequentially and may have random intensity fluctuations relative to its nominal intensity. While a conventional 3rd-generation CT system uses a reference channel to monitor the source intensity fluctuation, each MS-IGCT source illuminates only a small portion of the entire field-of-view (FOV). Therefore, it is difficult for all sources to illuminate the reference channel, and the projection data computed by standard normalization using flat-field data of each source contain errors that can cause significant artifacts. In this work, we present a raw data normalization algorithm to reduce the image artifacts caused by source intensity fluctuation. The proposed method was tested using computer simulations with a uniform water phantom and a Shepp-Logan phantom, and experimental data of an ice-filled PMMA phantom and a rabbit. The effect on image resolution and the robustness to noise were tested using the MTF and the standard deviation of the reconstructed noise image. With the intensity fluctuation and no correction, reconstructed images from simulation and experimental data show high-frequency artifacts and ring artifacts, which are removed effectively using the proposed method. It is also observed that the proposed method does not degrade the image resolution and is very robust to the presence of noise. PMID:25837090
Enabling NVM for Data-Intensive Scientific Services
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carns, Philip; Jenkins, John; Seo, Sangmin
Specialized, transient data services are playing an increasingly prominent role in data-intensive scientific computing. These services offer flexible, on-demand pairing of applications with storage hardware using semantics that are optimized for the problem domain. Concurrent with this trend, upcoming scientific computing and big data systems will be deployed with emerging NVM technology to achieve the highest possible price/productivity ratio. Clearly, therefore, we must develop techniques to facilitate the confluence of specialized data services and NVM technology. In this work we explore how to enable the composition of NVM resources within transient distributed services while still retaining their essential performance characteristics. Our approach involves eschewing the conventional distributed file system model and instead projecting NVM devices as remote microservices that leverage user-level threads, RPC services, RMA-enabled network transports, and persistent memory libraries in order to maximize performance. We describe a prototype system that incorporates these concepts, evaluate its performance for key workloads on an exemplar system, and discuss how the system can be leveraged as a component of future data-intensive architectures.
Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.
Scalable Automated Model Search
2014-05-20
machines. Categories and Subject Descriptors: Big Data [Distributed Computing]: Large scale optimization. 1. INTRODUCTION. Modern scientific and...from Continuum Analytics [1], and Apache Spark 0.8.1. Additionally, we made use of Hadoop 1.0.4 configured on local disks as our data store for the large...Borkar et al. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE, 2011. [16] J. Canny and H. Zhao. Big data
Evaluating virtual hosted desktops for graphics-intensive astronomy
NASA Astrophysics Data System (ADS)
Meade, B. F.; Fluke, C. J.
2018-04-01
Visualisation of data is critical to understanding astronomical phenomena. Today, many instruments produce datasets that are too big to be downloaded to a local computer, yet many of the visualisation tools used by astronomers are deployed only on desktop computers. Cloud computing is increasingly used to provide a computation and simulation platform in astronomy, but it also offers great potential as a visualisation platform. Virtual hosted desktops, with graphics processing unit (GPU) acceleration, allow interactive, graphics-intensive desktop applications to operate co-located with astronomy datasets stored in remote data centres. By combining benchmarking and user experience testing, with a cohort of 20 astronomers, we investigate the viability of replacing physical desktop computers with virtual hosted desktops. In our work, we compare two Apple MacBook computers (one old and one new, representing hardware at opposite ends of the useful lifetime) with two virtual hosted desktops: one commercial (Amazon Web Services) and one in a private research cloud (the Australian NeCTAR Research Cloud). For two-dimensional image-based tasks and graphics-intensive three-dimensional operations - typical of astronomy visualisation workflows - we found that benchmarks do not necessarily provide the best indication of performance. When compared to typical laptop computers, virtual hosted desktops can provide a better user experience, even with lower performing graphics cards. We also found that virtual hosted desktops are equally simple to use, provide greater flexibility in choice of configuration, and may actually be a more cost-effective option for typical usage profiles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lahrichi, Nadia
2004-06-01
In this thesis we have put the first constraints on the fundamental parameters of the Randall-Sundrum model of extra dimensions: $k / M_{pl}$, which is proportional to the coupling of the graviton to the standard model fields, and $M_G$, which is the mass of the first excited state of the Kaluza-Klein graviton. The analysis performed on a Monte Carlo sample of the signal allowed us to find an error in the PYTHIA generator. The elaboration of an independent generator dedicated to this particular analysis helped to find and correct the error. The data sample used for the analysis covers the running period from November 2002 up to July 2002 taken by the DZero collaboration at the Tevatron, which corresponds to an accumulated luminosity of 107.8 pb-1. The search for the graviton in the dimuon channel allowed us to measure the Z production cross-section times the branching ratio into dimuons.
Impedance computations and beam-based measurements: A problem of discrepancy
NASA Astrophysics Data System (ADS)
Smaluk, Victor
2018-04-01
High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. Three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.
Data communication network at the ASRM facility
NASA Astrophysics Data System (ADS)
Moorhead, Robert J., II; Smith, Wayne D.
1993-08-01
This report describes the simulation of the overall communication network structure for the Advanced Solid Rocket Motor (ASRM) facility currently being built at Yellow Creek near Iuka, Mississippi. The report is compiled using information received from NASA/MSFC, LMSC, AAD, and RUST Inc. According to the information gathered, the overall network structure will have one logical FDDI ring acting as a backbone for the whole complex. The buildings will be grouped into two categories, manufacturing intensive and manufacturing non-intensive. The manufacturing intensive buildings will be connected via FDDI to the Operational Information System (OIS) in the main computing center in B_1000. The manufacturing non-intensive buildings will be connected by 10BASE-FL to the OIS through the Business Information System (BIS) hub in the main computing center. All the devices inside B_1000 will communicate with the BIS. The workcells will be connected to the Area Supervisory Computers (ASCs) through the nearest manufacturing intensive hub and one of the OIS hubs. Comdisco's Block Oriented Network Simulator (BONeS) has been used to simulate the performance of the network. BONeS models a network topology, traffic, data structures, and protocol functions using a graphical interface. The main aim of the simulations was to evaluate the loading of the OIS, the BIS, and the ASCs, and the network links by the traffic generated by the workstations and workcells throughout the site.
Data communication network at the ASRM facility
NASA Technical Reports Server (NTRS)
Moorhead, Robert J., II; Smith, Wayne D.
1993-01-01
This report describes the simulation of the overall communication network structure for the Advanced Solid Rocket Motor (ASRM) facility being built at Yellow Creek near Iuka, Mississippi as of today. The report is compiled using information received from NASA/MSFC, LMSC, AAD, and RUST Inc. As per the information gathered, the overall network structure will have one logical FDDI ring acting as a backbone for the whole complex. The buildings will be grouped into two categories viz. manufacturing intensive and manufacturing non-intensive. The manufacturing intensive buildings will be connected via FDDI to the Operational Information System (OIS) in the main computing center in B_1000. The manufacturing non-intensive buildings will be connected by 10BASE-FL to the OIS through the Business Information System (BIS) hub in the main computing center. All the devices inside B_1000 will communicate with the BIS. The workcells will be connected to the Area Supervisory Computers (ASCs) through the nearest manufacturing intensive hub and one of the OIS hubs. Comdisco's Block Oriented Network Simulator (BONeS) has been used to simulate the performance of the network. BONeS models a network topology, traffic, data structures, and protocol functions using a graphical interface. The main aim of the simulations was to evaluate the loading of the OIS, the BIS, and the ASCs, and the network links by the traffic generated by the workstations and workcells throughout the site.
Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures
NASA Astrophysics Data System (ADS)
Nguyen, P. T.; Chapman, D. R.; Halem, M.
2012-12-01
New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested at the Multicore Computing Center private Hadoop based Intel Nehalem cluster, as well as in a virtual mode under the Open Source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at the resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Nino Southern oscillation in the Nino 4 region, as well as a 1.9 Kelvin decadal Arctic warming in the 4u and 12u spectral regions. Additionally, we will present the frequency of extreme global warming events by the use of a normalized maximum BT in a grid cell relative to its local standard deviation. A low-latency Hadoop scheduling environment maintains data integrity and fault tolerance in a MapReduce data intensive Cloud environment while improving the "time to solution" metric by 35% when compared to a more traditional parallel processing system for the same dataset. Our next step will be to improve the usability of our Hadoop task scheduling system, to enable rapid prototyping of data intensive experiments by means of processing "kernels". We will report on the performance and experience of implementing these experiments on the NEX testbed, and propose the use of a graphical directed acyclic graph (DAG) interface to help us develop on-demand scientific experiments. Our workflow system works within Hadoop infrastructure as a replacement for the FIFO or FairScheduler, thus the use of Apache "Pig" latin or other Apache tools may also be worth investigating on the NEX system to improve the usability of our workflow scheduling infrastructure for rapid experimentation.
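As a rough illustration of the two data-intensive steps described above, the following Python sketch grids brightness temperatures at 0.5 x 1.0 degree resolution and scores extreme events by normalizing each cell's maximum BT against its local standard deviation. The swath arrays and random inputs are synthetic placeholders standing in for AIRS Level-1b data, not part of the published workflow.

    import numpy as np
    from scipy.stats import binned_statistic_2d

    # Synthetic swath samples standing in for AIRS observations: latitude,
    # longitude and brightness temperature (K) over a small regional window.
    rng = np.random.default_rng(0)
    lat = rng.uniform(0.0, 30.0, 100_000)
    lon = rng.uniform(0.0, 60.0, 100_000)
    bt = 250.0 + 30.0 * rng.standard_normal(100_000)

    # 0.5 deg latitude x 1.0 deg longitude grid, the resolution quoted above.
    lat_edges = np.arange(0.0, 30.5, 0.5)
    lon_edges = np.arange(0.0, 61.0, 1.0)

    def grid(stat):
        # Bin the swath samples into grid cells and reduce with the given statistic.
        return binned_statistic_2d(lat, lon, bt, statistic=stat,
                                   bins=[lat_edges, lon_edges]).statistic

    mean_bt = grid("mean")   # gridded brightness temperature
    max_bt = grid("max")     # per-cell maximum BT
    std_bt = grid("std")     # per-cell local variability

    # Extreme-warming indicator: maximum BT in a cell normalized by the cell's
    # local standard deviation, as described in the abstract.
    score = np.where(std_bt > 0, (max_bt - mean_bt) / std_bt, np.nan)
    print(np.nanmax(score))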
Framework Resources Multiply Computing Power
NASA Technical Reports Server (NTRS)
2010-01-01
As an early proponent of grid computing, Ames Research Center awarded Small Business Innovation Research (SBIR) funding to 3DGeo Development Inc., of Santa Clara, California, (now FusionGeo Inc., of The Woodlands, Texas) to demonstrate a virtual computer environment that linked geographically dispersed computer systems over the Internet to help solve large computational problems. By adding to an existing product, FusionGeo enabled access to resources for calculation- or data-intensive applications whenever and wherever they were needed. Commercially available as Accelerated Imaging and Modeling, the product is used by oil companies and seismic service companies, which require large processing and data storage capacities.
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
Modern Computational Techniques for the HMMER Sequence Analysis
2013-01-01
This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944
NASA Astrophysics Data System (ADS)
Hampton, S. E.
2015-12-01
The science necessary to unravel complex environmental problems confronts severe computational challenges - coping with huge volumes of heterogeneous data, spanning vast spatial scales at high resolution, and requiring integration of disparate measurements from multiple disciplines. But as cyberinfrastructure advances to support such work, scientists in many fields lack sufficient computational skills to participate in interdisciplinary, data-intensive research. In response, we developed innovative training workshops for early-career scientists, in order to explore both the needs and solutions for training next-generation scientists in skills for data-intensive environmental research. In 2013 and 2014 we ran intensive 3-week training workshops for early-career researchers. One of the workshops was run concurrently in California and North Carolina, connected by virtual technologies and coordinated schedules. We attracted applicants to the workshop with the opportunity to pursue data-intensive small-group research projects that they proposed. This approach presented a realistic possibility that publishable products could result from 3 weeks of focused hands-on classroom instruction combined with self-directed group research in which instructors were present to assist trainees. Instruction addressed 1) collaboration modes and technologies, 2) data management, preservation, and sharing, 3) preparing data for analysis using scripting, 4) reproducible research, 5) sustainable software practices, 6) data analysis and modeling, and 7) communicating results to broad communities. The most dramatic improvements in technical skills were in data management, version control, and working with spatial data outside of proprietary software. In addition, participants built strong networks and collaborative skills that later resulted in a successful student-led grant proposal, published manuscripts, and participants reported that the training was a highly influential experience.
Barriers and Incentives to Computer Usage in Teaching
1988-09-29
classes with one or two computers. Research Methods The two major methods of data-gathering employed in this study were intensive and extensive classroom ... observation and repeated extended interviews with students and teachers. Administrators were also interviewed when appropriate. Classroom observers used
NASA Astrophysics Data System (ADS)
Li, J.; Zhang, T.; Huang, Q.; Liu, Q.
2014-12-01
Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Ali, Syed Mashhood; Shamim, Shazia
2015-07-01
Complexation of racemic citalopram with β-cyclodextrin (β-CD) in aqueous medium was investigated to determine the atom-accurate structure of the inclusion complexes. 1H-NMR chemical shift change data of β-CD cavity protons in the presence of citalopram confirmed the formation of 1 : 1 inclusion complexes. The ROESY spectrum confirmed the presence of an aromatic ring in the β-CD cavity, but it was not clear whether one of the two rings or both were included. Molecular mechanics and molecular dynamics calculations showed the entry of the fluoro-ring from the wider side of the β-CD cavity as the most favored mode of inclusion. Minimum-energy computational models were analyzed for their accuracy in atomic coordinates by comparison of calculated and experimental intermolecular ROESY peak intensities, which were not found to be in agreement. Several least-energy computational models were refined and analyzed until the calculated and experimental intensities were compatible. The results demonstrate that computational models of CD complexes need to be analyzed for atom-accuracy and that quantitative ROESY analysis is a promising method. Moreover, the study also validates that the quantitative use of ROESY is feasible even with longer mixing times if peak intensity ratios instead of absolute intensities are used. Copyright © 2015 John Wiley & Sons, Ltd.
Thin film ferroelectric electro-optic memory
NASA Technical Reports Server (NTRS)
Thakoor, Sarita (Inventor); Thakoor, Anilkumar P. (Inventor)
1993-01-01
An electrically programmable, optically readable data or memory cell is configured from a thin film of ferroelectric material, such as PZT, sandwiched between a transparent top electrode and a bottom electrode. The output photoresponse, which may be a photocurrent or photo-emf, is a function of the product of the remanent polarization from a previously applied polarization voltage and the incident light intensity. The cell is useful for analog and digital data storage as well as opto-electric computing. The optical read operation is non-destructive of the remanent polarization. The cell provides a method for computing the product of stored data and incident optical data by applying an electrical signal to store data by polarizing the thin film ferroelectric material, and then applying an intensity modulated optical signal incident onto the thin film material to generate a photoresponse therein related to the product of the electrical and optical signals.
The Montage architecture for grid-enabled science processing of large, distributed datasets
NASA Technical Reports Server (NTRS)
Jacob, Joseph C.; Katz, Daniel S .; Prince, Thomas; Berriman, Bruce G.; Good, John C.; Laity, Anastasia C.; Deelman, Ewa; Singh, Gurmeet; Su, Mei-Hui
2004-01-01
Montage is an Earth Science Technology Office (ESTO) Computational Technologies (CT) Round III Grand Challenge investigation to deploy a portable, compute-intensive, custom astronomical image mosaicking service for the National Virtual Observatory (NVO). Although Montage is developing a compute- and data-intensive service for the astronomy community, we are also helping to address a problem that spans both Earth and Space science, namely how to efficiently access and process multi-terabyte, distributed datasets. In both communities, the datasets are massive, and are stored in distributed archives that are, in most cases, remote from the available Computational resources. Therefore, state of the art computational grid technologies are a key element of the Montage portal architecture. This paper describes the aspects of the Montage design that are applicable to both the Earth and Space science communities.
Mattfeldt, Torsten
2011-04-01
Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods. © 2010 The Authors Journal of Microscopy © 2010 Royal Microscopical Society.
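To make the resampling idea concrete, here is a minimal percentile-bootstrap sketch in Python; the gamma-distributed distances are invented toy data, and this is a generic illustration rather than the specific procedures worked through in the review.

    import numpy as np

    def bootstrap_ci(sample, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
        # Percentile bootstrap confidence interval for a scalar statistic:
        # resample the data with replacement and recompute the statistic each time.
        rng = np.random.default_rng(seed)
        sample = np.asarray(sample)
        boot = np.array([stat(rng.choice(sample, size=sample.size, replace=True))
                         for _ in range(n_boot)])
        lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return lo, hi

    # Toy example: 95% CI for the mean nearest-neighbour distance of a point pattern.
    rng = np.random.default_rng(1)
    distances = rng.gamma(shape=2.0, scale=1.5, size=200)
    print(bootstrap_ci(distances))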
Betowski, Don; Bevington, Charles; Allison, Thomas C
2016-01-19
Halogenated chemical substances are used in a broad array of applications, and new chemical substances are continually being developed and introduced into commerce. While recent research has considerably increased our understanding of the global warming potentials (GWPs) of multiple individual chemical substances, this research inevitably lags behind the development of new chemical substances. There are currently over 200 substances known to have high GWP. Evaluation of schemes to estimate radiative efficiency (RE) based on computational chemistry are useful where no measured IR spectrum is available. This study assesses the reliability of values of RE calculated using computational chemistry techniques for 235 chemical substances against the best available values. Computed vibrational frequency data is used to estimate RE values using several Pinnock-type models, and reasonable agreement with reported values is found. Significant improvement is obtained through scaling of both vibrational frequencies and intensities. The effect of varying the computational method and basis set used to calculate the frequency data is discussed. It is found that the vibrational intensities have a strong dependence on basis set and are largely responsible for differences in computed RE values.
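The following schematic Python sketch shows the general shape of a Pinnock-type estimate: scale the computed band positions and intensities, look up a radiative-forcing-per-unit-absorption curve at each band, and sum the contributions. The scale factors, the tabulated curve and the example bands are all placeholders for illustration, not the values or models assessed in the study.

    import numpy as np

    # Hypothetical computed IR bands: centers (cm^-1) and intensities (km mol^-1).
    freqs = np.array([1150.0, 1220.0, 1280.0])
    intensities = np.array([350.0, 120.0, 80.0])

    # Placeholder empirical scale factors for frequencies and intensities.
    freq_scale, intensity_scale = 0.97, 1.05

    # Placeholder Pinnock-type forcing-per-unit-absorption curve on a wavenumber grid.
    grid_wn = np.arange(0.0, 2500.0, 10.0)
    forcing_curve = 0.3 + 0.2 * np.exp(-((grid_wn - 1100.0) / 400.0) ** 2)

    def radiative_efficiency(freqs, intensities):
        # Scale the computed bands, look up the forcing curve at each band center,
        # and accumulate the intensity-weighted contributions.
        f = freqs * freq_scale
        s = intensities * intensity_scale
        return float(np.sum(np.interp(f, grid_wn, forcing_curve) * s))

    print(radiative_efficiency(freqs, intensities))  # arbitrary units in this sketch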
Opportunities and challenges for the life sciences community.
Kolker, Eugene; Stewart, Elizabeth; Ozdemir, Vural
2012-03-01
Twenty-first century life sciences have transformed into data-enabled (also called data-intensive, data-driven, or big data) sciences. They principally depend on data-, computation-, and instrumentation-intensive approaches to seek comprehensive understanding of complex biological processes and systems (e.g., ecosystems, complex diseases, environmental, and health challenges). Federal agencies including the National Science Foundation (NSF) have played and continue to play an exceptional leadership role by innovatively addressing the challenges of data-enabled life sciences. Yet even more is required not only to keep up with the current developments, but also to pro-actively enable future research needs. Straightforward access to data, computing, and analysis resources will enable true democratization of research competitions; thus investigators will compete based on the merits and broader impact of their ideas and approaches rather than on the scale of their institutional resources. This is the Final Report for Data-Intensive Science Workshops DISW1 and DISW2. The first NSF-funded Data Intensive Science Workshop (DISW1, Seattle, WA, September 19-20, 2010) overviewed the status of the data-enabled life sciences and identified their challenges and opportunities. This served as a baseline for the second NSF-funded DIS workshop (DISW2, Washington, DC, May 16-17, 2011). Based on the findings of DISW2 the following overarching recommendation to the NSF was proposed: establish a community alliance to be the voice and framework of the data-enabled life sciences. After this Final Report was finished, Data-Enabled Life Sciences Alliance (DELSA, www.delsall.org ) was formed to become a Digital Commons for the life sciences community.
Opportunities and Challenges for the Life Sciences Community
Stewart, Elizabeth; Ozdemir, Vural
2012-01-01
Abstract Twenty-first century life sciences have transformed into data-enabled (also called data-intensive, data-driven, or big data) sciences. They principally depend on data-, computation-, and instrumentation-intensive approaches to seek comprehensive understanding of complex biological processes and systems (e.g., ecosystems, complex diseases, environmental, and health challenges). Federal agencies including the National Science Foundation (NSF) have played and continue to play an exceptional leadership role by innovatively addressing the challenges of data-enabled life sciences. Yet even more is required not only to keep up with the current developments, but also to pro-actively enable future research needs. Straightforward access to data, computing, and analysis resources will enable true democratization of research competitions; thus investigators will compete based on the merits and broader impact of their ideas and approaches rather than on the scale of their institutional resources. This is the Final Report for Data-Intensive Science Workshops DISW1 and DISW2. The first NSF-funded Data Intensive Science Workshop (DISW1, Seattle, WA, September 19–20, 2010) overviewed the status of the data-enabled life sciences and identified their challenges and opportunities. This served as a baseline for the second NSF-funded DIS workshop (DISW2, Washington, DC, May 16–17, 2011). Based on the findings of DISW2 the following overarching recommendation to the NSF was proposed: establish a community alliance to be the voice and framework of the data-enabled life sciences. After this Final Report was finished, Data-Enabled Life Sciences Alliance (DELSA, www.delsall.org) was formed to become a Digital Commons for the life sciences community. PMID:22401659
The importance of ray pathlengths when measuring objects in maximum intensity projection images.
Schreiner, S; Dawant, B M; Paschal, C B; Galloway, R L
1996-01-01
It is important to understand any process that affects medical data. Once the data have changed from the original form, one must consider the possibility that the information contained in the data has also changed. In general, false negative and false positive diagnoses caused by this post-processing must be minimized. Medical imaging is one area in which post-processing is commonly performed, but there is often little or no discussion of how these algorithms affect the data. This study uncovers some interesting properties of maximum intensity projection (MIP) algorithms which are commonly used in the post-processing of magnetic resonance (MR) and computed tomography (CT) angiographic data. The appearance of the width of vessels and the extent of malformations such as aneurysms is of interest to clinicians. This study will show how MIP algorithms interact with the shape of the object being projected. MIP's can make objects appear thinner in the projection than in the original data set and also alter the shape of the profile of the object seen in the original data. These effects have consequences for width-measuring algorithms which will be discussed. Each projected intensity is dependent upon the pathlength of the ray from which the projected pixel arises. The morphology (shape and intensity profile) of an object will change the pathlength that each ray experiences. This is termed the pathlength effect. In order to demonstrate the pathlength effect, simple computer models of an imaged vessel were created. Additionally, a static MR phantom verified that the derived equation for the projection-plane probability density function (pdf) predicts the projection-plane intensities well (R(2)=0.96). Finally, examples of projections through in vivo MR angiography and CT angiography data are presented.
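A small numerical sketch of the pathlength effect on synthetic data (a noisy cylinder standing in for a vessel, with invented intensities): rays that cross a longer chord of the object sample more voxels, so their maxima are statistically higher, which is the mechanism by which MIPs can make objects appear thinner and can alter the projected profile.

    import numpy as np

    # Synthetic vessel: a cylinder of radius r along x, embedded in background noise.
    nx, ny, nz, r = 200, 64, 64, 10
    yy, zz = np.meshgrid(np.arange(ny), np.arange(nz), indexing="ij")
    chord = 2.0 * np.sqrt(np.maximum(r ** 2 - (yy - ny / 2) ** 2, 0.0))  # z-ray pathlength
    inside = (yy - ny / 2) ** 2 + (zz - nz / 2) ** 2 <= r ** 2

    rng = np.random.default_rng(0)
    volume = rng.normal(100.0, 20.0, size=(nx, ny, nz))   # background intensity
    volume[:, inside] += 60.0                             # modest vessel contrast

    # Maximum intensity projection along z.
    mip = volume.max(axis=2)

    # Rays crossing a longer chord of the vessel sample more vessel voxels, so
    # their maxima are statistically higher; short-chord (edge) rays look dimmer,
    # which can make the projected vessel appear thinner than it really is.
    for y in (ny // 2, ny // 2 + 9):       # a central ray versus a near-edge ray
        print(f"chord {chord[y, nz // 2]:5.1f} voxels -> mean MIP {mip[:, y].mean():6.1f}")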
A lightweight distributed framework for computational offloading in mobile cloud computing.
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC.
A Lightweight Distributed Framework for Computational Offloading in Mobile Cloud Computing
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC. PMID:25127245
NASA Technical Reports Server (NTRS)
Curlis, J. D.; Frost, V. S.; Dellwig, L. F.
1986-01-01
Computer-enhancement techniques applied to the SIR-A data from the Lisbon Valley area in the northern portion of the Paradox basin increased the value of the imagery in the development of geologically useful maps. The enhancement techniques include filtering to remove image speckle from the SIR-A data and combining these data with Landsat multispectral scanner data. A method well-suited for the combination of the data sets utilized a three-dimensional domain defined by intensity-hue-saturation (IHS) coordinates. Such a system allows the Landsat data to modulate image intensity, while the SIR-A data control image hue and saturation. Whereas the addition of Landsat data to the SIR-A image by means of a pixel-by-pixel ratio accentuated textural variations within the image, the addition of color to the combined images enabled isolation of areas in which gray-tone contrast was minimal. This isolation resulted in a more precise definition of stratigraphic units.
NASA Astrophysics Data System (ADS)
Brzuszek, Marcin; Daniluk, Andrzej
2006-11-01
Writing a concurrent program can be more difficult than writing a sequential program. The programmer needs to think about synchronisation, race conditions and shared variables. Transactions help reduce the inconvenience of using threads. A transaction is an abstraction which allows programmers to group a sequence of actions on the program into a logical, higher-level computation unit. This paper presents multithreaded versions of the GROWTH program, which allow one to calculate the layer coverages during the growth of thin epitaxial films and the corresponding RHEED intensities according to the kinematical approximation. The presented programs also contain graphical user interfaces, which enable displaying program data at run-time.
New version program summary
Titles of programs: GROWTHGr, GROWTH06
Catalogue identifier: ADVL_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVL_v2_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Catalogue identifier of previous version: ADVL
Does the new version supersede the original program: No
Computer for which the new version is designed and others on which it has been tested: Pentium-based PC
Operating systems or monitors under which the new version has been tested: Windows 9x, XP, NT
Programming language used: Object Pascal
Memory required to execute with typical data: More than 1 MB
Number of bits in a word: 64 bits
Number of processors used: 1
No. of lines in distributed program, including test data, etc.: 20 931
Number of bytes in distributed program, including test data, etc.: 1 311 268
Distribution format: tar.gz
Nature of physical problem: The programs compute the RHEED intensities during the growth of thin epitaxial structures prepared using molecular beam epitaxy (MBE). The computations are based on the use of kinematical diffraction theory [P.I. Cohen, G.S. Petrich, P.R. Pukite, G.J. Whaley, A.S. Arrott, Surf. Sci. 216 (1989) 222].
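For orientation only, here is a minimal Python sketch of the kinematical-approximation intensity formula that such programs evaluate, with an idealized layer-by-layer coverage model; it is an illustrative textbook form, not code from the GROWTH package (which is written in Object Pascal).

    import numpy as np

    def rheed_intensity(coverages, phase):
        # Kinematical RHEED intensity from layer coverages theta[n] (theta[0] = 1
        # for the substrate); phase is the scattering phase shift between layers.
        theta = np.asarray(coverages, dtype=float)
        exposed = theta - np.append(theta[1:], 0.0)   # exposed fraction of each layer
        n = np.arange(theta.size)
        amplitude = np.sum(exposed * np.exp(1j * n * phase))
        return float(np.abs(amplitude) ** 2)

    # Toy growth sequence: ideal layer-by-layer filling produces RHEED oscillations.
    phase = np.pi                        # out-of-phase diffraction condition
    for dose in np.linspace(0.0, 2.0, 9):            # deposited material in monolayers
        theta1 = min(dose, 1.0)                      # hypothetical coverage model
        theta2 = max(dose - 1.0, 0.0)
        print(f"dose {dose:4.2f} ML -> I = {rheed_intensity([1.0, theta1, theta2], phase):.3f}")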
Fully automated motion correction in first-pass myocardial perfusion MR image sequences.
Milles, Julien; van der Geest, Rob J; Jerosch-Herold, Michael; Reiber, Johan H C; Lelieveldt, Boudewijn P F
2008-11-01
This paper presents a novel method for registration of cardiac perfusion magnetic resonance imaging (MRI). The presented method is capable of automatically registering perfusion data, using independent component analysis (ICA) to extract physiologically relevant features together with their time-intensity behavior. A time-varying reference image mimicking intensity changes in the data of interest is computed based on the results of that ICA. This reference image is used in a two-pass registration framework. Qualitative and quantitative validation of the method is carried out using 46 clinical quality, short-axis, perfusion MR datasets comprising 100 images each. Despite varying image quality and motion patterns in the evaluation set, validation of the method showed a reduction of the average left-ventricular (LV) motion from 1.26+/-0.87 to 0.64+/-0.46 pixels. Time-intensity curves are also improved after registration with an average error reduced from 2.65+/-7.89% to 0.87+/-3.88% between registered data and manual gold standard. Comparison of clinically relevant parameters computed using registered data and the manual gold standard show a good agreement. Additional tests with a simulated free-breathing protocol showed robustness against considerable deviations from a standard breathing protocol. We conclude that this fully automatic ICA-based method shows an accuracy, a robustness and a computation speed adequate for use in a clinical environment.
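A minimal sketch of the core idea (random arrays standing in for the perfusion frames): decompose the image series with ICA into spatial maps and time-intensity curves, then synthesize a time-varying reference image from those components. This is only the feature-extraction step, not the two-pass registration framework itself, and the component count and array shapes are arbitrary choices.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Hypothetical perfusion series: n_frames images of shape (h, w), flattened to
    # a (frames x pixels) matrix. Real data would come from the MR scanner.
    rng = np.random.default_rng(0)
    n_frames, h, w = 100, 32, 32
    frames = rng.normal(size=(n_frames, h * w))

    # Decompose the series into spatial maps with associated time-intensity curves.
    ica = FastICA(n_components=3, random_state=0)
    time_courses = ica.fit_transform(frames)        # (n_frames, n_components)
    spatial_maps = ica.mixing_                      # (n_pixels, n_components)

    # Build a time-varying reference image that mimics the contrast dynamics by
    # reconstructing the series from the extracted components.
    reference = time_courses @ spatial_maps.T + ica.mean_   # (n_frames, n_pixels)
    reference_images = reference.reshape(n_frames, h, w)
    # Each reference_images[t] would then serve as the registration target for frame t.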
NASA Technical Reports Server (NTRS)
Rignot, E.; Chellappa, R.
1993-01-01
We present a maximum a posteriori (MAP) classifier for classifying multifrequency, multilook, single polarization SAR intensity data into regions or ensembles of pixels of homogeneous and similar radar backscatter characteristics. A model for the prior joint distribution of the multifrequency SAR intensity data is combined with a Markov random field for representing the interactions between region labels to obtain an expression for the posterior distribution of the region labels given the multifrequency SAR observations. The maximization of the posterior distribution yields Bayes's optimum region labeling or classification of the SAR data or its MAP estimate. The performance of the MAP classifier is evaluated by using computer-simulated multilook SAR intensity data as a function of the parameters in the classification process. Multilook SAR intensity data are shown to yield higher classification accuracies than one-look SAR complex amplitude data. The MAP classifier is extended to the case in which the radar backscatter from the remotely sensed surface varies within the SAR image because of incidence angle effects. The results obtained illustrate the practicality of the method for combining SAR intensity observations acquired at two different frequencies and for improving classification accuracy of SAR data.
Accessing the public MIMIC-II intensive care relational database for clinical research.
Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G
2013-01-10
The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
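In the spirit of the simple example queries mentioned above, a Python client might issue SQL against a locally installed copy of the database as sketched below; the connection parameters, table name and column names are illustrative placeholders and may not match the actual MIMIC-II schema.

    import psycopg2

    # Connect to a local copy of the database, e.g. one running inside the MIMIC-II
    # virtual machine image mentioned above (connection parameters are illustrative).
    conn = psycopg2.connect(dbname="mimic2", user="mimic",
                            password="mimic", host="localhost")

    # Hypothetical query: count ICU stays per care unit. Table and column names
    # are placeholders for illustration only.
    query = """
        SELECT first_careunit, COUNT(*) AS n_stays
        FROM icustay_detail
        GROUP BY first_careunit
        ORDER BY n_stays DESC;
    """

    with conn, conn.cursor() as cur:
        cur.execute(query)
        for careunit, n_stays in cur.fetchall():
            print(careunit, n_stays)
    conn.close()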
Integration of drug dosing data with physiological data streams using a cloud computing paradigm.
Bressan, Nadja; James, Andrew; McGregor, Carolyn
2013-01-01
Many drugs are used during the provision of intensive care for the preterm newborn infant. Recommendations for drug dosing in newborns depend upon data from population based pharmacokinetic research. There is a need to be able to modify drug dosing in response to the preterm infant's response to the standard dosing recommendations. The real-time integration of physiological data with drug dosing data would facilitate individualised drug dosing for these immature infants. This paper proposes the use of a novel computational framework that employs real-time, temporal data analysis for this task. Deployment of the framework within the cloud computing paradigm will enable widespread distribution of individualized drug dosing for newborn infants.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, J; Dossa, D; Gokhale, M
Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared the performance of software-only and GPU-accelerated implementations. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB Hard Drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater than 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed order of magnitude speedup over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit to boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.
Implementing direct, spatially isolated problems on transputer networks
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated: a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication-intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real-valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
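A back-of-the-envelope Python model of the main finding, that each node needs enough work per communication: if results must funnel back to the host over a serialized link, the attainable speedup collapses when the work per communication is small. The timing numbers below are invented for illustration and are not the study's measurements.

    def speedup(n_procs, t_work, t_comm):
        # Independent (spatially isolated) tasks: computation parallelizes fully,
        # but returning results to the host is serialized over one link.
        t_serial = n_procs * (t_work + t_comm)      # one processor does everything
        t_parallel = t_work + n_procs * t_comm      # compute in parallel, I/O serialized
        return t_serial / t_parallel

    # Communication-intensive versus computation-intensive work per message,
    # for a 40-processor network (times are arbitrary illustrative units).
    for t_work in (0.001, 1.0):
        print(t_work, round(speedup(40, t_work, t_comm=0.01), 1))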
Evaluation of normalization methods for cDNA microarray data by k-NN classification
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-01-01
Background Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Results Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Conclusion Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics. PMID:16045803
Evaluation of normalization methods for cDNA microarray data by k-NN classification.
Wu, Wei; Xing, Eric P; Myers, Connie; Mian, I Saira; Bissell, Mina J
2005-07-26
Non-biological factors give rise to unwanted variations in cDNA microarray data. There are many normalization methods designed to remove such variations. However, to date there have been few published systematic evaluations of these techniques for removing variations arising from dye biases in the context of downstream, higher-order analytical tasks such as classification. Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. In particular, a known classifier, k-nearest neighbor (k-NN), was estimated from data normalized using a given technique, and the LOOCV error rate of the ensuing model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques which remove either spatial-dependent dye bias (referred later as spatial effect) or intensity-dependent dye bias (referred later as intensity effect) moderately reduce LOOCV classification errors; whereas double-bias-removal techniques which remove both spatial- and intensity effect reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, all of which removed intensity effect globally and spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets. We also found that the investigated scale normalization methods do not reduce LOOCV classification error. Using LOOCV error of k-NNs as the evaluation criterion, three double-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
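The evaluation end-point described in these two records can be reproduced in outline with a few lines of scikit-learn; the random matrices below merely stand in for data normalized by two competing strategies, and k = 3 is an arbitrary choice.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # Hypothetical normalized expression matrices: rows are arrays (samples),
    # columns are genes; y holds class labels. Real inputs would be the data
    # sets produced by each normalization strategy under comparison.
    rng = np.random.default_rng(0)
    X_strategy_a = rng.normal(size=(60, 500))
    X_strategy_b = X_strategy_a + 0.5 * rng.normal(size=(60, 500))
    y = rng.integers(0, 2, size=60)

    def loocv_error(X, y, k=3):
        # Leave-one-out cross-validation error of a k-NN classifier, the
        # end-point used above to compare normalization methods.
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                                 cv=LeaveOneOut())
        return 1.0 - scores.mean()

    for name, X in [("strategy A", X_strategy_a), ("strategy B", X_strategy_b)]:
        print(name, round(loocv_error(X, y), 3))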
Arithmetic Data Cube as a Data Intensive Benchmark
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabano, Leonid
2003-01-01
Data movement across computational grids and across memory hierarchy of individual grid machines is known to be a limiting factor for application involving large data sets. In this paper we introduce the Data Cube Operator on an Arithmetic Data Set which we call Arithmetic Data Cube (ADC). We propose to use the ADC to benchmark grid capabilities to handle large distributed data sets. The ADC stresses all levels of grid memory by producing 2d views of an Arithmetic Data Set of d-tuples described by a small number of parameters. We control data intensity of the ADC by controlling the sizes of the views through choice of the tuple parameters.
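A toy illustration of the data-cube operator on a miniature arithmetic data set (the tuples and the count aggregate are invented for illustration): for d attributes, one aggregate view is materialized per attribute subset, 2^d views in all, which is what stresses every level of the memory hierarchy.

    import itertools
    from collections import Counter

    # Hypothetical miniature "arithmetic data set": d-tuples of small attribute
    # values generated from a few parameters, standing in for the ADC's tuples.
    d = 3
    tuples = [(i % 2, (i * 7) % 3, (i * 13) % 5) for i in range(1000)]

    # The data-cube operator materializes one aggregate view per attribute
    # subset, i.e. 2^d views in total; here the aggregate is a tuple count.
    views = {}
    for k in range(d + 1):
        for dims in itertools.combinations(range(d), k):
            views[dims] = Counter(tuple(t[i] for i in dims) for t in tuples)

    print(len(views))                     # 2^d = 8 views
    print(views[(0, 2)].most_common(3))   # the view grouped on attributes 0 and 2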
Enabling Earth Science: The Facilities and People of the NCCS
NASA Technical Reports Server (NTRS)
2002-01-01
The NCCS's mass data storage system allows scientists to store and manage the vast amounts of data generated by these computations, and its high-speed network connections allow the data to be accessed quickly from the NCCS archives. Some NCCS users perform studies that are directly related to their ability to run computationally expensive and data-intensive simulations. Because the number and type of questions scientists research often are limited by computing power, the NCCS continually pursues the latest technologies in computing, mass storage, and networking technologies. Just as important as the processors, tapes, and routers of the NCCS are the personnel who administer this hardware, create and manage accounts, maintain security, and assist the scientists, often working one on one with them.
Impedance computations and beam-based measurements: A problem of discrepancy
Smaluk, Victor
2018-04-21
High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. For this article, three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.
Impedance computations and beam-based measurements: A problem of discrepancy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smaluk, Victor
High intensity of particle beams is crucial for high-performance operation of modern electron-positron storage rings, both colliders and light sources. The beam intensity is limited by the interaction of the beam with self-induced electromagnetic fields (wake fields) proportional to the vacuum chamber impedance. For a new accelerator project, the total broadband impedance is computed by element-wise wake-field simulations using computer codes. For a machine in operation, the impedance can be measured experimentally using beam-based techniques. In this article, a comparative analysis of impedance computations and beam-based measurements is presented for 15 electron-positron storage rings. The measured data and the predictions based on the computed impedance budgets show a significant discrepancy. For this article, three possible reasons for the discrepancy are discussed: interference of the wake fields excited by a beam in adjacent components of the vacuum chamber, effect of computation mesh size, and effect of insufficient bandwidth of the computed impedance.
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octtree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
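For concreteness, a serial Python sketch of the quantity being parallelized, a space-time kernel density with separable kernels; the event coordinates, bandwidths and the simplified normalization are placeholders, and the octtree decomposition and buffering logic themselves are not shown.

    import numpy as np

    # Hypothetical epidemiological events: (x, y, t) coordinates of reported cases.
    rng = np.random.default_rng(0)
    events = rng.uniform(0.0, 100.0, size=(5000, 3))

    def stkd(grid_points, events, hs=5.0, ht=7.0):
        # Space-time kernel density with separable Epanechnikov kernels.
        # hs and ht are the spatial and temporal bandwidths; they also set the
        # width of the buffer needed around each subdomain to avoid edge effects.
        # Normalizing constants are simplified for brevity.
        ds = np.linalg.norm(grid_points[:, None, :2] - events[None, :, :2], axis=2) / hs
        dt = np.abs(grid_points[:, None, 2] - events[None, :, 2]) / ht
        ks = np.where(ds < 1.0, 0.75 * (1.0 - ds ** 2), 0.0)   # spatial kernel
        kt = np.where(dt < 1.0, 0.75 * (1.0 - dt ** 2), 0.0)   # temporal kernel
        return (ks * kt).sum(axis=1) / (len(events) * hs ** 2 * ht)

    # Each subdomain's workload is roughly (its grid points) x (events within its
    # buffered extent); balancing that product across processors is the goal of
    # the adaptive decomposition described above.
    grid = rng.uniform(0.0, 100.0, size=(500, 3))
    print(stkd(grid, events).max())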
Metnitz, P G; Laback, P; Popow, C; Laback, O; Lenz, K; Hiesmayr, M
1995-01-01
Patient Data Management Systems (PDMS) for ICUs collect, present and store clinical data. Various intentions make analysis of those digitally stored data desirable, such as quality control or scientific purposes. The aim of the Intensive Care Data Evaluation project (ICDEV) was to provide a database tool for the analysis of data recorded at various ICUs of the University Clinics of Vienna, General Hospital of Vienna, with two different PDMSs used: CareVue 9000 (Hewlett Packard, Andover, USA) at two ICUs (one medical ICU and one neonatal ICU) and PICIS Chart+ (PICIS, Paris, France) at one cardiothoracic ICU. CONCEPT AND METHODS: Clinically oriented analysis of the data collected in a PDMS at an ICU was the beginning of the development. After defining the database structure we established a client-server based database system under Microsoft Windows NT and developed a user-friendly data querying application using Microsoft Visual C++ and Visual Basic. ICDEV was successfully installed at three different ICUs; adjustments to the different PDMS configurations were done within a few days. The database structure developed by us enables a powerful query concept representing an 'EXPERT QUESTION COMPILER' which may help to answer almost any clinical question. Several program modules facilitate queries at the patient, group and unit level. Results from ICDEV queries are automatically transferred to Microsoft Excel for display (in the form of configurable tables and graphs) and further processing. The ICDEV concept is configurable for adjustment to different intensive care information systems and can be used to support computerized quality control. However, as long as there exists no sufficient artifact recognition or data validation software for automatically recorded patient data, the reliability of these data and their usage for computer-assisted quality control remain unclear and should be further studied.
NASA Technical Reports Server (NTRS)
Brown, G. S.; Curry, W. J.
1977-01-01
The statistical error of the pointing angle estimation technique is determined as a function of the effective receiver signal-to-noise ratio. Other sources of error are addressed and evaluated, with inadequate calibration being of major concern. The impact of pointing error on the computation of the normalized surface scattering cross section (sigma) from radar data and on the waveform attitude-induced altitude bias is considered, and quantitative results are presented. Pointing angle and sigma processing algorithms are presented along with some initial data. The intensive mode clean vs. clutter AGC calibration problem is analytically resolved. The use of clutter AGC data in the intensive mode is confirmed as the correct calibration set for the sigma computations.
A Rich Metadata Filesystem for Scientific Data
ERIC Educational Resources Information Center
Bui, Hoang
2012-01-01
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides…
A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks.
Yue, Kun; Fang, Qiyu; Wang, Xiaoling; Li, Jin; Liu, Weiyi
2015-12-01
Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inferences, learning a BN from data is a critical subject of machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs with respect to data-intensive computing or in cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give the two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as that for storing a BN by
Cloud computing applications for biomedical science: A perspective
2018-01-01
Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176
Data-intensive computing on numerically-insensitive supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Fasel, Patricia K; Habib, Salman
2010-12-03
With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.
Exploring quantum computing application to satellite data assimilation
NASA Astrophysics Data System (ADS)
Cheung, S.; Zhang, S. Q.
2015-12-01
This is an exploratory work on the potential application of quantum computing to a scientific data optimization problem. On classical computational platforms, the physical domain of a satellite data assimilation problem is represented by a discrete variable transform, and classical minimization algorithms are employed to find the optimal solution of the analysis cost function. The computation becomes intensive and time-consuming when the problem involves a large number of variables and data. The new quantum computer opens a very different approach, both in conceptual programming and in hardware architecture, for solving optimization problems. In order to explore whether we can utilize the quantum computing machine architecture, we formulate a satellite data assimilation experimental case in the form of a quadratic programming optimization problem. We find a transformation of the problem to map it into the Quadratic Unconstrained Binary Optimization (QUBO) framework. The Binary Wavelet Transform (BWT) will be applied to the data assimilation variables for its invertible decomposition, and all calculations in BWT are performed by Boolean operations. The transformed problem will then be solved as QUBO instances defined on Chimera graphs of the quantum computer.
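As a concrete illustration of the kind of mapping the abstract alludes to, the minimal sketch below (an assumption-laden toy, not the authors' formulation) shows how a quadratic cost over binary-encoded variables collapses into a single QUBO matrix Q, with the linear terms folded onto the diagonal because x_j^2 = x_j for binary x.

    # Minimal sketch: turn min 1/2 z^T A z + b^T z, with z binary-encoded as z = E x,
    # into a QUBO matrix Q so the cost equals x^T Q x over binary variables x.
    # The encoding and variable names are illustrative assumptions.
    import numpy as np

    def quadratic_to_qubo(A, b, n_bits):
        n = len(b)
        # Encoding matrix E: each integer variable is a weighted sum of n_bits binary digits.
        E = np.zeros((n, n * n_bits))
        for i in range(n):
            E[i, i * n_bits:(i + 1) * n_bits] = [2 ** k for k in range(n_bits)]
        Q = 0.5 * E.T @ A @ E              # quadratic part
        lin = E.T @ b                      # linear part
        Q[np.diag_indices_from(Q)] += lin  # fold linear terms into the diagonal (x_j^2 == x_j)
        return Q

    # Example: a 2-variable problem encoded with 3 bits per variable.
    A = np.array([[2.0, -1.0], [-1.0, 2.0]])
    b = np.array([-1.0, 0.5])
    Q = quadratic_to_qubo(A, b, n_bits=3)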
Architecture and Programming Models for High Performance Intensive Computation
2016-06-29
Desktop Social Science: Coming of Age.
ERIC Educational Resources Information Center
Dwyer, David C.; And Others
Beginning in 1985, Apple Computer, Inc. and several school districts began a collaboration to examine the impact of intensive computer use on instruction and learning in K-12 classrooms. This paper follows the development of a Macintosh II-based management and retrieval system for text data undertaken to store and retrieve oral reflections of…
Discovering and understanding oncogenic gene fusions through data intensive computational approaches
Latysheva, Natasha S.; Babu, M. Madan
2016-01-01
Abstract Although gene fusions have been recognized as important drivers of cancer for decades, our understanding of the prevalence and function of gene fusions has been revolutionized by the rise of next-generation sequencing, advances in bioinformatics theory and an increasing capacity for large-scale computational biology. The computational work on gene fusions has been vastly diverse, and the present state of the literature is fragmented. It will be fruitful to merge three camps of gene fusion bioinformatics that appear to rarely cross over: (i) data-intensive computational work characterizing the molecular biology of gene fusions; (ii) development research on fusion detection tools, candidate fusion prioritization algorithms and dedicated fusion databases and (iii) clinical research that seeks to either therapeutically target fusion transcripts and proteins or leverages advances in detection tools to perform large-scale surveys of gene fusion landscapes in specific cancer types. In this review, we unify these different—yet highly complementary and symbiotic—approaches with the view that increased synergy will catalyze advancements in gene fusion identification, characterization and significance evaluation. PMID:27105842
BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.
Huang, Hailiang; Tata, Sandeep; Prill, Robert J
2013-01-01
Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation, and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. http://github.com/ibm-bioinformatics/bluesnp
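A core computation named in the abstract, empirical p-values via data permutation, is embarrassingly parallel, which is why it maps well to Hadoop. The sketch below is a plain-Python/NumPy illustration under assumed inputs (a single SNP and a simple correlation-based statistic); it is not the BlueSNP API.

    # Minimal sketch of an empirical p-value via phenotype permutation for one SNP,
    # the kind of independent work that can be farmed out with MapReduce.
    import numpy as np

    def empirical_pvalue(genotypes, phenotype, n_perm=10_000, seed=0):
        rng = np.random.default_rng(seed)
        def stat(y):
            return abs(np.corrcoef(genotypes, y)[0, 1])
        observed = stat(phenotype)
        exceed = sum(stat(rng.permutation(phenotype)) >= observed for _ in range(n_perm))
        # Add-one correction keeps the estimated p-value strictly positive.
        return (exceed + 1) / (n_perm + 1)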
NASA Astrophysics Data System (ADS)
Seamon, E.; Gessler, P. E.; Flathers, E.
2015-12-01
The creation and use of large amounts of data in scientific investigations has become common practice. Data collection and analysis for large scientific computing efforts are increasing not only in volume but also in number, and the methods and analysis procedures are evolving toward greater complexity (Bell, 2009; Clarke, 2009; Maimon, 2010). In addition, the growth of diverse data-intensive scientific computing efforts (Soni, 2011; Turner, 2014; Wu, 2008) has demonstrated the value of supporting scientific data integration. Efforts to bridge the gap between these perspectives have been attempted, to varying degrees, with modular scientific computing analysis regimes implemented with a modest amount of success (Perez, 2009). This constellation of effects - 1) increasing growth in the volume and amount of data, 2) a growing data-intensive science base with challenging needs, and 3) disparate data organization and integration efforts - has created a critical gap: systems of scientific data organization and management typically do not effectively enable integrated data collaboration or data-intensive, science-based communication. Our research attempts to address this gap by developing a modular technology framework for data science integration, with climate variation as the focus. The intention is that this model, if successful, could be generalized to other application areas. Our research aim focused on the design and implementation of a modular, deployable technology architecture for data integration. Developed using aspects of R, interactive Python, SciDB, THREDDS, JavaScript, and varied data mining and machine learning techniques, the Modular Data Response Framework (MDRF) was implemented to explore case scenarios for bioclimatic variation as they relate to Pacific Northwest ecosystem regions. Our preliminary results, using historical NetCDF climate data for calibration across the inland Pacific Northwest region (Abatzoglou and Brown, 2011), show clear ecosystem shifts over a ten-year period (2001-2011), based on multiple supervised classifier methods for bioclimatic indicators.
0-6767 : evaluation of existing smartphone applications and data needs for travel survey.
DOT National Transportation Integrated Search
2014-08-01
Current and reliable data on traffic movements : play a key role in transportation planning, : modeling, and air quality analysis. Traditional : travel surveys conducted via paper or computer : are costly, time consuming, and labor intensive : for su...
Bringing MapReduce Closer To Data With Active Drives
NASA Astrophysics Data System (ADS)
Golpayegani, N.; Prathapan, S.; Warmka, R.; Wyatt, B.; Halem, M.; Trantham, J. D.; Markey, C. A.
2017-12-01
Moving computation closer to the data location has been a much theorized improvement to computation for decades. The increase in processor performance, the decrease in processor size and power requirement combined with the increase in data intensive computing has created a push to move computation as close to data as possible. We will show the next logical step in this evolution in computing: moving computation directly to storage. Hypothetical systems, known as Active Drives, have been proposed as early as 1998. These Active Drives would have a general-purpose CPU on each disk allowing for computations to be performed on them without the need to transfer the data to the computer over the system bus or via a network. We will utilize Seagate's Active Drives to perform general purpose parallel computing using the MapReduce programming model directly on each drive. We will detail how the MapReduce programming model can be adapted to the Active Drive compute model to perform general purpose computing with comparable results to traditional MapReduce computations performed via Hadoop. We will show how an Active Drive based approach significantly reduces the amount of data leaving the drive when performing several common algorithms: subsetting and gridding. We will show that an Active Drive based design significantly improves data transfer speeds into and out of drives compared to Hadoop's HDFS while at the same time keeping comparable compute speeds as Hadoop.
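The subsetting and gridding algorithms mentioned above fit naturally into the map/reduce pattern the authors adapt to Active Drives: mappers running near the data emit only small per-cell partial sums, so little data has to leave the drive. The following is a minimal, hypothetical Python sketch of that pattern; it is not Seagate's or the authors' interface.

    # Mapper/reducer sketch of the "subset and grid" pattern. Illustrative only.
    from collections import defaultdict

    def map_points(points, bbox, cell_size):
        """Mapper: subset (x, y, value) points to bbox and emit ((ix, iy), (value, 1)) partials."""
        xmin, ymin, xmax, ymax = bbox
        for x, y, value in points:
            if xmin <= x <= xmax and ymin <= y <= ymax:
                yield (int((x - xmin) // cell_size), int((y - ymin) // cell_size)), (value, 1)

    def reduce_cells(partials):
        """Reducer: combine partial sums into per-cell means."""
        sums = defaultdict(lambda: [0.0, 0])
        for cell, (v, n) in partials:
            sums[cell][0] += v
            sums[cell][1] += n
        return {cell: s / n for cell, (s, n) in sums.items()}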
Digital signal conditioning for flight test instrumentation
NASA Technical Reports Server (NTRS)
Bever, Glenn A.
1991-01-01
An introduction to digital measurement processes on aircraft is provided. Flight test instrumentation systems are rapidly evolving from analog-intensive to digital intensive systems, including the use of onboard digital computers. The topics include measurements that are digital in origin, as well as sampling, encoding, transmitting, and storing data. Particular emphasis is placed on modern avionic data bus architectures and what to be aware of when extracting data from them. Examples of data extraction techniques are given. Tradeoffs between digital logic families, trends in digital development, and design testing techniques are discussed. An introduction to digital filtering is also covered.
Accessing the public MIMIC-II intensive care relational database for clinical research
2013-01-01
Background The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. Results QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting mortality of ICU Patients”. Conclusions QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database. PMID:23302652
Energy 101: Energy Efficient Data Centers
None
2018-04-16
Data centers provide mission-critical computing functions vital to the daily operation of top U.S. economic, scientific, and technological organizations. These data centers consume large amounts of energy to run and maintain their computer systems, servers, and associated high-performance components; up to 3% of all U.S. electricity powers data centers. And as more information comes online, data centers will consume even more energy. Data centers can become more energy efficient by incorporating features like power-saving "stand-by" modes, energy monitoring software, and efficient cooling systems instead of energy-intensive air conditioners. These and other efficiency improvements to data centers can produce significant energy savings, reduce the load on the electric grid, and help protect the nation by increasing the reliability of critical computer operations.
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
NASA Astrophysics Data System (ADS)
Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling
Business Intelligence has become an attractive topic in today's data-intensive applications, especially in the telecommunication industry. Meanwhile, Cloud Computing, providing IT supporting infrastructure with excellent scalability, large scale storage, and high performance, has become an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce-based parallel data mining platform developed by CMRI (China Mobile Research Institute) to meet the urgent requirements of business intelligence in the telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with the experimental evaluation and case studies of its applications. The evaluation result demonstrates both the usability and the cost-effectiveness of a Cloud Computing based Business Intelligence system in applications of the telecommunication industry.
Federated data storage and management infrastructure
NASA Astrophysics Data System (ADS)
Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.
2016-10-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs by orders of magnitude; this will require new approaches in data storage organization and data handling. In our project we address the fundamental problem of designing an architecture to integrate distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian/CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on supercomputing platforms, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment-wide within National Academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficient at quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for Earth and ecological sciences as well as natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies have achieved notable success in parallel processing of LiDAR data to address these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
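The tile-based spatial index at the heart of the framework can be illustrated in a few lines of Python. The sketch below (the tile size and key format are assumptions, not the paper's implementation) simply keys each LiDAR return by a fixed-size tile so that tiles can be stored as independent objects in HDFS and processed in parallel.

    # Minimal sketch of a tile-based decomposition for (x, y, z) LiDAR returns.
    from collections import defaultdict

    def tile_key(x, y, tile_size):
        return (int(x // tile_size), int(y // tile_size))

    def partition_points(points, tile_size=500.0):
        """Group returns into tiles that a distributed job can process independently.
        Neighborhood operations would additionally need a small buffer around each tile."""
        tiles = defaultdict(list)
        for x, y, z in points:
            tiles[tile_key(x, y, tile_size)].append((x, y, z))
        return tiles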
Campion, Thomas R.; Waitman, Lemuel R.; May, Addison K.; Ozdas, Asli; Lorenzi, Nancy M.; Gadd, Cynthia S.
2009-01-01
Introduction: Evaluations of computerized clinical decision support systems (CDSS) typically focus on clinical performance changes and do not include social, organizational, and contextual characteristics explaining use and effectiveness. Studies of CDSS for intensive insulin therapy (IIT) are no exception, and the literature lacks an understanding of effective computer-based IIT implementation and operation. Results: This paper presents (1) a literature review of computer-based IIT evaluations through the lens of institutional theory, a discipline from sociology and organization studies, to demonstrate the inconsistent reporting of workflow and care process execution and (2) a single-site case study to illustrate how computer-based IIT requires substantial organizational change and creates additional complexity with unintended consequences including error. Discussion: Computer-based IIT requires organizational commitment and attention to site-specific technology, workflow, and care processes to achieve intensive insulin therapy goals. The complex interaction between clinicians, blood glucose testing devices, and CDSS may contribute to workflow inefficiency and error. Evaluations rarely focus on the perspective of nurses, the primary users of computer-based IIT whose knowledge can potentially lead to process and care improvements. Conclusion: This paper addresses a gap in the literature concerning the social, organizational, and contextual characteristics of CDSS in general and for intensive insulin therapy specifically. Additionally, this paper identifies areas for future research to define optimal computer-based IIT process execution: the frequency and effect of manual data entry error of blood glucose values, the frequency and effect of nurse overrides of CDSS insulin dosing recommendations, and comprehensive ethnographic study of CDSS for IIT. PMID:19815452
NASA Technical Reports Server (NTRS)
1981-01-01
Progress in the study of the intensity of the urban heat island is reported. The intensity of the heat island is commonly defined as the temperature difference between the center of the city and the surrounding suburban and rural regions. The intensity is considered as a function of changes in the season and changes in meteorological conditions in order to derive various parameters which may be used in numerical models for urban climate. Twelve case studies were selected and CCT's were ordered. In situ data was obtained from sixteen stations scattered about the city of St. Louis. Upper-air meteorological data were obtained and the water vapor and the temperature data were processed. Atmospheric transmissivities were computed for each of the case studies.
Castaño-Díez, Daniel
2017-01-01
Dynamo is a package for the processing of tomographic data. As a tool for subtomogram averaging, it includes different alignment and classification strategies. Furthermore, its data-management module allows experiments to be organized in groups of tomograms, while offering specialized three-dimensional tomographic browsers that facilitate visualization, location of regions of interest, modelling and particle extraction in complex geometries. Here, a technical description of the package is presented, focusing on its diverse strategies for optimizing computing performance. Dynamo is built upon mbtools (middle layer toolbox), a general-purpose MATLAB library for object-oriented scientific programming specifically developed to underpin Dynamo but usable as an independent tool. Its structure intertwines a flexible MATLAB codebase with precompiled C++ functions that carry the burden of numerically intensive operations. The package can be delivered as a precompiled standalone ready for execution without a MATLAB license. Multicore parallelization on a single node is directly inherited from the high-level parallelization engine provided for MATLAB, automatically imparting a balanced workload among the threads in computationally intense tasks such as alignment and classification, but also in logistic-oriented tasks such as tomogram binning and particle extraction. Dynamo supports the use of graphical processing units (GPUs), yielding considerable speedup factors both for native Dynamo procedures (such as the numerically intensive subtomogram alignment) and procedures defined by the user through its MATLAB-based GPU library for three-dimensional operations. Cloud-based virtual computing environments supplied with a pre-installed version of Dynamo can be publicly accessed through the Amazon Elastic Compute Cloud (EC2), enabling users to rent GPU computing time on a pay-as-you-go basis, thus avoiding upfront investments in hardware and long-term software maintenance. PMID:28580909
Castaño-Díez, Daniel
2017-06-01
Dynamo is a package for the processing of tomographic data. As a tool for subtomogram averaging, it includes different alignment and classification strategies. Furthermore, its data-management module allows experiments to be organized in groups of tomograms, while offering specialized three-dimensional tomographic browsers that facilitate visualization, location of regions of interest, modelling and particle extraction in complex geometries. Here, a technical description of the package is presented, focusing on its diverse strategies for optimizing computing performance. Dynamo is built upon mbtools (middle layer toolbox), a general-purpose MATLAB library for object-oriented scientific programming specifically developed to underpin Dynamo but usable as an independent tool. Its structure intertwines a flexible MATLAB codebase with precompiled C++ functions that carry the burden of numerically intensive operations. The package can be delivered as a precompiled standalone ready for execution without a MATLAB license. Multicore parallelization on a single node is directly inherited from the high-level parallelization engine provided for MATLAB, automatically imparting a balanced workload among the threads in computationally intense tasks such as alignment and classification, but also in logistic-oriented tasks such as tomogram binning and particle extraction. Dynamo supports the use of graphical processing units (GPUs), yielding considerable speedup factors both for native Dynamo procedures (such as the numerically intensive subtomogram alignment) and procedures defined by the user through its MATLAB-based GPU library for three-dimensional operations. Cloud-based virtual computing environments supplied with a pre-installed version of Dynamo can be publicly accessed through the Amazon Elastic Compute Cloud (EC2), enabling users to rent GPU computing time on a pay-as-you-go basis, thus avoiding upfront investments in hardware and long-term software maintenance.
Regional Sustainability: The San Luis Basin Metrics Project
There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute. Moreover, individual metrics may not capture all aspects of a system that are relevant to sust...
Development of a Multidisciplinary Approach to Access Sustainability
There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute the metrics. Moreover, individual metrics do not capture all aspects of a system that are relevan...
Advancing Cyberinfrastructure to support high resolution water resources modeling
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.
2012-12-01
Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with high resolution multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges involved with data management and easy-to-use access to high performance computing being tackled in this project.
Improved Optics For Quasi-Elastic Light Scattering
NASA Technical Reports Server (NTRS)
Cheung, Harry Michael
1995-01-01
Improved optical train devised for use in light-scattering measurements of quasi-elastic light scattering (QELS) and laser spectroscopy. Measurements performed on solutions, microemulsions, micellular solutions, and colloidal dispersions. Simultaneous measurements of total intensity and fluctuations in total intensity of light scattered from sample at various angles provides data used, in conjunction with diffusion coefficients, to compute sizes of particles in sample.
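The abstract does not spell out how particle sizes follow from the measured intensity fluctuations and diffusion coefficients; the standard route in QELS, stated here as textbook background rather than as a quotation from the report, is the decay of the intensity autocorrelation function together with the Stokes-Einstein relation:

    g^{(2)}(\tau) - 1 \propto e^{-2 D q^{2} \tau}, \qquad
    q = \frac{4 \pi n}{\lambda} \sin\!\left(\frac{\theta}{2}\right), \qquad
    d_h = \frac{k_B T}{3 \pi \eta D}

where q is the scattering vector at scattering angle theta, D the measured diffusion coefficient, eta the solvent viscosity, and d_h the resulting hydrodynamic diameter.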
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.
2008-05-04
This work presents the ScalaBLAST Web Application (SWA), a web based application implemented using the PHP script language, MySQL DBMS, and Apache web server under a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computer for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology, and multiple whole genome comparisons which, in the absence of a tool like SWA, require a heroic effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster.
System on a chip with MPEG-4 capability
NASA Astrophysics Data System (ADS)
Yassa, Fathy; Schonfeld, Dan
2002-12-01
Current products supporting video communication applications rely on existing computer architectures. RISC processors have been used successfully in numerous applications over several decades. DSP processors have become ubiquitous in signal processing and communication applications. Real-time applications such as speech processing in cellular telephony rely extensively on the computational power of these processors. Video processors designed to implement the computationally intensive codec operations have also been used to address the high demands of video communication applications (e.g., cable set-top boxes and DVDs). This paper presents an overview of a system-on-chip (SOC) architecture used for real-time video in wireless communication applications. The SOC specifications address the system requirements imposed by the application environment. A CAM-based video processor is used to accelerate data-intensive video compression tasks such as motion estimation and filtering. Other components are dedicated to system-level data processing and audio processing. A rich set of I/Os allows the SOC to communicate with other system components such as baseband and memory subsystems.
NASA Technical Reports Server (NTRS)
Anspaugh, B. E.; Miyahira, T. F.; Weiss, R. S.
1979-01-01
Computed statistical averages and standard deviations with respect to the measured cells for each intensity-temperature measurement condition are presented. Averages and standard deviations of the cell characteristics are displayed in a two-dimensional array format: one dimension representing incoming light intensity and the other the cell temperature. Programs for calculating the temperature coefficients of the pertinent cell electrical parameters are presented, and postirradiation data are summarized.
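As an illustration of what calculating a temperature coefficient amounts to, the short sketch below (with made-up numbers, not data from the report) fits a cell parameter against temperature at a fixed intensity and reports the slope.

    # Temperature coefficient as the slope of a least-squares fit of a cell
    # parameter (here open-circuit voltage) versus temperature. Values are illustrative.
    import numpy as np

    temps_C = np.array([0.0, 25.0, 50.0, 75.0])   # cell temperatures
    voc_V = np.array([0.66, 0.60, 0.54, 0.48])    # measured open-circuit voltages
    slope, intercept = np.polyfit(temps_C, voc_V, 1)
    print(f"temperature coefficient: {slope * 1e3:.1f} mV/degC")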
Cloud-based Jupyter Notebooks for Water Data Analysis
NASA Astrophysics Data System (ADS)
Castronova, A. M.; Brazil, L.; Seul, M.
2017-12-01
The development and adoption of technologies by the water science community to improve our ability to openly collaborate and share workflows will have a transformative impact on how we address the challenges associated with collaborative and reproducible scientific research. Jupyter notebooks offer one solution by providing an open-source platform for creating metadata-rich toolchains for modeling and data analysis applications. Adoption of this technology within the water sciences, coupled with publicly available datasets from agencies such as USGS, NASA, and EPA enables researchers to easily prototype and execute data intensive toolchains. Moreover, implementing this software stack in a cloud-based environment extends its native functionality to provide researchers a mechanism to build and execute toolchains that are too large or computationally demanding for typical desktop computers. Additionally, this cloud-based solution enables scientists to disseminate data processing routines alongside journal publications in an effort to support reproducibility. For example, these data collection and analysis toolchains can be shared, archived, and published using the HydroShare platform or downloaded and executed locally to reproduce scientific analysis. This work presents the design and implementation of a cloud-based Jupyter environment and its application for collecting, aggregating, and munging various datasets in a transparent, sharable, and self-documented manner. The goals of this work are to establish a free and open source platform for domain scientists to (1) conduct data intensive and computationally intensive collaborative research, (2) utilize high performance libraries, models, and routines within a pre-configured cloud environment, and (3) enable dissemination of research products. This presentation will discuss recent efforts towards achieving these goals, and describe the architectural design of the notebook server in an effort to support collaborative and reproducible science.
Genomic cloud computing: legal and ethical points to consider
Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Burton, Paul; Chisholm, Rex; Fortier, Isabel; Goodwin, Pat; Harris, Jennifer; Hveem, Kristian; Kaye, Jane; Kent, Alistair; Knoppers, Bartha Maria; Lindpaintner, Klaus; Little, Julian; Riegman, Peter; Ripatti, Samuli; Stolk, Ronald; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn; Joly, Yann; Kato, Kazuto; Knoppers, Bartha Maria; Rodriguez, Laura Lyman; McPherson, Treasa; Nicolás, Pilar; Ouellette, Francis; Romeo-Casabona, Carlos; Sarin, Rajiv; Wallace, Susan; Wiesner, Georgia; Wilson, Julia; Zeps, Nikolajs; Simkevitz, Howard; De Rienzo, Assunta; Knoppers, Bartha M
2015-01-01
The biggest challenge in twenty-first century data-intensive genomic science is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key ‘points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These ‘points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure. PMID:25248396
Genomic cloud computing: legal and ethical points to consider.
Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Knoppers, Bartha M
2015-10-01
The biggest challenge in twenty-first century data-intensive genomic science is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure.
Diversity in computing technologies and strategies for dynamic resource allocation
Garzoglio, G.; Gutsche, O.
2015-12-23
Here, High Energy Physics (HEP) is a very data intensive and trivially parallelizable science discipline. HEP is probing nature at increasingly finer details requiring ever increasing computational resources to process and analyze experimental data. In this paper, we discuss how HEP provisioned resources so far using Grid technologies, how HEP is starting to include new resource providers like commercial Clouds and HPC installations, and how HEP is transparently provisioning resources at these diverse providers.
ESnet: Large-Scale Science and Data Management ( (LBNL Summer Lecture Series)
Johnston, Bill
2017-12-09
Summer Lecture Series 2004: Bill Johnston of Berkeley Lab's Computing Sciences is a distinguished networking and computing researcher. He managed the Energy Sciences Network (ESnet), a leading-edge, high-bandwidth network funded by DOE's Office of Science. Used for everything from videoconferencing to climate modeling, and flexible enough to accommodate a wide variety of data-intensive applications and services, ESNet's traffic volume is doubling every year and currently surpasses 200 terabytes per month.
Evolution of the ATLAS PanDA workload management system for exascale computational science
NASA Astrophysics Data System (ADS)
Maeno, T.; De, K.; Klimentov, A.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.; Yu, D.; Atlas Collaboration
2014-06-01
An important foundation underlying the impressive success of data processing and analysis in the ATLAS experiment [1] at the LHC [2] is the Production and Distributed Analysis (PanDA) workload management system [3]. PanDA was designed specifically for ATLAS and proved to be highly successful in meeting all the distributed computing needs of the experiment. However, the core design of PanDA is not experiment specific. The PanDA workload management system is capable of meeting the needs of other data intensive scientific applications. Alpha-Magnetic Spectrometer [4], an astro-particle experiment on the International Space Station, and the Compact Muon Solenoid [5], an LHC experiment, have successfully evaluated PanDA and are pursuing its adoption. In this paper, a description of the new program of work to develop a generic version of PanDA will be given, as well as the progress in extending PanDA's capabilities to support supercomputers and clouds and to leverage intelligent networking. PanDA has demonstrated at a very large scale the value of automated dynamic brokering of diverse workloads across distributed computing resources. The next generation of PanDA will allow other data-intensive sciences and a wider exascale community employing a variety of computing platforms to benefit from ATLAS' experience and proven tools.
Skills and Knowledge for Data-Intensive Environmental Research
Hampton, Stephanie E.; Jones, Matthew B.; Wasser, Leah A.; Schildhauer, Mark P.; Supp, Sarah R.; Brun, Julien; Hernandez, Rebecca R.; Boettiger, Carl; Collins, Scott L.; Gross, Louis J.; Fernández, Denny S.; Budden, Amber; White, Ethan P.; Teal, Tracy K.; Aukema, Juliann E.
2017-01-01
Abstract The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap. PMID:28584342
Skills and Knowledge for Data-Intensive Environmental Research.
Hampton, Stephanie E; Jones, Matthew B; Wasser, Leah A; Schildhauer, Mark P; Supp, Sarah R; Brun, Julien; Hernandez, Rebecca R; Boettiger, Carl; Collins, Scott L; Gross, Louis J; Fernández, Denny S; Budden, Amber; White, Ethan P; Teal, Tracy K; Labou, Stephanie G; Aukema, Juliann E
2017-06-01
The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.
What is Data-Intensive Science?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critchlow, Terence J.; Kleese van Dam, Kerstin
2013-06-03
What is Data Intensive Science? Today we are living in a digital world, where scientists often no longer interact directly with the physical object of their research, but do so via digitally captured, reduced, calibrated, analyzed, synthesized and, at times, visualized data. Advances in experimental and computational technologies have led to an exponential growth in the volumes, variety and complexity of this data and while the deluge is not happening everywhere in an absolute sense, it is in a relative one. Science today is data intensive. Data intensive science has the potential to transform not only how we do science, but how quickly we can translate scientific progress into complete solutions, policies, decisions and ultimately economic success. Critically, data intensive science touches some of the most important challenges we are facing. Consider a few of the grand challenges outlined by the U.S. National Academy of Engineering: make solar energy economical, provide energy from fusion, develop carbon sequestration methods, advance health informatics, engineer better medicines, secure cyberspace, and engineer the tools of scientific discovery. Arguably, meeting any of these challenges requires the collaborative effort of trans-disciplinary teams, but also significant contributions from enabling data intensive technologies. Indeed for many of them, advances in data intensive research will be the single most important factor in developing successful and timely solutions. Simple extrapolations of how we currently interact with and utilize data and knowledge are not sufficient to meet this need. Given the importance of these challenges, a new, bold vision for the role of data in science, and indeed how research will be conducted in a data intensive environment is evolving.
Contextual classification of multispectral image data: Approximate algorithm
NASA Technical Reports Server (NTRS)
Tilton, J. C. (Principal Investigator)
1980-01-01
An approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented which is computationally less intensive. Classifications that are nearly as accurate are produced.
Regional sustainable environmental management: sustainability metrics research for decision makers
There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute. Moreover, individual metrics may not capture all aspects of a system that are relevant to sust...
Development of a multidisciplinary approach to assess regional sustainability
There are a number of established, scientifically supported metrics of sustainability. Many of the metrics are data intensive and require extensive effort to collect data and compute the metrics. Moreover, individual metrics do not capture all aspects of a system that are relev...
A Bayesian and Physics-Based Ground Motion Parameters Map Generation System
NASA Astrophysics Data System (ADS)
Ramirez-Guzman, L.; Quiroz, A.; Sandoval, H.; Perez-Yanez, C.; Ruiz, A. L.; Delgado, R.; Macias, M. A.; Alcántara, L.
2014-12-01
We present the Ground Motion Parameters Map Generation (GMPMG) system developed by the Institute of Engineering at the National Autonomous University of Mexico (UNAM). The system delivers estimates of information associated with the social impact of earthquakes, engineering ground motion parameters (gmp), and macroseismic intensity maps. The gmp calculated are peak ground acceleration and velocity (pga and pgv) and response spectral acceleration (SA). The GMPMG relies on real-time data received from strong ground motion stations belonging to UNAM's networks throughout Mexico. Data are gathered via satellite and internet service providers, and managed with the data acquisition software Earthworm. The system is self-contained and can perform all calculations required for estimating gmp and intensity maps due to earthquakes, automatically or manually. An initial data processing step baseline-corrects the records and removes those containing glitches or a low signal-to-noise ratio. The system then assigns a hypocentral location using first arrivals and a simplified 3D model, followed by a moment tensor inversion, which is performed using a pre-calculated Receiver Green's Tensor (RGT) database for a realistic 3D model of Mexico. A backup system to compute epicentral location and magnitude is in place. Bayesian Kriging is employed to combine recorded values with grids of computed gmp. The latter are obtained by using appropriate ground motion prediction equations (for pgv, pga, and SA with T = 0.3, 0.5, 1, and 1.5 s) and numerical simulations performed in real time using the aforementioned RGT database (for SA with T = 2, 2.5, and 3 s). Estimated intensity maps are then computed using SA(T = 2 s) to Modified Mercalli Intensity correlations derived for central Mexico. The maps are made available to the institutions in charge of the disaster prevention systems. In order to analyze the accuracy of the maps, we compare them against observations not considered in the computations, and present some examples of recent earthquakes. We conclude that the system provides information with a fair goodness-of-fit against observations. This project is partially supported by DGAPA-PAPIIT (UNAM) project TB100313-RR170313.
Dynamic array processing for computationally intensive expert systems in CLIPS
NASA Technical Reports Server (NTRS)
Athavale, N. N.; Ragade, R. K.; Fenske, T. E.; Cassaro, M. A.
1990-01-01
This paper puts forth an architecture for implementing a loop for the advanced data structure of arrays in CLIPS. An attempt is made to use multi-field variables in such an architecture to process a set of data during the decision-making cycle. Current limitations of expert system shells are also discussed briefly. The resulting architecture is designed to circumvent the current limitations set by the expert system shell and also by the operating environment. Such advanced data structures are needed for tightly coupling symbolic and numeric computation modules.
Deformable registration of CT and cone-beam CT with local intensity matching.
Park, Seyoun; Plishker, William; Quon, Harry; Wong, John; Shekhar, Raj; Lee, Junghoon
2017-02-07
Cone-beam CT (CBCT) is a widely used intra-operative imaging modality in image-guided radiotherapy and surgery. A short scan followed by a filtered-backprojection is typically used for CBCT reconstruction. While data on the mid-plane (plane of source-detector rotation) is complete, off-mid-planes undergo different information deficiency and the computed reconstructions are approximate. This causes different reconstruction artifacts at off-mid-planes depending on slice locations, and therefore impedes accurate registration between CT and CBCT. In this paper, we propose a method to accurately register CT and CBCT by iteratively matching local CT and CBCT intensities. We correct CBCT intensities by matching local intensity histograms slice by slice in conjunction with intensity-based deformable registration. The correction-registration steps are repeated in an alternating way until the result image converges. We integrate the intensity matching into three different deformable registration methods, B-spline, demons, and optical flow that are widely used for CT-CBCT registration. All three registration methods were implemented on a graphics processing unit for efficient parallel computation. We tested the proposed methods on twenty five head and neck cancer cases and compared the performance with state-of-the-art registration methods. Normalized cross correlation (NCC), structural similarity index (SSIM), and target registration error (TRE) were computed to evaluate the registration performance. Our method produced overall NCC of 0.96, SSIM of 0.94, and TRE of 2.26 → 2.27 mm, outperforming existing methods by 9%, 12%, and 27%, respectively. Experimental results also show that our method performs consistently and is more accurate than existing algorithms, and also computationally efficient.
Deformable registration of CT and cone-beam CT with local intensity matching
NASA Astrophysics Data System (ADS)
Park, Seyoun; Plishker, William; Quon, Harry; Wong, John; Shekhar, Raj; Lee, Junghoon
2017-02-01
Cone-beam CT (CBCT) is a widely used intra-operative imaging modality in image-guided radiotherapy and surgery. A short scan followed by a filtered-backprojection is typically used for CBCT reconstruction. While data on the mid-plane (plane of source-detector rotation) is complete, off-mid-planes undergo different information deficiency and the computed reconstructions are approximate. This causes different reconstruction artifacts at off-mid-planes depending on slice locations, and therefore impedes accurate registration between CT and CBCT. In this paper, we propose a method to accurately register CT and CBCT by iteratively matching local CT and CBCT intensities. We correct CBCT intensities by matching local intensity histograms slice by slice in conjunction with intensity-based deformable registration. The correction-registration steps are repeated in an alternating way until the result image converges. We integrate the intensity matching into three different deformable registration methods, B-spline, demons, and optical flow that are widely used for CT-CBCT registration. All three registration methods were implemented on a graphics processing unit for efficient parallel computation. We tested the proposed methods on twenty five head and neck cancer cases and compared the performance with state-of-the-art registration methods. Normalized cross correlation (NCC), structural similarity index (SSIM), and target registration error (TRE) were computed to evaluate the registration performance. Our method produced overall NCC of 0.96, SSIM of 0.94, and TRE of 2.26 → 2.27 mm, outperforming existing methods by 9%, 12%, and 27%, respectively. Experimental results also show that our method performs consistently and is more accurate than existing algorithms, and also computationally efficient.
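The local intensity-matching step that both records describe can be sketched with standard tools. The following minimal Python sketch (an illustration using scikit-image's histogram matching, not the authors' GPU implementation) corrects CBCT intensities slice by slice against the corresponding CT slices; the deformable-registration half of the alternating loop is left as a hypothetical placeholder.

    # Slice-wise intensity correction of CBCT against CT via histogram matching.
    # Assumes both volumes are already resampled to the same grid.
    import numpy as np
    from skimage.exposure import match_histograms

    def correct_cbct_slicewise(cbct, ct):
        corrected = np.empty_like(cbct, dtype=np.float32)
        for k in range(cbct.shape[0]):
            corrected[k] = match_histograms(cbct[k].astype(np.float32), ct[k].astype(np.float32))
        return corrected

    # Alternating correction/registration loop (registration routine is a placeholder):
    # for _ in range(n_iters):
    #     cbct_corr = correct_cbct_slicewise(warped_cbct, ct)
    #     warp = deformable_register(moving=ct, fixed=cbct_corr)   # e.g. B-spline, demons, optical flow
    #     warped_cbct = apply_warp(cbct, warp)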
A revised ground-motion and intensity interpolation scheme for shakemap
Worden, C.B.; Wald, D.J.; Allen, T.I.; Lin, K.; Garcia, D.; Cua, G.
2010-01-01
We describe a weighted-average approach for incorporating various types of data (observed peak ground motions and intensities and estimates from ground-motion prediction equations) into the ShakeMap ground motion and intensity mapping framework. This approach represents a fundamental revision of our existing ShakeMap methodology. In addition, the increased availability of near-real-time macroseismic intensity data, the development of new relationships between intensity and peak ground motions, and new relationships to directly predict intensity from earthquake source information have facilitated the inclusion of intensity measurements directly into ShakeMap computations. Our approach allows for the combination of (1) direct observations (ground-motion measurements or reported intensities), (2) observations converted from intensity to ground motion (or vice versa), and (3) estimated ground motions and intensities from prediction equations or numerical models. Critically, each of the aforementioned data types must include an estimate of its uncertainties, including those caused by scaling the influence of observations to surrounding grid points and those associated with estimates given an unknown fault geometry. The ShakeMap ground-motion and intensity estimates are an uncertainty-weighted combination of these various data and estimates. A natural by-product of this interpolation process is an estimate of total uncertainty at each point on the map, which can be vital for comprehensive inventory loss calculations. We perform a number of tests to validate this new methodology and find that it produces a substantial improvement in the accuracy of ground-motion predictions over empirical prediction equations alone.
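The uncertainty-weighted combination at the core of the revised scheme is, in its simplest form, an inverse-variance weighted average whose by-product is exactly the per-point uncertainty the abstract mentions. The sketch below is a generic NumPy illustration of that idea, not the ShakeMap code.

    # Inverse-variance combination of several estimates at one grid point.
    import numpy as np

    def combine(estimates, sigmas):
        """estimates: values (e.g. ln(PGA)) from observations, converted data, GMPEs.
        sigmas: their one-standard-deviation uncertainties."""
        w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
        value = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
        variance = 1.0 / np.sum(w)   # uncertainty of the combined estimate
        return value, np.sqrt(variance)

    # A nearby observation (small sigma) dominates a prediction-equation estimate (large sigma):
    val, sig = combine([0.21, 0.35], [0.05, 0.30])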
High Resolution Nature Runs and the Big Data Challenge
NASA Technical Reports Server (NTRS)
Webster, W. Phillip; Duffy, Daniel Q.
2015-01-01
NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive Nature Runs and a downscaled reanalysis. The nature runs use GEOS-5 as an Atmospheric General Circulation Model (AGCM) while the reanalysis uses GEOS-5 in Data Assimilation mode. This paper will present computational challenges from three runs, two of which are AGCM and one is a downscaled reanalysis using the full DAS. The nature runs will be completed at two surface grid resolutions, 7 and 3 kilometers, and 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data, while the 3 km run will span one year and generate 4 PB of data. The downscaled reanalysis (MERRA-II, Modern-Era Retrospective analysis for Research and Applications) will cover 15 years and generate 1 PB of data. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we will describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. MERRA/AS enables MapReduce analytics over the MERRA reanalysis data collection by bringing together high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC comprises a high-speed InfiniBand network, high-performance file systems and object storage, and virtual system environments specific to data-intensive science applications. These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi-/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA-based programming.
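As a rough illustration of the data-level parallelism the abstract refers to (not the paper's OpenMP or CUDA benchmarks), the following Python sketch contrasts an element-by-element loop with the same reduction expressed as a whole-array operation, which vectorized/SIMD or GPU back ends can execute in parallel.

```python
import time
import numpy as np

x = np.random.rand(10_000_000).astype(np.float32)

# Scalar-style loop: one element at a time, as a single core without SIMD would.
t0 = time.perf_counter()
acc = 0.0
for v in x[:1_000_000]:          # subset only, to keep the loop tolerable
    acc += v * v
t_loop = time.perf_counter() - t0

# Data-parallel form: the same reduction over whole arrays.
t0 = time.perf_counter()
acc_vec = float(np.dot(x, x))
t_vec = time.perf_counter() - t0

print(f"loop (1/10 of data): {t_loop:.3f}s, vectorized (all data): {t_vec:.3f}s")
```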
Computer program for determining rotational line intensity factors for diatomic molecules
NASA Technical Reports Server (NTRS)
Whiting, E. E.
1973-01-01
A FORTRAN IV computer program that provides a new research tool for determining reliable rotational line intensity factors (also known as Hönl-London factors) for most electric and magnetic dipole allowed diatomic transitions is described in detail. This user's manual includes instructions for preparing the input data, a program listing, detailed flow charts, and three sample cases. The program is applicable to spin-allowed dipole transitions with either or both states intermediate between Hund's case (a) and Hund's case (b) coupling and to spin-forbidden dipole transitions with either or both states intermediate between Hund's case (c) and Hund's case (b) coupling.
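For the simplest limiting case, a 1Sigma-1Sigma transition, the Hönl-London factors reduce to textbook closed forms (R branch: J''+1, P branch: J'', no Q branch). This small sketch, assuming that standard normalization, is the kind of check one might run against a general intensity-factor program; it does not reproduce the intermediate-coupling cases the program actually handles.

```python
def honl_london_1sigma(J_lower):
    """Hönl-London line strength factors for a 1Sigma-1Sigma transition,
    as a function of the lower-state rotational quantum number J''."""
    J = float(J_lower)
    return {"R": J + 1.0, "P": J, "Q": 0.0}   # Q branch forbidden

for J in range(4):
    s = honl_london_1sigma(J)
    # Branch sum obeys the usual normalization: sum over branches = 2J'' + 1.
    assert abs(s["R"] + s["P"] + s["Q"] - (2 * J + 1)) < 1e-12
    print(J, s)
```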
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data
NASA Astrophysics Data System (ADS)
Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.
2016-12-01
Geospatial data are growing exponentially because of the proliferation of cost-effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. The data- and compute-intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies for storing, computing on, and analyzing data. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark, a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build a multi-user hosting cloud computing infrastructure for GISpark. Virtual storage systems such as HDFS, Ceph, and MongoDB are combined and adopted for spatiotemporal data storage management. A Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also be integrated with scientific computing environments (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark, in conjunction with the scientific computing environment, exploratory spatial data analysis tools, and temporal data management and analysis systems, make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides a spatiotemporal computational model and advanced geospatial visualization tools that are applicable to other domains with spatial properties. We tested the performance of the platform on taxi trajectory analysis. Results suggest that GISpark achieves excellent run-time performance in spatiotemporal big data applications.
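GISpark's own API is not shown in the abstract, so the following is a generic PySpark sketch of the kind of spatiotemporal aggregation the platform targets (taxi pick-ups binned by grid cell and hour); the HDFS path and column names are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("taxi-grid-demo").getOrCreate()

# Assumed CSV schema: pickup_lon, pickup_lat, pickup_time (ISO timestamp).
taxi = spark.read.csv("hdfs:///data/taxi_trips.csv", header=True, inferSchema=True)

cell_deg = 0.01  # roughly 1 km grid cell at mid latitudes
counts = (
    taxi
    .withColumn("cell_x", F.floor(F.col("pickup_lon") / cell_deg))
    .withColumn("cell_y", F.floor(F.col("pickup_lat") / cell_deg))
    .withColumn("hour", F.hour(F.to_timestamp("pickup_time")))
    .groupBy("cell_x", "cell_y", "hour")
    .count()
)
counts.orderBy(F.desc("count")).show(10)   # busiest cell-hour combinations
```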
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high-performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high-performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a systems biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.
Transformation of OODT CAS to Perform Larger Tasks
NASA Technical Reports Server (NTRS)
Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean
2008-01-01
A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.
Charalambous, Charalambos C; Alcantara, Carolina C; French, Margaret A; Li, Xin; Matt, Kathleen S; Kim, Hyosub E; Morton, Susanne M; Reisman, Darcy S
2018-05-15
Previous work demonstrated an effect of a single high-intensity exercise bout coupled with motor practice on the retention of a newly acquired skilled arm movement, in both neurologically intact and impaired adults. In the present study, using behavioural and computational analyses, we demonstrated that a single exercise bout, regardless of its intensity and timing, did not increase the retention of a novel locomotor task after stroke. Considering both present and previous work, we postulate that the benefits of exercise may depend on the type of motor learning (e.g. skill learning, sensorimotor adaptation) and/or task (e.g. arm accuracy-tracking task, walking). Acute high-intensity exercise coupled with motor practice improves the retention of motor learning in neurologically intact adults. However, whether exercise could improve the retention of locomotor learning after stroke is still unknown. Here, we investigated the effect of exercise intensity and timing on the retention of a novel locomotor learning task (i.e. split-belt treadmill walking) after stroke. Thirty-seven people post stroke participated in two sessions, 24 h apart, and were allocated to active control (CON), treadmill walking (TMW), or total body exercise on a cycle ergometer (TBE). In session 1, all groups exercised for a short bout (∼5 min) at low (CON) or high (TMW and TBE) intensity and before (CON and TMW) or after (TBE) the locomotor learning task. In both sessions, the locomotor learning task was to walk on a split-belt treadmill at a 2:1 speed ratio (100% and 50% of fast comfortable walking speed) for 15 min. To test the effect of exercise on 24 h retention, we applied behavioural and computational analyses. Behavioural data showed that neither high-intensity group showed greater 24 h retention compared to CON, and computational data showed that 24 h retention was attributable to a slow learning process for sensorimotor adaptation. Our findings demonstrate that acute exercise coupled with a locomotor adaptation task, regardless of its intensity and timing, does not improve retention of the novel locomotor task after stroke. We postulate that exercise effects on motor learning may be context specific (e.g. type of motor learning and/or task) and may interact with the presence of a genetic variant (BDNF Val66Met). © 2018 The Authors. The Journal of Physiology © 2018 The Physiological Society.
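The "slow learning process" language suggests a dual-rate state-space analysis of adaptation, in which a slowly updating, well-retained state and a fast, quickly forgetting state jointly cancel the split-belt perturbation. The sketch below simulates that commonly used model with illustrative parameter values; it is an assumption about the style of analysis, not the authors' fitted model.

```python
import numpy as np

def dual_rate(perturbation, A_f=0.92, B_f=0.20, A_s=0.996, B_s=0.02):
    """Simulate a fast (quickly learning, quickly forgetting) and a slow
    (slowly learning, well retained) adaptive state driven by error."""
    x_f = x_s = 0.0
    fast, slow, net = [], [], []
    for p in perturbation:
        e = p - (x_f + x_s)          # error on this stride
        x_f = A_f * x_f + B_f * e
        x_s = A_s * x_s + B_s * e
        fast.append(x_f); slow.append(x_s); net.append(x_f + x_s)
    return np.array(fast), np.array(slow), np.array(net)

# 900 strides of a constant split-belt perturbation in session 1.
fast, slow, net = dual_rate(np.ones(900))
print(f"end of session: net={net[-1]:.2f}, slow component={slow[-1]:.2f}")
```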
Statistical methods and computing for big data.
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing; Yan, Jun
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with a focus on open-source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.
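A minimal sketch of the online-updating idea for a linear model: sufficient statistics are accumulated batch by batch from a stream, so coefficients can be refreshed at any time without revisiting old data. This is a generic illustration, not the paper's variable-selection extension.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
XtX = np.zeros((p, p))
Xty = np.zeros(p)
beta_true = np.arange(1.0, p + 1.0)

for _ in range(100):                  # 100 batches arriving from a stream
    X = rng.normal(size=(1000, p))
    y = X @ beta_true + rng.normal(size=1000)
    XtX += X.T @ X                    # update sufficient statistics only
    Xty += X.T @ y

beta_hat = np.linalg.solve(XtX, Xty)  # OLS from the accumulated statistics
print(np.round(beta_hat, 3))
```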
Streaming support for data intensive cloud-based sequence analysis.
Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
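The streaming idea, processing sequence chunks while the data are still being transferred, can be sketched as a generator pipeline; elastream's actual interface is not reproduced here, and the file name, batch size, and per-batch statistic are assumptions.

```python
import gzip

def stream_fastq_records(path, chunk_records=10_000):
    """Yield batches of FASTQ records while the (compressed) file is read,
    so downstream analysis can start before the transfer completes."""
    batch = []
    with gzip.open(path, "rt") as fh:
        while True:
            record = [fh.readline() for _ in range(4)]   # FASTQ = 4 lines/record
            if not record[0]:
                break
            batch.append(record)
            if len(batch) == chunk_records:
                yield batch
                batch = []
    if batch:
        yield batch

def gc_fraction(batch):
    seqs = "".join(r[1].strip() for r in batch)
    return (seqs.count("G") + seqs.count("C")) / max(len(seqs), 1)

# for batch in stream_fastq_records("reads.fastq.gz"):
#     print(gc_fraction(batch))          # analysis overlaps with I/O
```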
Sensor network based vehicle classification and license plate identification system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frigo, Janette Rose; Brennan, Sean M; Rosten, Edward J
Typically, for energy efficiency and scalability purposes, sensor networks have been used in the context of environmental and traffic monitoring applications in which operations at the sensor level are not computationally intensive. But increasingly, sensor network applications require data- and compute-intensive sensors such as video cameras and microphones. In this paper, we describe the design and implementation of two such systems: a vehicle classifier based on acoustic signals and a license plate identification system using a camera. The systems are implemented in an energy-efficient manner to the extent possible using commercially available hardware, the Mica motes and the Stargate platform. Our experience in designing these systems leads us to consider an alternative, more flexible, modular, low-power mote architecture that uses a combination of FPGAs, specialized embedded processing units, and sensor data acquisition systems.
Campion, Thomas R; Waitman, Lemuel R; May, Addison K; Ozdas, Asli; Lorenzi, Nancy M; Gadd, Cynthia S
2010-01-01
Evaluations of computerized clinical decision support systems (CDSS) typically focus on clinical performance changes and do not include social, organizational, and contextual characteristics explaining use and effectiveness. Studies of CDSS for intensive insulin therapy (IIT) are no exception, and the literature lacks an understanding of effective computer-based IIT implementation and operation. This paper presents (1) a literature review of computer-based IIT evaluations through the lens of institutional theory, a discipline from sociology and organization studies, to demonstrate the inconsistent reporting of workflow and care process execution and (2) a single-site case study to illustrate how computer-based IIT requires substantial organizational change and creates additional complexity with unintended consequences including error. Computer-based IIT requires organizational commitment and attention to site-specific technology, workflow, and care processes to achieve intensive insulin therapy goals. The complex interaction between clinicians, blood glucose testing devices, and CDSS may contribute to workflow inefficiency and error. Evaluations rarely focus on the perspective of nurses, the primary users of computer-based IIT whose knowledge can potentially lead to process and care improvements. This paper addresses a gap in the literature concerning the social, organizational, and contextual characteristics of CDSS in general and for intensive insulin therapy specifically. Additionally, this paper identifies areas for future research to define optimal computer-based IIT process execution: the frequency and effect of manual data entry error of blood glucose values, the frequency and effect of nurse overrides of CDSS insulin dosing recommendations, and comprehensive ethnographic study of CDSS for IIT. Copyright (c) 2009. Published by Elsevier Ireland Ltd.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm that allows parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations on nVidia graphics cards. The graphics processing unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
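The computational core that MGS-style phase retrieval repeats many times, and the part that benefits most from GPU FFTs, is propagating a pupil wavefront to a focal-plane intensity image. This NumPy sketch shows only that forward model, with an assumed toy aberration; it is not the MGS estimation loop or the CUDA kernels.

```python
import numpy as np

n = 256
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
pupil = (x**2 + y**2) <= 1.0                 # circular aperture mask
phase = 0.5 * (2 * x * y)                    # small astigmatism-like error (radians)

field = pupil * np.exp(1j * phase)           # complex field in the exit pupil
# Zero-padded FFT gives the focal-plane field; its squared modulus is the image.
psf = np.abs(np.fft.fftshift(np.fft.fft2(field, s=(4 * n, 4 * n))))**2
psf /= psf.max()                             # normalized focal-plane intensity

print(psf.shape, psf.max())
```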
Computer simulation of reconstructed image for computer-generated holograms
NASA Astrophysics Data System (ADS)
Yasuda, Tomoki; Kitamura, Mitsuru; Watanabe, Masachika; Tsumuta, Masato; Yamaguchi, Takeshi; Yoshikawa, Hiroshi
2009-02-01
This report presents the results of computer simulation of images for image-type Computer-Generated Holograms (CGHs) observable under white light and fabricated with an electron beam lithography system. The simulated image is obtained by calculating the wavelength and intensity of diffracted light traveling toward the viewing point from the CGH. The wavelength and intensity of the diffracted light are calculated using an FFT image generated from the interference fringe data. A parallax image of the CGH corresponding to the viewing point can be easily obtained using this simulation method. The simulated image from interference fringe data was compared with the reconstructed image of a real CGH fabricated with an electron beam (EB) lithography system. According to the results, the simulated image closely resembled the reconstructed image of the CGH in shape, parallax, coloring, and shade. In addition, depending on the shape of the light sources, the simulated images changed in chroma saturation and blur under two kinds of simulations: the several-light-sources method and the smoothing method. Furthermore, as applications of the CGH, a full-color CGH and a CGH with multiple images were simulated. The simulated images of those CGHs closely resembled the reconstructed images of the real CGHs.
NASA Astrophysics Data System (ADS)
Vilotte, Jean-Pierre; Atkinson, Malcolm; Carpené, Michele; Casarotti, Emanuele; Frank, Anton; Igel, Heiner; Rietbrock, Andreas; Schwichtenberg, Horst; Spinuso, Alessandro
2016-04-01
Seismology pioneers global and open-data access, with internationally approved data, metadata and exchange standards facilitated worldwide by the Federation of Digital Seismic Networks (FDSN) and, in Europe, the European Integrated Data Archives (EIDA). The growing wealth of data generated by dense observation and monitoring systems and recent advances in seismic wave simulation capabilities induce a change in paradigm. Data-intensive seismology research requires a new holistic approach combining scalable high-performance wave simulation codes and statistical data analysis methods, and integrating distributed data and computing resources. The European E-Infrastructure project "Virtual Earthquake and seismology Research Community e-science environment in Europe" (VERCE) pioneers the federation of autonomous organisations providing data and computing resources, together with a comprehensive, integrated and operational virtual research environment (VRE) and E-infrastructure devoted to the full path of data use in a research-driven context. VERCE delivers to a broad base of seismology researchers in Europe easily used high-performance full waveform simulations and misfit calculations, together with a data-intensive framework for the collaborative development of innovative statistical data analysis methods, all of which were previously only accessible to a small number of well-resourced groups. It balances flexibility with new integrated capabilities to provide a fluent path from research innovation to production. As such, VERCE is a major contribution to the implementation phase of the "European Plate Observing System" (EPOS), the ESFRI initiative of the solid-Earth community. The VRE meets a range of seismic research needs by eliminating chores and technical difficulties to allow users to focus on their research questions. It empowers researchers to harvest the new opportunities provided by the well-established and mature high-performance wave simulation codes of the community. It enables active researchers to invent and refine scalable methods for innovative statistical analysis of seismic waveforms in a wide range of application contexts. The VRE paves the way towards a flexible shared framework for seismic waveform inversion, lowering the barriers to uptake for the next generation of researchers. The VRE can be accessed through the science gateway, which brings computational and data-intensive research together in the same framework, integrating multiple data sources and services. It provides a context for task-oriented and data-streaming workflows, and maps user actions to the full gamut of the federated platform resources and procurement policies, activating the necessary behind-the-scenes automation and transformation. The platform manages and produces domain metadata, coupling them with the provenance information describing the relationships and the dependencies that characterise the whole workflow process. This dynamic knowledge base can be explored for validation purposes via a graphical interface and a web API. Moreover, it fosters the assisted selection and re-use of the data within each phase of the scientific analysis. These phases can be identified as Simulation, Data Access, Preprocessing, Misfit and data processing, and are presented to the users of the gateway as dedicated and interactive workspaces.
By enabling researchers to share results and provenance information, VERCE steers open-science behaviour, allowing researchers to discover and build on prior work and thereby to progress faster. A key asset is the agile strategy that VERCE deployed in a multi-organisational context, engaging seismologists, data scientists, ICT researchers, HPC and data resource providers, and system administrators in short-lived tasks, each with a goal that is a seismology priority, intimately coupling research thinking with technical innovation. This shifts the focus from HPC production environments and community data services to user-focused scenarios, avoiding wasteful bouts of technology centricity in which technologists collect requirements and develop a system that is not used because the ideas of the planned users have moved on. As such, the technologies and concepts developed in VERCE are relevant to many other disciplines in computational and data-driven Earth Sciences and can provide the key technologies for a Europe-wide computational and data-intensive framework in Earth Sciences.
A parallel-processing approach to computing for the geographic sciences
Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark
2001-01-01
The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing.
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles; Mousses, Spyro
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information.
An independent software system for the analysis of dynamic MR images.
Torheim, G; Lombardi, M; Rinck, P A
1997-01-01
A computer system for the manual, semi-automatic, and automatic analysis of dynamic MR images was to be developed on UNIX and personal computer platforms. The system was to offer an integrated and standardized way of performing both image processing and analysis that was independent of the MR unit used. The system consists of modules that are easily adaptable to special needs. Data from MR units or other diagnostic imaging equipment in techniques such as CT, ultrasonography, or nuclear medicine can be processed through the ACR-NEMA/DICOM standard file formats. A full set of functions is available, among them cine-loop visual analysis, and generation of time-intensity curves. Parameters such as cross-correlation coefficients, area under the curve, peak/maximum intensity, wash-in and wash-out slopes, time to peak, and relative signal intensity/contrast enhancement can be calculated. Other parameters can be extracted by fitting functions like the gamma-variate function. Region-of-interest data and parametric values can easily be exported. The system has been successfully tested in animal and patient examinations.
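As an illustration of the curve-based parameter extraction described (not the system's own code), the following fits a gamma-variate function to a synthetic time-intensity curve with SciPy and reads off a derived quantity such as the peak intensity; all numbers are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def gamma_variate(t, A, t0, alpha, beta):
    """C(t) = A * (t - t0)^alpha * exp(-(t - t0)/beta) for t > t0, else 0."""
    dt = np.clip(t - t0, 0.0, None)
    return A * dt**alpha * np.exp(-dt / beta)

t = np.linspace(0, 60, 121)                       # seconds
true_curve = gamma_variate(t, 100.0, 8.0, 2.5, 4.0)
signal = true_curve + np.random.default_rng(1).normal(0, 2.0, t.size)

popt, _ = curve_fit(gamma_variate, t, signal, p0=(80.0, 5.0, 2.0, 5.0),
                    bounds=([0, 0, 0.1, 0.1], [np.inf, 30, 10, 20]))
peak = gamma_variate(t, *popt).max()              # derived parameter: peak intensity
print(np.round(popt, 2), round(float(peak), 1))
```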
Three-dimensional spiral CT during arterial portography: comparison of three rendering techniques.
Heath, D G; Soyer, P A; Kuszyk, B S; Bliss, D F; Calhoun, P S; Bluemke, D A; Choti, M A; Fishman, E K
1995-07-01
The three most common techniques for three-dimensional reconstruction are surface rendering, maximum-intensity projection (MIP), and volume rendering. Surface-rendering algorithms model objects as collections of geometric primitives that are displayed with surface shading. The MIP algorithm renders an image by selecting the voxel with the maximum intensity signal along a line extended from the viewer's eye through the data volume. Volume-rendering algorithms sum the weighted contributions of all voxels along the line. Each technique has advantages and shortcomings that must be considered during selection of one for a specific clinical problem and during interpretation of the resulting images. With surface rendering, sharp-edged, clear three-dimensional reconstruction can be completed on modest computer systems; however, overlapping structures cannot be visualized and artifacts are a problem. MIP is computationally a fast technique, but it does not allow depiction of overlapping structures, and its images are three-dimensionally ambiguous unless depth cues are provided. Both surface rendering and MIP use less than 10% of the image data. In contrast, volume rendering uses nearly all of the data, allows demonstration of overlapping structures, and engenders few artifacts, but it requires substantially more computer power than the other techniques.
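A minimal NumPy contrast of two of the projection rules described: MIP keeps the single brightest voxel along each ray, while a simplified, emission-only volume rendering sums opacity-weighted contributions of all voxels. The transfer function below is an arbitrary stand-in, and surface shading and gradients are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((64, 128, 128)).astype(np.float32)   # depth x rows x cols

# Maximum-intensity projection along the viewing (depth) axis.
mip = volume.max(axis=0)

# Simplified volume rendering: front-to-back compositing of all voxels,
# weighting each by its opacity and the transmittance of voxels in front of it.
opacity = np.clip(volume - 0.5, 0.0, None)               # toy transfer function
transmittance = np.cumprod(1.0 - opacity, axis=0)
transmittance = np.roll(transmittance, 1, axis=0)
transmittance[0] = 1.0                                    # nothing in front of slice 0
vr = (transmittance * opacity * volume).sum(axis=0)

print(mip.shape, vr.shape)
```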
NASA Astrophysics Data System (ADS)
Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.
2017-07-01
Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables a significant reduction of computation time at a moderate cost by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to an 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at the data preprocessing stages of the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function; (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available on a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at the data preprocessing stages using CUDA code running on the GPU in comparison to single-thread MATLAB-only code running on the CPU.
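The second subroutine's task, converting a scattering image into intensity versus azimuth angle, can be outlined with NumPy binning around the image centre; this shows the algorithm only, not the CUDA implementation, and the frame, centre, and bin count are assumptions.

```python
import numpy as np

def azimuthal_profile(img, n_bins=360, center=None):
    """Bin pixel intensities of a scattering image by azimuth angle around
    the optical axis; return the mean intensity per angular bin."""
    h, w = img.shape
    cy, cx = center if center is not None else ((h - 1) / 2.0, (w - 1) / 2.0)
    y, x = np.mgrid[0:h, 0:w]
    theta = np.arctan2(y - cy, x - cx)                    # range -pi..pi
    bins = ((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    sums = np.bincount(bins.ravel(), weights=img.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

frame = np.random.default_rng(2).random((480, 640))       # one movie frame (synthetic)
profile = azimuthal_profile(frame)
print(profile.shape)                                       # (360,) intensity vs azimuth
```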
The challenge of assessing the potential developmental health risks for the tens of thousands of environmental chemicals is beyond the capacity for resource-intensive animal protocols. Large data streams coming from high-throughput (HTS) and high-content (HCS) profiling of biolog...
Shi, Yulin; Veidenbaum, Alexander V; Nicolau, Alex; Xu, Xiangmin
2015-01-15
Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post hoc processing and analysis. Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22× speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. Copyright © 2014 Elsevier B.V. All rights reserved.
Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin
2014-01-01
Background Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New Method Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with Existing Method(s) To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions Together, GPU enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633
NASA Astrophysics Data System (ADS)
Trani, L.; Spinuso, A.; Galea, M.; Atkinson, M.; Van Eck, T.; Vilotte, J.
2011-12-01
The data bonanza generated by today's digital revolution is forcing scientists to rethink their methodologies and working practices. Traditional approaches to knowledge discovery are pushed to their limit and struggle to keep apace with the data flows produced by modern systems. This work shows how the ADMIRE data-intensive architecture supports seismologists by enabling them to focus on their scientific goals and questions, abstracting away the underlying technology platform that enacts their data integration and analysis tasks. ADMIRE accomplishes this partly by recognizing three different types of experts that require clearly defined interfaces between their interactions: the domain expert, who is the application specialist; the data-analysis expert, who is a specialist in extracting information from data; and the data-intensive engineer, who develops the infrastructure for data-intensive computation. In order to provide a context in which each category of expert may flourish, ADMIRE uses a three-level architecture: the upper (tool) level supports the work of both domain and data-analysis experts, housing an extensive and evolving set of portals, tools and development environments; the lower (enactment) level houses a large and dynamic community of providers delivering data and data-intensive enactment environments as an evolving infrastructure that supports all of the work underway in the upper layer, and is where most data-intensive engineers work. The crucial innovation lies in the middle level, a gateway that is a tightly defined and stable interface through which the two diverse and dynamic upper and lower layers communicate. This is a minimal and simple protocol and language (DISPEL), ultimately to be controlled by standards, so that the upper and lower communities may invest, secure in the knowledge that changes in this interface will be carefully managed. We implemented a well-established procedure for processing seismic ambient noise on the prototype architecture. The primary goal was to evaluate its capabilities for large-scale integration and analysis of distributed data. A secondary goal was to gauge its potential and the added value that it might bring to the seismological community. Though still in its infancy, the architecture met the demands of our use case and promises to cater for our future requirements. We shall continue to develop its capabilities as part of an EU-funded project, VERCE (Virtual Earthquake and Seismology Research Community for Europe). VERCE aims to significantly advance our understanding of the Earth in order to aid society in its management of natural resources and hazards. Its strategy is to enable seismologists to fully exploit the under-utilized wealth of seismic data, and key to this is a data-intensive computation framework adapted to the scale and diversity of the community. This is a first step in building a data-intensive highway for geoscientists, smoothing their travel from the primary sources of data to new insights and rapid delivery of actionable information.
Long live the Data Scientist, but can he/she persist?
NASA Astrophysics Data System (ADS)
Wyborn, L. A.
2011-12-01
In recent years the fourth paradigm of data-intensive science has slowly taken hold as the increased capacity of instruments and an increasing number of instruments (in particular sensor networks) have changed how fundamental research is undertaken. Most modern scientific research is about digital capture of data direct from instruments, processing it by computers, storing the results on computers, and only publishing a small fraction of data in hard copy publications. At the same time, the rapid increase in capacity of supercomputers, particularly at petascale, means that far larger data sets can be analysed, and to greater resolution, than previously possible. The new cloud computing paradigm, which allows distributed data, software and compute resources to be linked by seamless workflows, is creating new opportunities in the processing of high volumes of data for an increasingly large number of researchers. However, to take full advantage of these compute resources, data sets for analysis have to be aggregated from multiple sources to create high performance data sets. These new technology developments require that scientists become more skilled in data management and/or have a higher degree of computer literacy. In almost every science discipline there is now an X-informatics branch and a computational X branch (e.g., Geoinformatics and Computational Geoscience): both require a new breed of researcher that has skills in both the science fundamentals and also knowledge of some ICT aspects (computer programming, database design and development, data curation, software engineering). People that can operate in both science and ICT are increasingly known as 'data scientists'. Data scientists are a critical element of many large-scale earth and space science informatics projects, particularly those that are tackling current grand challenges at an international level on issues such as climate change, hazard prediction and sustainable development of our natural resources. These projects by their very nature require the integration of multiple digital data sets from multiple sources. Often the preparation of the data for computational analysis can take months and requires painstaking attention to detail to ensure that anomalies identified are real and are not just artefacts of the data preparation and/or the computational analysis. Although data scientists are increasingly vital to successful data-intensive earth and space science projects, unless they are recognised for their capabilities in both the science and the computational domains, they are likely to migrate to either a science role or an ICT role as their career advances. Most reward and recognition systems do not recognise those with skills in both; hence, getting trained data scientists to persist beyond one or two projects can be a challenge. Those data scientists that persist in the profession are characteristically committed and enthusiastic people who have the support of their organisations to take on this role. They also tend to be people who share developments and are critical to the success of the open source software movement. However, the fact remains that survival of the data scientist as a species is threatened unless something is done to recognise their invaluable contributions to the new fourth paradigm of science.
Hsieh, Hong-Po; Ko, Fan-Hua; Sung, Kung-Bin
2018-04-20
An iterative curve fitting method has been applied in both simulation [J. Biomed. Opt. 17, 107003 (2012), doi:10.1117/1.JBO.17.10.107003] and phantom [J. Biomed. Opt. 19, 077002 (2014), doi:10.1117/1.JBO.19.7.077002] studies to accurately extract optical properties and the top layer thickness of a two-layered superficial tissue model from diffuse reflectance spectroscopy (DRS) data. This paper describes a hybrid two-step parameter estimation procedure to address two main issues of the previous method, including (1) high computational intensity and (2) converging to local minima. The parameter estimation procedure contained a novel initial estimation step to obtain an initial guess, which was used by a subsequent iterative fitting step to optimize the parameter estimation. A lookup table was used in both steps to quickly obtain reflectance spectra and reduce computational intensity. On simulated DRS data, the proposed parameter estimation procedure achieved high estimation accuracy and a 95% reduction of computational time compared to previous studies. Furthermore, the proposed initial estimation step led to better convergence of the following fitting step. Strategies used in the proposed procedure could benefit both the modeling and experimental data processing of not only DRS but also related approaches such as near-infrared spectroscopy.
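A schematic of the hybrid two-step idea: a nearest-neighbour search in a precomputed reflectance lookup table provides the initial guess, and a bounded least-squares fit refines it. The forward model below is a deliberately trivial stand-in for the real layered-tissue model, and all parameter names and ranges are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

wavelengths = np.linspace(450, 650, 40)

def toy_forward_model(params):
    """Stand-in for the layered-tissue reflectance model (NOT the real one)."""
    a, b = params
    return np.exp(-a * (wavelengths / 500.0)) + b

# Step 0: precompute a lookup table over a coarse parameter grid.
grid_a, grid_b = np.meshgrid(np.linspace(0.1, 2.0, 40), np.linspace(0.0, 0.5, 40))
table_params = np.column_stack([grid_a.ravel(), grid_b.ravel()])
table_spectra = np.array([toy_forward_model(p) for p in table_params])

measured = toy_forward_model((0.93, 0.21)) + \
           np.random.default_rng(3).normal(0, 0.005, wavelengths.size)

# Step 1: initial estimation = nearest table entry (cheap, avoids bad local minima).
idx = np.argmin(((table_spectra - measured) ** 2).sum(axis=1))
p0 = table_params[idx]

# Step 2: iterative fitting started from that guess.
fit = least_squares(lambda p: toy_forward_model(p) - measured, p0,
                    bounds=([0.1, 0.0], [2.0, 0.5]))
print(p0, np.round(fit.x, 3))
```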
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.
We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousand degrees of freedom, and when the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.
MeDICi Software Superglue for Data Analysis Pipelines
Ian Gorton
2017-12-09
The Middleware for Data-Intensive Computing (MeDICi) Integration Framework is an integrated middleware platform developed to solve data analysis and processing needs of scientists across many domains. MeDICi is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms, and in use today by PNNL scientists for bioinformatics, power grid failure analysis, and text analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nikolic, R J
This month's issue has the following articles: (1) Dawn of a New Era of Scientific Discovery - Commentary by Edward I. Moses; (2) At the Frontiers of Fundamental Science Research - Collaborators from national laboratories, universities, and international organizations are using the National Ignition Facility to probe key fundamental science questions; (3) Livermore Responds to Crisis in Post-Earthquake Japan - More than 70 Laboratory scientists provided round-the-clock expertise in radionuclide analysis and atmospheric dispersion modeling as part of the nation's support to Japan following the March 2011 earthquake and nuclear accident; (4) A Comprehensive Resource for Modeling, Simulation, and Experiments - A new Web-based resource called MIDAS is a central repository for material properties, experimental data, and computer models; and (5) Finding Data Needles in Gigabit Haystacks - Livermore computer scientists have developed a novel computer architecture based on 'persistent' memory to ease data-intensive computations.
Building a Data Science capability for USGS water research and communication
NASA Astrophysics Data System (ADS)
Appling, A.; Read, E. K.
2015-12-01
Interpreting and communicating water issues in an era of exponentially increasing information requires a blend of domain expertise, computational proficiency, and communication skills. The USGS Office of Water Information has established a Data Science team to meet these needs, providing challenging careers for diverse domain scientists and innovators in the fields of information technology and data visualization. Here, we detail the experience of building a Data Science capability as a bridging element between traditional water resources analyses and modern computing tools and data management techniques. This approach includes four major components: 1) building reusable research tools, 2) documenting data-intensive research approaches in peer reviewed journals, 3) communicating complex water resources issues with interactive web visualizations, and 4) offering training programs for our peers in scientific computing. These components collectively improve the efficiency, transparency, and reproducibility of USGS data analyses and scientific workflows.
A vectorized Lanczos eigensolver for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1990-01-01
The computational strategies used to implement a Lanczos-based-method eigensolver on the latest generation of supercomputers are described. Several examples of structural vibration and buckling problems are presented that show the effects of using optimization techniques to increase the vectorization of the computational steps. The data storage and access schemes and the tools and strategies that best exploit the computer resources are presented. The method is implemented on the Convex C220, the Cray 2, and the Cray Y-MP computers. Results show that very good computation rates are achieved for the most computationally intensive steps of the Lanczos algorithm and that the Lanczos algorithm is many times faster than other methods extensively used in the past.
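A minimal dense NumPy Lanczos sketch (with full reorthogonalization for stability) showing the three-term recurrence whose matrix-vector products dominate the cost and vectorize well; the production implementation described for the Convex and Cray machines is of course far more elaborate, and the matrix here is a random stand-in for a structural stiffness problem.

```python
import numpy as np

def lanczos(A, k, rng=np.random.default_rng(0)):
    """Approximate extreme eigenvalues of a symmetric matrix A with k Lanczos steps."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q = rng.normal(size=n)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = A @ q                                    # dominant cost: mat-vec product
        alpha[j] = q @ w
        w -= alpha[j] * q + (beta[j - 1] * Q[:, j - 1] if j > 0 else 0.0)
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)     # full reorthogonalization
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:
            break
        q = w / beta[j]
    # Eigenvalues of the tridiagonal matrix T approximate those of A (Ritz values).
    T = np.diag(alpha[:j + 1]) + np.diag(beta[:j], 1) + np.diag(beta[:j], -1)
    return np.linalg.eigvalsh(T)

A = np.random.default_rng(1).normal(size=(300, 300))
A = A + A.T                                          # make it symmetric
print(lanczos(A, 60)[-3:], np.linalg.eigvalsh(A)[-3:])   # largest Ritz vs exact
```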
Aerial radiometric and magnetic survey: Aztec National Topographic Map, New Mexico
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1979-01-01
The results of analyses of the airborne gamma radiation and total magnetic field survey flown for the region identified as the Aztec National Topographic Map NJ13-10 are presented. The airborne data gathered are reduced by ground computer facilities to yield profile plots of the basic uranium, thorium and potassium equivalent gamma radiation intensities, ratios of these intensities, aircraft altitude above the earth's surface, total gamma ray and earth's magnetic field intensity, correlated as a function of geologic units. The distribution of data within each geologic unit, for all surveyed map lines and tie lines, has been calculated and is included. Two sets of profiled data for each line are included, with one set displaying the above-cited data. The second set includes only flight line magnetic field, temperature, pressure, altitude data plus magnetic field data as measured at a base station. A general description of the area, including descriptions of the various geologic units and the corresponding airborne data, is included also.
Aerial radiometric and magnetic survey: Lander National Topographic Map, Wyoming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1979-01-01
The results of analyses of the airborne gamma radiation and total magnetic field survey flown for the region identified as the Lander National Topographic Map NK12-6 are presented. The airborne data gathered are reduced by ground computer facilities to yield profile plots of the basic uranium, thorium and potassium equivalent gamma radiation intensities, ratios of these intensities, aircraft altitude above the earth's surface, total gamma ray and earth's magnetic field intensity, correlated as a function of geologic units. The distribution of data within each geologic unit, for all surveyed map lines and tie lines, has been calculated and is included. Two sets of profiled data for each line are included, with one set displaying the above-cited data. The second set includes only flight line magnetic field, temperature, pressure, altitude data plus magnetic field data as measured at a base station. A general description of the area, including descriptions of the various geologic units and the corresponding airborne data, is included also.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkata, Manjunath Gorentla; Aderholdt, William F
Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such a system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. This work presents a programming abstraction to address that problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architectures. Using distributed data structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.
Spectral mapping of soil organic matter
NASA Technical Reports Server (NTRS)
Kristof, S. J.; Baumgardner, M. F.; Johannsen, C. J.
1974-01-01
Multispectral remote sensing data were examined for use in the mapping of soil organic matter content. Computer-implemented pattern recognition techniques were used to analyze data collected in May 1969 and May 1970 by an airborne multispectral scanner over a 40-km flightline. Two fields within the flightline were selected for intensive study. Approximately 400 surface soil samples from these fields were obtained for organic matter analysis. The analytical data were used as training sets for computer-implemented analysis of the spectral data. It was found that within the geographical limitations included in this study, multispectral data and automatic data processing techniques could be used very effectively to delineate and map surface soils areas containing different levels of soil organic matter.
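Supervised spectral classification of the kind described, training on samples of known organic-matter level and then labelling every pixel of a scanner scene, can be sketched with a Gaussian maximum-likelihood-style classifier; here scikit-learn's quadratic discriminant analysis and synthetic reflectances stand in for the original pattern recognition software and scanner data.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
n_bands, n_train = 12, 400                 # 12 scanner channels, ~400 soil samples

# Synthetic training data: reflectance tends to drop as organic matter rises.
om_class = rng.integers(0, 4, n_train)     # 4 organic-matter levels
X_train = rng.normal(loc=0.4 - 0.05 * om_class[:, None], scale=0.03,
                     size=(n_train, n_bands))

clf = QuadraticDiscriminantAnalysis().fit(X_train, om_class)  # Gaussian ML-style classifier

# Classify every pixel of a (rows x cols x bands) scene into an organic-matter level.
scene = rng.normal(loc=0.3, scale=0.06, size=(100, 100, n_bands))
om_map = clf.predict(scene.reshape(-1, n_bands)).reshape(100, 100)
print(np.bincount(om_map.ravel(), minlength=4))               # pixels per class
```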
Yang, Chaowei; Wu, Huayi; Huang, Qunying; Li, Zhenlong; Li, Jing
2011-04-05
Contemporary physical science studies rely on the effective analyses of geographically dispersed spatial data and simulations of physical phenomena. Single computers and generic high-end computing are not sufficient to process the data for complex physical science analysis and simulations, which can be successfully supported only through distributed computing, best optimized through the application of spatial principles. Spatial computing, the computing aspect of a spatial cyberinfrastructure, refers to a computing paradigm that utilizes spatial principles to optimize distributed computers to catalyze advancements in the physical sciences. Spatial principles govern the interactions between scientific parameters across space and time by providing the spatial connections and constraints to drive the progression of the phenomena. Therefore, spatial computing studies could better position us to leverage spatial principles in simulating physical phenomena and, by extension, advance the physical sciences. Using geospatial science as an example, this paper illustrates through three research examples how spatial computing could (i) enable data intensive science with efficient data/services search, access, and utilization, (ii) facilitate physical science studies with enabling high-performance computing capabilities, and (iii) empower scientists with multidimensional visualization tools to understand observations and simulations. The research examples demonstrate that spatial computing is of critical importance to design computing methods to catalyze physical science studies with better data access, phenomena simulation, and analytical visualization. We envision that spatial computing will become a core technology that drives fundamental physical science advancements in the 21st century.
Machine learning: Trends, perspectives, and prospects.
Jordan, M I; Mitchell, T M
2015-07-17
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing. Copyright © 2015, American Association for the Advancement of Science.
Using the FORTH Language to Develop an ICU Data Acquisition System
Goldberg, Arthur; SooHoo, Spencer L.; Koerner, Spencer K.; Chang, Robert S. Y.
1980-01-01
This paper describes a powerful programming tool that should be considered as an alternative to the more conventional programming languages now in use for developing medical computer systems. Forth provides instantaneous response to user commands, rapid program execution and tremendous programming versatility. An operating system and a language in one carefully designed unit, Forth is well suited for developing data acquisition systems and for interfacing computers to other instruments. We present some of the general features of Forth and describe its use in implementing a data collection system for a Respiratory Intensive Care Unit (RICU).
A novel approach to multiple sequence alignment using hadoop data grids.
Sudha Sadasivam, G; Baktavatchalam, G
2010-01-01
Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computation-intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces quality alignment. The dynamic nature of the algorithm coupled with data and computational parallelism of Hadoop data grids improves the accuracy and speed of sequence alignment. The principle of block splitting in Hadoop coupled with its scalability facilitates alignment of very large sequences.
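The block-splitting idea described above lends itself to a simple illustration. The sketch below is not the authors' Hadoop implementation; it is a minimal Python analogue, under the assumption that sequences can be grouped into blocks and each block scored independently, using the standard multiprocessing module and a plain edit-distance function as a stand-in for a real aligner.

```python
from itertools import combinations
from multiprocessing import Pool

def edit_distance(a, b):
    """Simple dynamic-programming edit distance, standing in for a real aligner."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def score_block(block):
    """Score all pairs inside one block of (name, sequence) records."""
    return [(x[0], y[0], edit_distance(x[1], y[1])) for x, y in combinations(block, 2)]

def split_into_blocks(records, block_size):
    """Mimic Hadoop-style block splitting of the input records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

if __name__ == "__main__":
    seqs = [("s1", "ACGTACGT"), ("s2", "ACGTTCGT"), ("s3", "ACGAACGA"), ("s4", "TCGTACGT")]
    blocks = split_into_blocks(seqs, block_size=2)
    with Pool() as pool:                      # each block is scored by its own worker
        results = pool.map(score_block, blocks)
    for block_result in results:
        print(block_result)
```

In a real data grid the blocks would be distributed by the framework rather than by a local process pool; the sketch only shows why block splitting exposes parallelism.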
The dipole moment surface for hydrogen sulfide H2S
NASA Astrophysics Data System (ADS)
Azzam, Ala`a. A. A.; Lodi, Lorenzo; Yurchenko, Sergey N.; Tennyson, Jonathan
2015-08-01
In this work we perform a systematic ab initio study of the dipole moment surface (DMS) of H2S at various levels of theory and of its effect on the intensities of vibration-rotation transitions; H2S intensities are known from the experiment to display anomalies which have so far been difficult to reproduce by theoretical calculations. We use the transition intensities from the HITRAN database of 14 vibrational bands for our comparisons. The intensities of all fundamental bands show strong sensitivity to the ab initio method used for constructing the DMS while hot, overtone and combination bands up to 4000 cm-1 do not. The core-correlation and relativistic effects are found to be important for computed line intensities, for instance affecting the most intense fundamental band (ν2) by about 20%. Our recommended DMS, called ALYT2, is based on the CCSD(T)/aug-cc-pV(6+d)Z level of theory supplemented by a core-correlation/relativistic corrective surface obtained at the CCSD[T]/aug-cc-pCV5Z-DK level. The corresponding computed intensities agree significantly better (to within 10%) with experimental data taken directly from original papers. Worse agreement (differences of about 25%) is found for those HITRAN intensities obtained from fitted effective dipole models, suggesting the presence of underlying problems in those fits.
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations are starting to adopt cloud computing to better utilize computing resources by taking advantage of its scalability, cost reduction, and easy-to-access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting the geosciences. This paper provides a comprehensive study of three open-source cloud solutions: OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) there are no significant performance differences in the central processing unit (CPU), memory, and I/O of virtual machines created and managed by the different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) CloudStack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable of supporting concurrent intensive web applications, computing-intensive applications, and small-scale model simulations without intensive data communication.
Ultraviolet continuum absorption /less than about 1000 A/ above the quiet sun transition region
NASA Technical Reports Server (NTRS)
Doschek, G. A.; Feldman, U.
1982-01-01
Lyman continuum absorption shortward of 912 A in the quiet sun solar transition region is investigated by combining spectra obtained from the Apollo Telescope Mount experiments on Skylab. The most recent atomic data are used to compute line intensities for lines that fall on both sides of the Lyman limit. Lines of O III, O IV, O V, and S IV are considered. The computed intensity ratios of most lines from O IV, O V, and S IV agree with the experimental ratios to within a factor of 2. However, the discrepancies show no apparent wavelength dependence. From this fact, it is concluded that at least part of the discrepancy between theory and observation for lines of these ions can be accounted for by uncertainties in instrumental calibration and atomic data. However, difficulties remain in reconciling observation and theory, particularly for lines of O III, and one line of S IV. The other recent results of Schmahl and Orrall (1979) are also discussed in terms of newer atomic data.
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation
Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan
2014-01-01
Through reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework in CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose a series of optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder workload is offloaded to the GPU. Experimental results show that the parallel implementation outperforms the serial program with a speedup of 20 times and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU. However, the performance of the control-intensive parts (CAVLC) is strongly related to memory bandwidth, which provides insight for new architecture design. PMID:24757432
Computation of acoustic pressure fields produced in feline brain by high-intensity focused ultrasound
NASA Astrophysics Data System (ADS)
Omidi, Nazanin
In 1975, Dunn et al. (JASA 58:512-514) showed that a simple relation describes the ultrasonic threshold for cavitation-induced changes in the mammalian brain. The thresholds for tissue damage were estimated for a variety of acoustic parameters in exposed feline brain. The goal of this study was to improve the estimates for acoustic pressures and intensities present in vivo during those experimental exposures by estimating them using nonlinear rather than linear theory. In our current project, the acoustic pressure waveforms produced in the brains of anesthetized felines were numerically simulated for a spherically focused, nominally f1-transducer (focal length = 13 cm) at increasing values of the source pressure at frequencies of 1, 3, and 9 MHz. The corresponding focal intensities were correlated with the experimental data of Dunn et al. The focal pressure waveforms were also computed at the location of the true maximum. For low source pressures, the computed waveforms were the same as those determined using linear theory, and the focal intensities matched experimentally determined values. For higher source pressures, the focal pressure waveforms became increasingly distorted, with the compressional amplitude of the wave becoming greater, and the rarefactional amplitude becoming lower than the values calculated using linear theory. The implications of these results for clinical exposures are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Klimentov, A
2016-01-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the MIRA supercomputer at the Argonne Leadership Computing Facility (ALCF), the supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation was tested with a variety of Monte Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for the ATLAS experiment since September 2015. We will present our current accomplishments with running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
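The "light-weight MPI wrapper" approach mentioned above can be illustrated with a short sketch. This is not the PanDA pilot code; it is a minimal mpi4py example, under the assumption that a list of independent single-threaded job commands is available, in which each MPI rank picks its share of jobs so that one batch submission fills a multi-core worker node. The job commands shown are placeholders.

```python
# Minimal sketch: run independent single-threaded workloads under one MPI job.
# Launch with something like:  mpirun -n 16 python mpi_wrapper.py
import subprocess
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Placeholder workload list; in practice this would come from the pilot/job broker.
jobs = [f"echo 'simulating event batch {i}'" for i in range(64)]

# Static round-robin assignment: rank r runs jobs r, r+size, r+2*size, ...
for job in jobs[rank::size]:
    subprocess.run(job, shell=True, check=True)

comm.Barrier()                      # wait for all ranks before the batch job ends
if rank == 0:
    print(f"all {len(jobs)} workloads completed across {size} ranks")
```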
SAVLOC, computer program for automatic control and analysis of X-ray fluorescence experiments
NASA Technical Reports Server (NTRS)
Leonard, R. F.
1977-01-01
A program for a PDP-15 computer is presented which provides for control and analysis of trace element determinations by using X-ray fluorescence. The program simultaneously handles data accumulation for one sample and analysis of data from previous samples. Data accumulation consists of sample changing, timing, and data storage. Analysis requires the locating of peaks in X-ray spectra, determination of intensities of peaks, identification of origins of peaks, and determination of the areal density of the element responsible for each peak. The program may be run in either a manual (supervised) mode or an automatic (unsupervised) mode.
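As an illustration of the peak-location and intensity steps described above (not the original PDP-15 code), the following sketch uses NumPy and SciPy on a synthetic X-ray spectrum: it finds the peaks and estimates each peak's net intensity above a straight-line background. All spectrum parameters are invented.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic spectrum: two Gaussian peaks on a sloping background plus noise.
channels = np.arange(1024)
spectrum = (0.02 * channels + 50
            + 400 * np.exp(-0.5 * ((channels - 300) / 6.0) ** 2)
            + 250 * np.exp(-0.5 * ((channels - 640) / 8.0) ** 2)
            + np.random.default_rng(0).normal(0, 5, channels.size))

# Locate peaks that stand out from the local background.
peak_idx, props = find_peaks(spectrum, prominence=100, width=3)

for idx in peak_idx:
    lo, hi = idx - 20, idx + 20                    # integration window around the peak
    window = spectrum[lo:hi]
    background = np.linspace(window[0], window[-1], window.size)  # straight-line background
    net_intensity = float(np.sum(window - background))            # net counts in the peak
    print(f"peak at channel {idx}: net intensity ~ {net_intensity:.0f} counts")
```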
Modeling the Proton Radiation Belt With Van Allen Probes Relativistic Electron-Proton Telescope Data
NASA Technical Reports Server (NTRS)
Kanekal, S. G.; Li, X.; Baker, D. N.; Selesnick, R. S.; Hoxie, V. C.
2018-01-01
An empirical model of the proton radiation belt is constructed from data taken during 2013-2017 by the Relativistic Electron-Proton Telescopes on the Van Allen Probes satellites. The model intensity is a function of time, kinetic energy in the range 18-600 megaelectronvolts, equatorial pitch angle, and L shell of proton guiding centers. Data are selected, on the basis of energy deposits in each of the nine silicon detectors, to reduce background caused by hard proton energy spectra at low L. Instrument response functions are computed by Monte Carlo integration, using simulated proton paths through a simplified structural model, to account for energy loss in shielding material for protons outside the nominal field of view. Overlap of energy channels, their wide angular response, and changing satellite orientation require the model dependencies on all three independent variables be determined simultaneously. This is done by least squares minimization with a customized steepest descent algorithm. Model uncertainty accounts for statistical data error and systematic error in the simulated instrument response. A proton energy spectrum is also computed from data taken during the 8 January 2014 solar event, to illustrate methods for the simpler case of an isotropic and homogeneous model distribution. Radiation belt and solar proton results are compared to intensities computed with a simplified, on-axis response that can provide a good approximation under limited circumstances.
Hybrid cloud and cluster computing paradigms for life science applications
2010-01-01
Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Modeling the Proton Radiation Belt With Van Allen Probes Relativistic Electron-Proton Telescope Data
NASA Astrophysics Data System (ADS)
Selesnick, R. S.; Baker, D. N.; Kanekal, S. G.; Hoxie, V. C.; Li, X.
2018-01-01
An empirical model of the proton radiation belt is constructed from data taken during 2013-2017 by the Relativistic Electron-Proton Telescopes on the Van Allen Probes satellites. The model intensity is a function of time, kinetic energy in the range 18-600 MeV, equatorial pitch angle, and L shell of proton guiding centers. Data are selected, on the basis of energy deposits in each of the nine silicon detectors, to reduce background caused by hard proton energy spectra at low L. Instrument response functions are computed by Monte Carlo integration, using simulated proton paths through a simplified structural model, to account for energy loss in shielding material for protons outside the nominal field of view. Overlap of energy channels, their wide angular response, and changing satellite orientation require the model dependencies on all three independent variables be determined simultaneously. This is done by least squares minimization with a customized steepest descent algorithm. Model uncertainty accounts for statistical data error and systematic error in the simulated instrument response. A proton energy spectrum is also computed from data taken during the 8 January 2014 solar event, to illustrate methods for the simpler case of an isotropic and homogeneous model distribution. Radiation belt and solar proton results are compared to intensities computed with a simplified, on-axis response that can provide a good approximation under limited circumstances.
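The fitting strategy sketched above (least-squares minimization driven by a steepest-descent step) can be illustrated on a much simpler problem. This is not the authors' belt model; it is a toy power-law intensity spectrum j(E) = A * E^(-gamma), with A and gamma chosen arbitrarily, fitted by plain gradient descent on the sum of squared log-residuals.

```python
import numpy as np

# Toy stand-in for the fit: intensity j(E) = A * E**(-gamma), synthetic data.
rng = np.random.default_rng(1)
energy = np.linspace(20.0, 600.0, 40)                          # MeV
data = 5.0e4 * energy ** (-2.0) * rng.lognormal(0.0, 0.05, energy.size)

x = np.log(energy) - np.log(energy).mean()                     # centred log-energy
y = np.log(data)

def loss_grad(a, gamma):
    """Gradient of sum((a - gamma*x - y)**2) with respect to (a, gamma)."""
    r = a - gamma * x - y
    return 2.0 * r.sum(), -2.0 * (r * x).sum()

a, gamma = 0.0, 1.0                                             # crude starting point
lr = 0.01                                                       # steepest-descent step size
for _ in range(500):
    ga, gg = loss_grad(a, gamma)
    a, gamma = a - lr * ga, gamma - lr * gg

A = np.exp(a + gamma * np.log(energy).mean())                   # undo the centring
print(f"fitted A ~ {A:.3g}, gamma ~ {gamma:.3f}   (true values 5e4 and 2.0)")
```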
Hybrid cloud and cluster computing paradigms for life science applications.
Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey
2010-12-21
Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
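The iterative structure that motivates Twister can be shown with a tiny driver loop. The sketch below is not Twister's API; it is a plain-Python analogue in which each iteration runs a map step (assign points to the nearest centroid) and a reduce step (recompute centroids), with the updated centroids fed back into the next iteration, k-means style, using invented 2-D data.

```python
import numpy as np

def map_step(points, centroids):
    """Map: assign each point to the index of its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def reduce_step(points, assignment, old_centroids):
    """Reduce: average the points assigned to each centroid (keep the old one if empty)."""
    new = []
    for j, old in enumerate(old_centroids):
        members = points[assignment == j]
        new.append(members.mean(axis=0) if len(members) else old)
    return np.array(new)

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids = points[rng.choice(len(points), 2, replace=False)]

for iteration in range(10):               # the driver re-invokes map/reduce each pass
    assignment = map_step(points, centroids)
    new_centroids = reduce_step(points, assignment, centroids)
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print("final centroids:\n", centroids)
```

The point of the example is the loop itself: a basic MapReduce runtime restarts the whole job every iteration, whereas an iterative runtime keeps workers and static data resident between passes.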
Simulation Based Exploration of Critical Zone Dynamics in Intensively Managed Landscapes
NASA Astrophysics Data System (ADS)
Kumar, P.
2017-12-01
The advent of high-resolution measurements of topographic and (vertical) vegetation features using aerial LiDAR is enabling us to resolve micro-scale (~1 m) landscape structural characteristics over large areas. Availability of hyperspectral measurements is further augmenting these LiDAR data by enabling the biogeochemical characterization of vegetation and soils at unprecedented spatial resolutions (~1-10 m). Such data have opened up novel opportunities for modeling Critical Zone processes and exploring questions that were not possible before. We show how an integrated 3-D model at 1 m grid resolution can enable us to resolve micro-topographic and ecological dynamics and their control on hydrologic and biogeochemical processes over large areas. We address the computational challenge of such detailed modeling by exploiting hybrid CPU and GPU computing technologies. We show results of moisture, biogeochemical, and vegetation dynamics from studies in the Critical Zone Observatory for Intensively Managed Landscapes (IMLCZO) in the Midwestern United States.
A world-wide databridge supported by a commercial cloud provider
NASA Astrophysics Data System (ADS)
Tat Cheung, Kwong; Field, Laurence; Furano, Fabrizio
2017-10-01
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources. However, high energy physics applications require more data input and output than the CPU-intensive applications that are typically used by other volunteer computing projects. While the so-called databridge has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respectively, globally transferring data between potentially poor-performing residential networks and CERN could be unreliable, leading to wasted resource usage. The expectation is that by placing a storage endpoint that is part of a wider, flexible geographical databridge deployment closer to the volunteers, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented upon a commercial cloud provider.
NASA Astrophysics Data System (ADS)
Englander, J. G.; Austin, A. T.; Brandt, A. R.
2016-12-01
The need to quantify flaring by oil and gas fields is receiving more scrutiny, as there has been scientific and regulatory interest in quantifying the greenhouse gas (GHG) impact of oil and gas production. The National Oceanic and Atmospheric Administration (NOAA) has developed a method to track flaring activity using a Visible Infrared Imaging Radiometer Suite (VIIRS) satellite.[1] This reports data on the average size, power, and light intensity of each flare. However, outside of some small studies, the flaring intensity has generally been estimated at the country level.[2] While informative, country-level assessments cannot provide guidance about the sustainability of particular crude streams or products produced. In this work we generate detailed oil-field-level flaring intensities for a number of global oilfield operations. We do this by merging the VIIRS dataset with global oilfield atlases and other spatial data sources. Joining these datasets together with production data allows us to provide better estimates for the GHG intensity of flaring at the field level for these countries.[3] First, we compute flaring intensities at the field level for 75 global oil fields representing approximately 25% of global production. In addition, we examine in detail three oil producing countries known to have high rates of flaring: Egypt, Nigeria, and Venezuela. For these countries we compute the flaring rate for all fields in the country and explore within- and between-country variation. The countries' fields will be analyzed to determine the correlation of flare activity to a certain field type, crude type, region, or production method. [1] Cao, C. "Visible Infrared Imaging Radiometer Suite (VIIRS)." NOAA NPP VIIRS. NOAA, 2013. Web. 30 July 2016. [2] Elvidge, C. D. et al., "A Fifteen Year Record of Global Natural Gas Flaring Derived from Satellite Data," Energies, vol. 2, no. 3, pp. 595-622, Aug. 2009. [3] World Energy Atlas. 6th ed. London: Petroleum Economist, 2011. Print.
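A minimal sketch of the field-level joining step described above follows; the field names, coordinates, production figures, and flare volumes are all invented for illustration. Each detected flare is assigned to the nearest field, and the field's flaring intensity is then the flared volume per barrel produced.

```python
import math

# Hypothetical oil fields: name -> (lat, lon, annual production in barrels)
fields = {
    "Field A": (29.5, 31.2, 40_000_000),
    "Field B": (5.3, 6.8, 25_000_000),
}

# Hypothetical flare detections: (lat, lon, estimated flared gas volume in m^3/yr)
flares = [(29.6, 31.1, 1.2e7), (29.4, 31.3, 0.8e7), (5.2, 6.9, 3.5e7)]

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Assign each flare to the nearest field, then sum flared volume per field.
flared_by_field = {name: 0.0 for name in fields}
for lat, lon, volume in flares:
    nearest = min(fields, key=lambda n: distance_km(lat, lon, fields[n][0], fields[n][1]))
    flared_by_field[nearest] += volume

# Flaring intensity: flared gas volume per barrel of oil produced.
for name, (_, _, production) in fields.items():
    print(f"{name}: {flared_by_field[name] / production:.3f} m^3 flared per barrel")
```

A production analysis would use field polygons and a proper spatial join rather than nearest-point assignment; the sketch only shows the intensity bookkeeping.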
NASA Astrophysics Data System (ADS)
Stone, S.; Parker, M. S.; Howe, B.; Lazowska, E.
2015-12-01
Rapid advances in technology are transforming nearly every field from "data-poor" to "data-rich." The ability to extract knowledge from this abundance of data is the cornerstone of 21st century discovery. At the University of Washington eScience Institute, our mission is to engage researchers across disciplines in developing and applying advanced computational methods and tools to real world problems in data-intensive discovery. Our research team consists of individuals with diverse backgrounds in domain sciences such as astronomy, oceanography and geology, with complementary expertise in advanced statistical and computational techniques such as data management, visualization, and machine learning. Two key elements are necessary to foster careers in data science: individuals with cross-disciplinary training in both method and domain sciences, and career paths emphasizing alternative metrics for advancement. We see persistent and deep-rooted challenges for the career paths of people whose skills, activities and work patterns don't fit neatly into the traditional roles and success metrics of academia. To address these challenges the eScience Institute has developed training programs and established new career opportunities for data-intensive research in academia. Our graduate students and post-docs have mentors in both a methodology and an application field. They also participate in coursework and tutorials to advance technical skill and foster community. Professional Data Scientist positions were created to support research independence while encouraging the development and adoption of domain-specific tools and techniques. The eScience Institute also supports the appointment of faculty who are innovators in developing and applying data science methodologies to advance their field of discovery. Our ultimate goal is to create a supportive environment for data science in academia and to establish global recognition for data-intensive discovery across all fields.
CIM for 300-mm semiconductor fab
NASA Astrophysics Data System (ADS)
Luk, Arthur
1997-08-01
Five years ago, factory automation (F/A) was not prevalent in the fab. Today, facing a drastically changed market and intense competition, management requests that plant-floor data be forwarded to their desktop computers. This increased demand has rapidly pushed F/A toward computer integrated manufacturing (CIM). Through personalization, computers were successfully reduced in size so that they fit on our desktops; the PC ushered in a new era of computing. With the advent of the network, the network computer (NC) creates fresh problems for us. As we plan to invest more than $3 billion to build a new 300 mm fab, the next-generation technology raises a challenging bar.
Fatigue Crack Growth Rate and Stress-Intensity Factor Corrections for Out-of-Plane Crack Growth
NASA Technical Reports Server (NTRS)
Forth, Scott C.; Herman, Dave J.; James, Mark A.
2003-01-01
Fatigue crack growth rate testing is performed by automated data collection systems that assume straight crack growth in the plane of symmetry and use standard polynomial solutions to compute crack length and stress-intensity factors from compliance or potential drop measurements. Visual measurements used to correct the collected data typically include only the horizontal crack length, which for cracks that propagate out-of-plane, under-estimates the crack growth rates and over-estimates the stress-intensity factors. The authors have devised an approach for correcting both the crack growth rates and stress-intensity factors based on two-dimensional mixed mode-I/II finite element analysis (FEA). The approach is used to correct out-of-plane data for 7050-T7451 and 2025-T6 aluminum alloys. Results indicate the correction process works well for high ΔK levels but fails to capture the mixed-mode effects at ΔK levels approaching threshold (da/dN ≈ 10^-10 m/cycle).
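A heavily simplified geometric sketch of why horizontal-only measurements bias the data follows; this is not the paper's FEA-based correction. It assumes the crack segment deflects at a constant angle theta, so the true extension is the horizontal increment divided by cos(theta), and it uses one common textbook mixed-mode combination, DeltaK_eq = sqrt(DeltaK_I^2 + DeltaK_II^2), purely as an illustration.

```python
import math

def corrected_extension(horizontal_delta_a, theta_deg):
    """True crack extension along a straight deflected segment at angle theta."""
    return horizontal_delta_a / math.cos(math.radians(theta_deg))

def equivalent_delta_k(delta_k1, delta_k2):
    """One simple mixed-mode combination (illustrative only)."""
    return math.sqrt(delta_k1 ** 2 + delta_k2 ** 2)

# Example: a 0.10 mm horizontal increment measured on a crack deflected by 25 degrees.
da_true = corrected_extension(0.10, 25.0)
print(f"true extension ~ {da_true:.3f} mm (vs. 0.100 mm horizontal)")
print(f"equivalent deltaK for deltaK_I=5.0, deltaK_II=1.5 MPa*sqrt(m): "
      f"{equivalent_delta_k(5.0, 1.5):.2f}")
```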
Big Data Ecosystems Enable Scientific Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critchlow, Terence J.; Kleese van Dam, Kerstin
Over the past 5 years, advances in experimental, sensor and computational technologies have driven the exponential growth in the volumes, acquisition rates, variety and complexity of scientific data. As noted by Hey et al. in their 2009 e-book The Fourth Paradigm, this availability of large quantities of scientifically meaningful data has given rise to a new scientific methodology - data intensive science. Data intensive science is the ability to formulate and evaluate hypotheses using data and analysis to extend, complement and, at times, replace experimentation, theory, or simulation. This new approach to science no longer requires scientists to interact directly with the objects of their research; instead they can utilize digitally captured, reduced, calibrated, analyzed, synthesized and visualized results - allowing them to carry out 'experiments' in data.
Thermal-neutron capture gamma-rays. Volume 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuli, J.K.
1997-05-01
The energy and photon intensity of gamma rays as seen in thermal-neutron capture are presented ordered by Z, A of target nuclei. All gamma rays with intensity ≥ 2% of the strongest transition are included. The strongest transition is indicated in each case. Where the target nuclide mass number is indicated as "nat", a natural target was used. The gamma energies given are in keV. The gamma intensities given are relative to 100 for the strongest transition. All data for A > 44 are taken from the Evaluated Nuclear Structure Data File (4/97), a computer file of evaluated nuclear structure data maintained by the National Nuclear Data Center, Brookhaven National Laboratory, on behalf of the Nuclear Structure and Decay Data network, coordinated by the International Atomic Energy Agency, Vienna. These data are published in Nuclear Data Sheets, Academic Press, San Diego, CA. The data for A ≤ 44 are taken from "Prompt Gamma Rays from Thermal-Neutron Capture," M.A. Lone, R.A. Leavitt, D.A. Harrison, Atomic Data and Nuclear Data Tables 26, 511 (1981).
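The intensity convention described above (relative intensities normalized to 100 for the strongest transition, keeping only lines at or above 2% of it) is easy to reproduce. The sketch below applies it to a made-up list of gamma lines; the energies and counts are invented.

```python
# Hypothetical capture gamma lines: (energy in keV, raw counts)
lines = [(511.0, 1200.0), (846.8, 45000.0), (1810.7, 600.0), (2223.2, 9000.0)]

strongest = max(counts for _, counts in lines)

# Relative intensity: 100 for the strongest transition; keep lines >= 2% of it.
table = [(energy, 100.0 * counts / strongest)
         for energy, counts in lines
         if counts / strongest >= 0.02]

for energy, rel in sorted(table):
    print(f"{energy:8.1f} keV   I_rel = {rel:6.2f}")
```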
Not Scotch, but Rum: The Scope and Diffusion of the Scottish Presence in the Published Record
ERIC Educational Resources Information Center
Lavoie, Brian
2013-01-01
Big data sets and powerful computing capacity have transformed scholarly inquiry across many disciplines. While the impact of data-intensive research methodologies is perhaps most distinct in the natural and social sciences, the humanities have also benefited from these new analytical tools. While full-text data is necessary to study topics such…
Free-field propagation of high intensity noise
NASA Technical Reports Server (NTRS)
Welz, Joseph P.; Mcdaniel, Oliver H.
1990-01-01
Observed spectral data from supersonic jet aircraft are known to contain much more high frequency energy than can be explained by linear acoustic propagation theory. It is believed that the high frequency energy is an effect of nonlinear distortion due to the extremely high acoustic levels generated by the jet engines. The objective, to measure acoustic waveform distortion for spherically diverging high intensity noise, was reached by using an electropneumatic acoustic source capable of generating sound pressure levels in the range of 140 to 160 decibels (re 20 micro Pa). The noise spectrum was shaped to represent the spectra generated by jet engines. Two microphones were used to capture the acoustic pressure waveform at different points along the propagation path in order to provide a direct measure of the waveform distortion as well as spectral distortion. A secondary objective was to determine that the observed distortion is an acoustic effect. To do this an existing computer prediction code that deals with nonlinear acoustic propagation was used on data representative of the measured data. The results clearly demonstrate that high intensity jet noise does shift the energy in the spectrum to the higher frequencies along the propagation path. In addition, the data from the computer model are in good agreement with the measurements, thus demonstrating that the waveform distortion can be accounted for with nonlinear acoustic theory.
WPS mediation: An approach to process geospatial data on different computing backends
NASA Astrophysics Data System (ADS)
Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas
2012-10-01
The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities and various problems emerge when trying to use them in data- and computing-intensive domains such as environmental sciences. These problems are usually not or only partially solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) related methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of the WPS specification through two optional parameters. The applicability of this approach will be demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits and issues that need to be further investigated to improve performance.
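The NDVI process used as the demonstration case is itself a one-line per-pixel computation, NDVI = (NIR - Red) / (NIR + Red). The sketch below computes it with NumPy, which is the kind of payload a WPS Execute request would hand to whichever backend the mediation layer selects; the array values are illustrative only.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index, computed per pixel."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)     # eps guards against division by zero

# Tiny illustrative reflectance rasters (rows x cols).
nir_band = np.array([[0.50, 0.62], [0.40, 0.55]])
red_band = np.array([[0.10, 0.08], [0.20, 0.12]])

print(ndvi(nir_band, red_band))
```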
Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem
NASA Astrophysics Data System (ADS)
Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.
2015-12-01
Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of the problems involved or the lessons learned in using individual Grid resources, as these have already been published in our paper on the genome analysis research environment (GNARE), and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
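The streaming idea above (process reads while the transfer is still in flight, rather than after it completes) can be sketched with a producer/consumer queue. This is not the elastream package; it is a generic Python illustration in which one thread stands in for the client-to-cloud transfer and a worker thread processes each chunk as soon as it arrives. The read strings and the per-chunk statistic are invented.

```python
import queue
import threading
import time

chunk_queue = queue.Queue(maxsize=4)      # bounded buffer between transfer and analysis
SENTINEL = None

def transfer(chunks):
    """Stands in for the client-to-cloud transfer: chunks arrive one at a time."""
    for chunk in chunks:
        time.sleep(0.1)                   # simulated network latency per chunk
        chunk_queue.put(chunk)
    chunk_queue.put(SENTINEL)             # signal end of stream

def analyze():
    """Processes each chunk as soon as it is available, overlapping with the transfer."""
    while True:
        chunk = chunk_queue.get()
        if chunk is SENTINEL:
            break
        gc = sum(chunk.count(b) for b in "GC") / len(chunk)   # toy per-chunk statistic
        print(f"processed chunk of {len(chunk)} bases, GC fraction {gc:.2f}")

reads = ["ACGTGGCCA" * 100, "TTTTACGCGC" * 100, "GGGCCCATAT" * 100]
worker = threading.Thread(target=analyze)
worker.start()
transfer(reads)
worker.join()
```

The scheme works because each chunk is analyzed independently of the others, which is exactly the class of NGS tasks the paper targets.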
Technologies for Large Data Management in Scientific Computing
NASA Astrophysics Data System (ADS)
Pace, Alberto
2014-01-01
In recent years, intense usage of computing has been the main strategy of investigations in several scientific research projects. The progress in computing technology has opened unprecedented opportunities for systematic collection of experimental data and the associated analysis that were considered impossible only a few years ago. This paper focuses on the strategies in use: it reviews the various components that are necessary for an effective solution that ensures the storage, the long-term preservation, and the worldwide distribution of large quantities of data that are necessary in a large scientific research project. The paper also mentions several examples of data management solutions used in High Energy Physics for the CERN Large Hadron Collider (LHC) experiments in Geneva, Switzerland, which generate more than 30,000 terabytes of data every year that need to be preserved, analyzed, and made available to a community of several tens of thousands of scientists worldwide.
Real-time data-intensive computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parkinson, Dilworth Y., E-mail: dyparkinson@lbl.gov; Chen, Xian; Hexemer, Alexander
2016-07-27
Today users visit synchrotrons as sources of understanding and discovery—not as sources of just light, and not as sources of data. To achieve this, the synchrotron facilities frequently provide not just light but often the entire end station and increasingly, advanced computational facilities that can reduce terabytes of data into a form that can reveal a new key insight. The Advanced Light Source (ALS) has partnered with high performance computing, fast networking, and applied mathematics groups to create a "super-facility", giving users simultaneous access to the experimental, computational, and algorithmic resources to make this possible. This combination forms an efficient closed loop, where data—despite its high rate and volume—is transferred and processed immediately and automatically on appropriate computing resources, and results are extracted, visualized, and presented to users or to the experimental control system, both to provide immediate insight and to guide decisions about subsequent experiments during beamtime. We will describe our work at the ALS ptychography, scattering, micro-diffraction, and micro-tomography beamlines.
NASA Astrophysics Data System (ADS)
Evseev, D. G.; Savrukhin, A. V.; Neklyudov, A. N.
2018-01-01
Computer simulation of the kinetics of thermal processes and structural and phase transformations in the wall of a bogie side frame produced from steel 20GL is performed with allowance for the differences in the cooling intensity under volume-surface hardening. The simulation is based on the developed method employing the diagram of decomposition of austenite at different cooling rates. The data obtained are used to draw conclusions on the effect of the cooling intensity on the propagation of the martensite structure over the wall section.
High-intensity positron microprobe at Jefferson Lab
Golge, Serkan; Vlahovic, Branislav; Wojtsekhowski, Bogdan B.
2014-06-19
We present a conceptual design for a novel continuous-wave electron-linac based high-intensity slow-positron production source with a projected intensity on the order of 10^10 e+/s. Reaching this intensity in our design relies on the transport of positrons (T+ below 600 keV) from the electron-positron pair production converter target to a low-radiation and low-temperature area for moderation in a high-efficiency cryogenic rare gas moderator, solid Ne. The performance of the integrated beamline has been verified through computational studies. The computational results include Monte Carlo calculations of the optimized electron/positron beam energies, converter target thickness, synchronized raster system, transport of the beam from the converter target to the moderator, extraction of the beam from the channel, and moderation efficiency calculations. For the extraction of positrons from the magnetic channel a magnetic field terminator plug prototype has been built and experimental data on the effectiveness of this prototype are presented. The dissipation of the heat away from the converter target and radiation protection measures are also discussed.
Technical Note: scuda: A software platform for cumulative dose assessment.
Park, Seyoun; McNutt, Todd; Plishker, William; Quon, Harry; Wong, John; Shekhar, Raj; Lee, Junghoon
2016-10-01
Accurate tracking of anatomical changes and computation of actually delivered dose to the patient are critical for successful adaptive radiation therapy (ART). Additionally, efficient data management and fast processing are practically important for the adoption in clinic as ART involves a large amount of image and treatment data. The purpose of this study was to develop an accurate and efficient Software platform for CUmulative Dose Assessment (scuda) that can be seamlessly integrated into the clinical workflow. scuda consists of deformable image registration (DIR), segmentation, dose computation modules, and a graphical user interface. It is connected to our image PACS and radiotherapy informatics databases from which it automatically queries/retrieves patient images, radiotherapy plan, beam data, and daily treatment information, thus providing an efficient and unified workflow. For accurate registration of the planning CT and daily CBCTs, the authors iteratively correct CBCT intensities by matching local intensity histograms during the DIR process. Contours of the target tumor and critical structures are then propagated from the planning CT to daily CBCTs using the computed deformations. The actual delivered daily dose is computed using the registered CT and patient setup information by a superposition/convolution algorithm, and accumulated using the computed deformation fields. Both DIR and dose computation modules are accelerated by a graphics processing unit. The cumulative dose computation process has been validated on 30 head and neck (HN) cancer cases, showing 3.5 ± 5.0 Gy (mean±STD) absolute mean dose differences between the planned and the actually delivered doses in the parotid glands. On average, DIR, dose computation, and segmentation take 20 s/fraction and 17 min for a 35-fraction treatment including additional computation for dose accumulation. The authors developed a unified software platform that provides accurate and efficient monitoring of anatomical changes and computation of actually delivered dose to the patient, thus realizing an efficient cumulative dose computation workflow. Evaluation on HN cases demonstrated the utility of our platform for monitoring the treatment quality and detecting significant dosimetric variations that are keys to successful ART.
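The accumulation step described above (warp each daily dose back to the planning geometry with the computed deformation field, then sum over fractions) can be sketched in a few lines. This is not the scuda code; it uses SciPy's map_coordinates as a stand-in for the GPU-accelerated warp, on a toy 2D dose grid with a synthetic, spatially constant deformation per fraction.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_dose(daily_dose, deformation):
    """Pull the daily dose back to planning space using a dense deformation field.

    deformation has shape (2, H, W): per-voxel displacement (dy, dx) mapping each
    planning-space voxel to its location in the daily image.
    """
    h, w = daily_dose.shape
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([gy + deformation[0], gx + deformation[1]])
    return map_coordinates(daily_dose, coords, order=1, mode="nearest")

# Toy example: a 64x64 Gaussian "dose" delivered over 3 fractions with small shifts.
h = w = 64
yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
cumulative = np.zeros((h, w))
for shift in [(0.0, 0.0), (1.5, -1.0), (-2.0, 0.5)]:           # per-fraction motion
    daily = 2.0 * np.exp(-(((yy - 32 - shift[0]) ** 2 + (xx - 32 - shift[1]) ** 2) / 50.0))
    deformation = np.zeros((2, h, w))                          # constant field per fraction
    deformation[0] += shift[0]
    deformation[1] += shift[1]
    cumulative += warp_dose(daily, deformation)

print("accumulated dose at planning isocentre:", round(float(cumulative[32, 32]), 2))
```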
Technical Note: SCUDA: A software platform for cumulative dose assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Seyoun; McNutt, Todd; Quon, Harry
Purpose: Accurate tracking of anatomical changes and computation of actually delivered dose to the patient are critical for successful adaptive radiation therapy (ART). Additionally, efficient data management and fast processing are practically important for the adoption in clinic as ART involves a large amount of image and treatment data. The purpose of this study was to develop an accurate and efficient Software platform for CUmulative Dose Assessment (SCUDA) that can be seamlessly integrated into the clinical workflow. Methods: SCUDA consists of deformable image registration (DIR), segmentation, dose computation modules, and a graphical user interface. It is connected to our image PACS and radiotherapy informatics databases from which it automatically queries/retrieves patient images, radiotherapy plan, beam data, and daily treatment information, thus providing an efficient and unified workflow. For accurate registration of the planning CT and daily CBCTs, the authors iteratively correct CBCT intensities by matching local intensity histograms during the DIR process. Contours of the target tumor and critical structures are then propagated from the planning CT to daily CBCTs using the computed deformations. The actual delivered daily dose is computed using the registered CT and patient setup information by a superposition/convolution algorithm, and accumulated using the computed deformation fields. Both DIR and dose computation modules are accelerated by a graphics processing unit. Results: The cumulative dose computation process has been validated on 30 head and neck (HN) cancer cases, showing 3.5 ± 5.0 Gy (mean±STD) absolute mean dose differences between the planned and the actually delivered doses in the parotid glands. On average, DIR, dose computation, and segmentation take 20 s/fraction and 17 min for a 35-fraction treatment including additional computation for dose accumulation. Conclusions: The authors developed a unified software platform that provides accurate and efficient monitoring of anatomical changes and computation of actually delivered dose to the patient, thus realizing an efficient cumulative dose computation workflow. Evaluation on HN cases demonstrated the utility of our platform for monitoring the treatment quality and detecting significant dosimetric variations that are keys to successful ART.
NASA Astrophysics Data System (ADS)
Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.
2016-10-01
The LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than the Grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility. The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCF's multi-core worker nodes. This implementation was tested with a variety of Monte Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
Randomization Procedures Applied to Analysis of Ballistic Data
1991-06-01
Taylor, Malcolm S.; Bodt, Barry A. Technical Report BRL-TR-3245. Keywords: data analysis; computationally intensive statistics; randomization tests; permutation tests; nonparametric statistics. From the report: "Any reasonable statistical procedure would fail to support the notion of improvement of dynamic over standard indexing based on this data."
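Since the report's subject is randomization (permutation) tests, a minimal example may help. The sketch below uses invented numbers: it compares two small samples by repeatedly shuffling the pooled observations and asking how often a random relabeling produces a mean difference at least as large as the observed one.

```python
import random

random.seed(0)

# Invented example data: e.g., impact errors under two indexing methods.
group_a = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]
group_b = [3.6, 3.9, 3.5, 3.7, 4.0, 3.4]

observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
pooled = group_a + group_b

n_iter, extreme = 10000, 0
for _ in range(n_iter):
    random.shuffle(pooled)                       # random relabeling of the pooled data
    perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
    diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
    if diff >= observed:
        extreme += 1

print(f"observed |mean difference| = {observed:.3f}")
print(f"two-sided permutation p-value ~ {extreme / n_iter:.4f}")
```

The appeal of the method, as the report's keywords suggest, is that it is computationally intensive but makes no distributional assumptions.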
Dakua, Sarada Prasad; Abinahed, Julien; Al-Ansari, Abdulla
2015-01-01
Liver segmentation continues to remain a major challenge, largely due to its intense complexity with surrounding anatomical structures (stomach, kidney, and heart), high noise level and lack of contrast in pathological computed tomography (CT) data. We present an approach to reconstructing the liver surface in low-contrast CT. The main contributions are: (1) a stochastic resonance-based methodology in the discrete cosine transform domain is developed to enhance the contrast of pathological liver images, (2) a new formulation is proposed to prevent the object boundary, resulting from the cellular automata method, from leaking into the surrounding areas of similar intensity, and (3) a level-set method is suggested to generate intermediate segmentation contours from two segmented slices distantly located in a subject sequence. We have tested the algorithm on real datasets obtained from two sources, Hamad General Hospital and the Medical Image Computing and Computer-Assisted Interventions grand challenge workshop. Various parameters in the algorithm, such as w, Δt, z, α, μ, α1, and α2, play imperative roles; thus their values are precisely selected. Both qualitative and quantitative evaluation performed on liver data show promising segmentation accuracy when compared with ground truth data, reflecting the potential of the proposed method. PMID:26158101
Hadoop for High-Performance Climate Analytics: Use Cases and Lessons Learned
NASA Technical Reports Server (NTRS)
Tamkin, Glenn
2013-01-01
Scientific data services are a critical aspect of the NASA Center for Climate Simulation (NCCS) mission. Hadoop, via MapReduce, provides an approach to high-performance analytics that is proving to be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. The NCCS is particularly interested in the potential of Hadoop to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we prototyped a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. The initial focus was on averaging operations over arbitrary spatial and temporal extents within Modern-Era Retrospective Analysis for Research and Applications (MERRA) data. After preliminary results suggested that this approach improves efficiencies within data intensive analytic workflows, we invested in building a cyberinfrastructure resource for developing a new generation of climate data analysis capabilities using Hadoop. This resource is focused on reducing the time spent in the preparation of reanalysis data used in data-model inter-comparison, a long sought goal of the climate community. This paper summarizes the related use cases and lessons learned.
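The canonical averaging operation mentioned above maps naturally onto MapReduce. The sketch below is not the NCCS/MERRA code; it is a plain-Python mock of the pattern, with invented records, in which the map step emits (region, (sum, count)) pairs and the reduce step combines them into a mean per region.

```python
from collections import defaultdict

# Mock records: (region key, temperature value) standing in for gridded reanalysis fields.
records = [("tropics", 299.1), ("tropics", 300.4), ("midlat", 287.2),
           ("midlat", 285.9), ("polar", 255.3), ("tropics", 298.7)]

def map_phase(record):
    """Emit a partial sum and count keyed by spatial region."""
    region, value = record
    return region, (value, 1)

def reduce_phase(pairs):
    """Combine partial (sum, count) pairs and finish with the mean per key."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in pairs:
        totals[key][0] += s
        totals[key][1] += c
    return {key: s / c for key, (s, c) in totals.items()}

averages = reduce_phase(map(map_phase, records))
print(averages)
```

On a real cluster the framework shuffles the keyed pairs between many mappers and reducers; the pattern of partial sums and counts is what makes the average composable.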
NASA Astrophysics Data System (ADS)
Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley
2015-04-01
The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.
Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong
2013-12-01
The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale with data and computation. We built BioPig on Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb of sequences demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel programming framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
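BioPig itself is built on Pig and Hadoop; as a hedged illustration of the kind of embarrassingly parallel kernel it targets, the Python sketch below counts k-mers with a map step per read and a merge (reduce) step. The reads and k value are made up.

```python
# Toy k-mer counting expressed as map (per read) and reduce (merge histograms).
from collections import Counter
from functools import reduce

def count_kmers(read, k=4):
    """Map step: count k-mers in a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def merge(a, b):
    """Reduce step: merge two partial k-mer histograms."""
    a.update(b)
    return a

reads = ["ACGTACGTGG", "TTACGTACGA"]   # hypothetical input reads
totals = reduce(merge, (count_kmers(r) for r in reads), Counter())
print(totals.most_common(3))
```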
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information. PMID:22859646
Computational provenance in hydrologic science: a snow mapping example.
Dozier, Jeff; Frew, James
2009-03-13
Computational provenance--a record of the antecedents and processing history of digital information--is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce the daily fractional snow-covered area from NASA's moderate-resolution imaging spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estimate the fraction of each 500 m pixel that snow covers. The daily products have data gaps and errors because of cloud cover and sensor viewing geometry, so we interpolate and smooth to produce our best estimate of the daily snow cover. To manage the data, we have developed the Earth System Science Server (ES3), a software environment for data-intensive Earth science, with unique capabilities for automatically and transparently capturing and managing the provenance of arbitrary computations. Transparent acquisition avoids the scientists having to express their computations in specific languages or schemas in order for provenance to be acquired and maintained. ES3 models provenance as relationships between processes and their input and output files. It is particularly suited to capturing the provenance of an evolving algorithm whose components span multiple languages and execution environments.
Building Research Cyberinfrastructure at Small/Medium Research Institutions
ERIC Educational Resources Information Center
Agee, Anne; Rowe, Theresa; Woo, Melissa; Woods, David
2010-01-01
A 2006 ECAR study defined cyberinfrastructure as the coordinated aggregate of "hardware, software, communications, services, facilities, and personnel that enable researchers to conduct advanced computational, collaborative, and data-intensive research." While cyberinfrastructure was initially seen as support for scientific and…
Computational chemistry and aeroassisted orbital transfer vehicles
NASA Technical Reports Server (NTRS)
Cooper, D. M.; Jaffe, R. L.; Arnold, J. O.
1985-01-01
An analysis of the radiative heating phenomena encountered during a typical aeroassisted orbital transfer vehicle (AOTV) trajectory was made to determine the potential impact of computational chemistry on AOTV design technology. Both equilibrium and nonequilibrium radiation mechanisms were considered. This analysis showed that computational chemistry can be used to predict (1) radiative intensity factors and spectroscopic data; (2) the excitation rates of both atoms and molecules; (3) high-temperature reaction rate constants for metathesis and charge exchange reactions; (4) particle ionization and neutralization rates and cross sections; and (5) spectral line widths.
LANDSAT-1 data, its use in a soil survey program
NASA Technical Reports Server (NTRS)
Westin, F. C.; Frazee, C. J.
1975-01-01
The following applications of LANDSAT imagery were investigated: assistance in recognizing soil survey boundaries, low intensity soil surveys, and preparation of a base map for publishing thematic soils maps. The following characteristics of LANDSAT imagery were tested as they apply to the recognition of soil boundaries in South Dakota and western Minnesota: synoptic views due to the large areas covered, near-orthography and lack of distortion, flexibility of selecting the proper season, data recording in four parts of the spectrum, and the use of computer compatible tapes. A low intensity soil survey of Pennington County, South Dakota was completed in 1974. Low intensity inexpensive soil surveys can provide the data needed to evaluate agricultural land for the remaining counties until detailed soil surveys are completed. In using LANDSAT imagery as a base map for publishing thematic soil maps, the first step was to prepare a mosaic with 20 LANDSAT scenes from several late spring passes in 1973.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update
Afgan, Enis; Baker, Dannon; van den Beek, Marius; Blankenberg, Daniel; Bouvier, Dave; Čech, Martin; Chilton, John; Clements, Dave; Coraor, Nate; Eberhard, Carl; Grüning, Björn; Guerler, Aysam; Hillman-Jackson, Jennifer; Von Kuster, Greg; Rasche, Eric; Soranzo, Nicola; Turaga, Nitesh; Taylor, James; Nekrutenko, Anton; Goecks, Jeremy
2016-01-01
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in the life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale. PMID:27137889
Open source data logger for low-cost environmental monitoring
2014-01-01
Abstract The increasing transformation of biodiversity into a data-intensive science has seen numerous independent systems linked and aggregated into the current landscape of biodiversity informatics. This paper outlines how we can move forward with this programme, incorporating real time environmental monitoring into our methodology using low-power and low-cost computing platforms. PMID:24855446
The evolution of voids in the adhesion approximation
NASA Astrophysics Data System (ADS)
Sahni, Varun; Sathyaprakah, B. S.; Shandarin, Sergei F.
1994-08-01
We apply the adhesion approximation to study the formation and evolution of voids in the universe. Our simulations, carried out using 128^3 particles in a cubical box with side 128 Mpc, indicate that the void spectrum evolves with time and that the mean void size in the standard Cosmic Background Explorer Satellite (COBE)-normalized cold dark matter (CDM) model with H50 = 1 scales approximately as D̄(z) = D̄_0/(1+z)^(1/2), where D̄_0 ≈ 10.5 Mpc. Interestingly, we find a strong correlation between the sizes of voids and the value of the primordial gravitational potential at void centers. This observation could, in principle, pave the way toward reconstructing the form of the primordial potential from a knowledge of the observed void spectrum. Studying the void spectrum at different cosmological epochs, for spectra with a built-in k-space cutoff, we find that the number of voids in a representative volume evolves with time. The mean number of voids first increases until a maximum value is reached (indicating that the formation of cellular structure is complete), and then begins to decrease as clumps and filaments merge, leading to hierarchical clustering and the subsequent elimination of small voids. The cosmological epoch characterizing the completion of cellular structure occurs when the length scale going nonlinear approaches the mean distance between peaks of the gravitational potential. A central result of this paper is that voids can be populated by substructure such as mini-sheets and filaments, which run through voids. The number of such mini-pancakes that pass through a given void can be measured by the genus characteristic of an individual void, which is an indicator of the topology of a given void in initial (Lagrangian) space. Large voids have on average a larger measure than smaller voids, indicating more substructure within larger voids relative to smaller ones. We find that the topology of individual voids is strongly epoch dependent, with void topologies generally simplifying with time. This means that as voids grow older they become progressively more empty and have less structure within them. We evaluate the genus measure both for individual voids and for the entire ensemble of voids predicted by the CDM model. As a result we find that the topology of voids, when taken together with the void spectrum, is a very useful statistical indicator of the evolution of the structure of the universe on large scales.
Room temperature linelists for CO2 asymmetric isotopologues with ab initio computed intensities
NASA Astrophysics Data System (ADS)
Zak, Emil J.; Tennyson, Jonathan; Polyansky, Oleg L.; Lodi, Lorenzo; Zobov, Nikolay F.; Tashkun, Sergei A.; Perevalov, Valery I.
2017-12-01
The present paper reports room temperature line lists for six asymmetric isotopologues of carbon dioxide: 16O12C18O (628), 16O12C17O (627), 16O13C18O (638), 16O13C17O (637), 17O12C18O (728) and 17O13C18O (738), covering the range 0-8000 cm-1. Variational rotation-vibration wavefunctions and energy levels are computed using the DVR3D software suite and a high quality semi-empirical potential energy surface (PES), followed by computation of intensities using an ab initio dipole moment surface (DMS). A theoretical procedure for quantifying the sensitivity of line intensities to minor distortions of the PES/DMS allows our theoretical model to be critically evaluated. Several recent high quality measurements and theoretical approaches are discussed to provide a benchmark of our results against the most accurate available data. Indeed, the thesis of transferability of accuracy among different isotopologues through the use of a mass-independent PES is supported by several examples. We therefore conclude that the majority of line intensities for strong bands are predicted with sub-percent accuracy. Accurate line positions are generated using an effective Hamiltonian constructed from the latest experiments. This study completes the list of relevant isotopologues of carbon dioxide; these line lists are available for remote sensing studies and inclusion in databases.
Advanced processing for high-bandwidth sensor systems
NASA Astrophysics Data System (ADS)
Szymanski, John J.; Blain, Phil C.; Bloch, Jeffrey J.; Brislawn, Christopher M.; Brumby, Steven P.; Cafferty, Maureen M.; Dunham, Mark E.; Frigo, Janette R.; Gokhale, Maya; Harvey, Neal R.; Kenyon, Garrett; Kim, Won-Ha; Layne, J.; Lavenier, Dominique D.; McCabe, Kevin P.; Mitchell, Melanie; Moore, Kurt R.; Perkins, Simon J.; Porter, Reid B.; Robinson, S.; Salazar, Alfonso; Theiler, James P.; Young, Aaron C.
2000-11-01
Compute performance and algorithm design are key problems of image processing and scientific computing in general. For example, imaging spectrometers are capable of producing data in hundreds of spectral bands with millions of pixels. These data sets show great promise for remote sensing applications, but require new and computationally intensive processing. The goal of the Deployable Adaptive Processing Systems (DAPS) project at Los Alamos National Laboratory is to develop advanced processing hardware and algorithms for high-bandwidth sensor applications. The project has produced electronics for processing multi- and hyper-spectral sensor data, as well as LIDAR data, while employing processing elements using a variety of technologies. The project team is currently working on reconfigurable computing technology and advanced feature extraction techniques, with an emphasis on their application to image and RF signal processing. This paper presents reconfigurable computing technology and advanced feature extraction algorithm work and their application to multi- and hyperspectral image processing. Related projects on genetic algorithms as applied to image processing will be introduced, as will the collaboration between the DAPS project and the DARPA Adaptive Computing Systems program. Further details are presented in other talks during this conference and in other conferences taking place during this symposium.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
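The following hedged Python sketch illustrates the general idea behind grid-based spatial partitioning with a global partition index (it is not Hadoop-GIS code): points are binned into cells, each cell becomes an independent task, and a window query first prunes cells via the index. The cell size, points, and query window are made up.

```python
# Toy grid partitioning plus a global index used to prune a window query.
import math

CELL = 1.0  # partition cell size in degrees (hypothetical)

def partition_id(x, y):
    """Global partition index: map a point to its grid cell."""
    return (math.floor(x / CELL), math.floor(y / CELL))

def partition(points):
    """Assign each point to a partition; each partition becomes one map task."""
    parts = {}
    for p in points:
        parts.setdefault(partition_id(*p), []).append(p)
    return parts

def candidate_partitions(xmin, ymin, xmax, ymax):
    """Use the global index to prune partitions for a window query."""
    x0, y0 = partition_id(xmin, ymin)
    x1, y1 = partition_id(xmax, ymax)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}

points = [(0.2, 0.3), (1.7, 0.9), (2.4, 2.1)]
parts = partition(points)
cells = candidate_partitions(0.0, 0.0, 1.9, 1.0)
hits = [p for cell in cells if cell in parts
          for p in parts[cell] if 0.0 <= p[0] <= 1.9 and 0.0 <= p[1] <= 1.0]
print(sorted(hits))  # [(0.2, 0.3), (1.7, 0.9)]
```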
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes, neural connectivity maps of the brain, using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems, reads to parallel disk arrays and writes to solid-state storage, to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
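As a hedged illustration of distributing data by partitioning a spatial index (the actual scheme in the paper may differ), the sketch below maps a voxel's block coordinates to a Morton (Z-order) code and routes the block to one of the cluster nodes; the block size and node count are hypothetical.

```python
# Route a voxel to a node by Z-ordering its block coordinates.
BLOCK = 512      # cuboid edge length in voxels (hypothetical)
NODES = 8        # number of database cluster nodes (hypothetical)

def interleave(x, y, z, bits=10):
    """Morton code: interleave the bits of the 3-d block coordinates."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def node_for_voxel(vx, vy, vz):
    """Route a voxel to the node that stores its block."""
    bx, by, bz = vx // BLOCK, vy // BLOCK, vz // BLOCK
    return interleave(bx, by, bz) % NODES

print(node_for_voxel(1200, 300, 7000))
```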
DDDAMS-based Urban Surveillance and Crowd Control via UAVs and UGVs
2015-12-04
for crowd dynamics modeling by incorporating multi-resolution data, where a grid-based method is used to model crowd motion with UAVs' low-resolution ... information and more computationally intensive (and time-consuming). Given that the deployment of fidelity selection results in simulation faces computational ... [Table 1: Parameters for UAV and UGV for their detection.]
ERIC Educational Resources Information Center
Buche, Mari W.; Davis, Larry R.; Vician, Chelley
2007-01-01
Computers are pervasive in business and education, and it would be easy to assume that all individuals embrace technology. However, evidence shows that roughly 30 to 40 percent of individuals experience some level of computer anxiety. Many academic programs involve computing-intensive courses, but the actual effects of this exposure on computer…
Myers, Matthew R; Giridhar, Dushyanth
2011-06-01
In the characterization of high-intensity focused ultrasound (HIFU) systems, it is desirable to know the intensity field within a tissue phantom. Infrared (IR) thermography is a potentially useful method for inferring this intensity field from the heating pattern within the phantom. However, IR measurements require an air layer between the phantom and the camera, making inferences about the thermal field in the absence of the air complicated. For example, convection currents can arise in the air layer and distort the measurements relative to the phantom-only situation. Quantitative predictions of intensity fields based upon IR temperature data are also complicated by axial and radial diffusion of heat. In this paper, mathematical expressions are derived for use with IR temperature data acquired at times long enough that noise is a relatively small fraction of the temperature trace, but small enough that convection currents have not yet developed. The relations were applied to simulated IR data sets derived from computed pressure and temperature fields. The simulation was performed in a finite-element geometry involving a HIFU transducer sonicating upward in a phantom toward an air interface, with an IR camera mounted atop an air layer, looking down at the heated interface. It was found that, when compared to the intensity field determined directly from acoustic propagation simulations, intensity profiles could be obtained from the simulated IR temperature data with an accuracy of better than 10%, at pre-focal, focal, and post-focal locations. © 2011 Acoustical Society of America
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing- and data-intensive, in that the analytics require complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework, techniques are proposed that leverage cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
NASA Astrophysics Data System (ADS)
Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.
2014-12-01
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data-intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs are best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC). The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At the same time, we are scoping seven new VLs and, in the process, identifying other common components to prioritise and focus development.
GPU and APU computations of Finite Time Lyapunov Exponent fields
NASA Astrophysics Data System (ADS)
Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
2012-03-01
We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
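To make the FTLE definition behind this work concrete, here is a hedged CPU sketch in NumPy (not the authors' GPU/OpenCL implementation): it forms the right Cauchy-Green tensor from flow-map gradients and takes the log of the square root of its largest eigenvalue, divided by the integration time. The analytic shear flow map used for the demo is a made-up example.

```python
# FTLE = ln(sqrt(lambda_max(C))) / |T|, with C = F^T F the Cauchy-Green tensor.
import numpy as np

def ftle(phi_x, phi_y, dx, dy, T):
    """FTLE field from the two components of the flow map on a uniform grid."""
    dxdx, dxdy = np.gradient(phi_x, dx, dy)
    dydx, dydy = np.gradient(phi_y, dx, dy)
    # Closed-form largest eigenvalue of the 2x2 symmetric tensor C
    a = dxdx**2 + dydx**2
    b = dxdx*dxdy + dydx*dydy
    d = dxdy**2 + dydy**2
    lam_max = 0.5*(a + d) + np.sqrt(0.25*(a - d)**2 + b**2)
    return np.log(np.sqrt(lam_max)) / abs(T)

# Example: simple shear flow map x -> x + T*y, y -> y, over time T
T, n = 2.0, 64
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
field = ftle(x + T*y, y, 1/(n-1), 1/(n-1), T)
print(field.mean())
```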
Cloud Based Metalearning System for Predictive Modeling of Biomedical Data
Vukićević, Milan
2014-01-01
Rapid growth and storage of biomedical data enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other side analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates metalearning framework for ranking and selection of best predictive algorithms for data at hand and open source big data technologies for analysis of biomedical data. PMID:24892101
Nakajima, Nobuharu
2010-07-20
When a very intense beam is used for illuminating an object in coherent x-ray diffraction imaging, the intensities at the center of the diffraction pattern for the object are cut off by a beam stop that is utilized to block the intense beam. Until now, only iterative phase-retrieval methods have been applied to object reconstruction from a single diffraction pattern with a deficiency of central data due to a beam stop. As an alternative method, I present a noniterative solution in which an interpolation method based on the sampling theorem for the missing data is used for object reconstruction with our previously proposed phase-retrieval method using an aperture-array filter. Computer simulations demonstrate the reconstruction of a complex-amplitude object from a single diffraction pattern with a missing data area, which is generally difficult to treat with the iterative methods because a nonnegativity constraint cannot be used for such an object.
Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven
2010-11-01
The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.
Northwest Trajectory Analysis Capability: A Platform for Enhancing Computational Biophysics Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, Elena S.; Stephan, Eric G.; Corrigan, Abigail L.
2008-07-30
As computational resources continue to increase, the ability of computational simulations to effectively complement, and in some cases replace, experimentation in scientific exploration also increases. Today, large-scale simulations are recognized as an effective tool for scientific exploration in many disciplines including chemistry and biology. A natural side effect of this trend has been the need for an increasingly complex analytical environment. In this paper, we describe Northwest Trajectory Analysis Capability (NTRAC), an analytical software suite developed to enhance the efficiency of computational biophysics analyses. Our strategy is to layer higher-level services and introduce improved tools within the user's familiar environment without preventing researchers from using traditional tools and methods. Our desire is to share these experiences to serve as an example for effectively analyzing data-intensive large-scale simulation data.
An integrated decision support system for TRAC: A proposal
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
Optimal allocation and usage of resources is a key to effective management. Resources of concern to TRAC are: Manpower (PSY), Money (Travel, contracts), Computing, Data, Models, etc. Management activities of TRAC include: Planning, Programming, Tasking, Monitoring, Updating, and Coordinating. Existing systems are insufficient: they are not completely automated, they are manpower intensive, and the potential for data inconsistency exists. A system is proposed which suggests a means to integrate all project management activities of TRAC through the development of sophisticated software and by utilizing the existing computing systems and network resources. The systems integration proposal is examined in detail.
Amols, Howard I
2008-11-01
New technologies such as intensity modulated and image guided radiation therapy, computer controlled linear accelerators, record and verify systems, electronic charts, and digital imaging have revolutionized radiation therapy over the past 10-15 y. Quality assurance (QA) as historically practiced and as recommended in reports such as American Association of Physicists in Medicine Task Groups 40 and 53 needs to be updated to address the increasing complexity and computerization of radiotherapy equipment, and the increased quantity of data defining a treatment plan and treatment delivery. While new technology has reduced the probability of many types of medical events, new types of errors caused by improper use of new technology, communication failures between computers, corrupted or erroneous computer data files, and "software bugs" are now being seen. The increased use of computed tomography, magnetic resonance, and positron emission tomography imaging has become routine for many types of radiotherapy treatment planning, and QA for imaging modalities is beyond the expertise of most radiotherapy physicists. Errors in radiotherapy rarely result solely from hardware failures. More commonly they are a combination of computer and human errors. The increased use of radiosurgery, hypofractionation, more complex intensity modulated treatment plans, image guided radiation therapy, and increasing financial pressures to treat more patients in less time will continue to fuel this reliance on high technology and complex computer software. Clinical practitioners and regulatory agencies are beginning to realize that QA for new technologies is a major challenge and poses dangers different in nature from those that are historically familiar.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. The backbone of the DIaaS model is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of the dispel4py and Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines the Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to running data-intensive workflows on distributed platforms. Three containers make up the DIaaS model: a Pegasus node, an MPI cluster, and an Apache Storm cluster. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for continuous integration (automated image builds) and for image storing and sharing. In this model, all of the software (workflow systems and execution engines) required to run scientific applications is packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase 1) and cross-correlates (Phase 2) traces from several seismic stations. The application is submitted via Pegasus (Container 1), and Phase 1 and Phase 2 are executed in the MPI (Container 2) and Storm (Container 3) clusters respectively. Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner, reducing the engineering time and computational cost.
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing
NASA Astrophysics Data System (ADS)
Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C.; Gao, Wen
2018-05-01
The compact descriptors for visual search (CDVS) standard from the ISO/IEC moving pictures experts group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low complexity design of the CDVS core techniques and present a very fast CDVS encoder by leveraging the massive parallel execution resources of the GPU. We elegantly shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, on which the thread block allocation and the memory access are jointly optimized to eliminate performance loss. In addition, those operations with heavy data dependence are allocated to the CPU to resolve the extra but unnecessary computation burden for the GPU. Furthermore, we have demonstrated that the proposed fast CDVS encoder works well with convolutional neural network approaches, which have also leveraged the advantages of GPU platforms and yielded significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
A Fuzzy Computing Model for Identifying Polarity of Chinese Sentiment Words
Huang, Yongfeng; Wu, Xian; Li, Xing
2015-01-01
With the spurt of online user-generated contents on web, sentiment analysis has become a very active research issue in data mining and natural language processing. As the most important indicator of sentiment, sentiment words which convey positive and negative polarity are quite instrumental for sentiment analysis. However, most of the existing methods for identifying polarity of sentiment words only consider the positive and negative polarity by the Cantor set, and no attention is paid to the fuzziness of the polarity intensity of sentiment words. In order to improve the performance, we propose a fuzzy computing model to identify the polarity of Chinese sentiment words in this paper. There are three major contributions in this paper. Firstly, we propose a method to compute polarity intensity of sentiment morphemes and sentiment words. Secondly, we construct a fuzzy sentiment classifier and propose two different methods to compute the parameter of the fuzzy classifier. Thirdly, we conduct extensive experiments on four sentiment words datasets and three review datasets, and the experimental results indicate that our model performs better than the state-of-the-art methods. PMID:26106409
Data privacy considerations in Intensive Care Grids.
Luna, Jesus; Dikaiakos, Marios D; Kyprianou, Theodoros; Bilas, Angelos; Marazakis, Manolis
2008-01-01
Novel eHealth systems are being designed to provide a citizen-centered health system; however, the ever growing demand for computing and data resources has required the adoption of Grid technologies. In most cases, this novel Health Grid requires not only conveying a patient's personal data through public networks, but also storing it on shared resources outside the hospital premises. These features introduce new security concerns, in particular related to privacy. In this paper we survey current legal and technological approaches that have been taken to protect a patient's personal data in eHealth systems, with a particular focus on Intensive Care Grids. However, a security analysis applied to the Intensive Care Grid system (ICGrid) shows that these security mechanisms are not enough to provide a comprehensive solution, mainly because the data-at-rest is still vulnerable to attacks coming from untrusted Storage Elements where an attacker may directly access it. To cope with these issues, we propose a new privacy-oriented protocol which uses a combination of encryption and fragmentation to improve data assurance while keeping compatibility with current legislation and Health Grid security mechanisms.
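The paper's protocol is not reproduced here, but the hedged Python sketch below illustrates the general encrypt-then-fragment idea using the cryptography library's Fernet primitive: the record is encrypted client-side and the ciphertext is split into fragments that could be placed on different, possibly untrusted, Storage Elements. The record contents and fragment count are hypothetical.

```python
# Encrypt a record, fragment the ciphertext, and later reassemble and decrypt it.
from cryptography.fernet import Fernet

def protect(record: bytes, n_fragments: int, key: bytes):
    """Encrypt a patient record, then cut the ciphertext into n fragments."""
    ciphertext = Fernet(key).encrypt(record)
    step = -(-len(ciphertext) // n_fragments)          # ceiling division
    return [ciphertext[i:i + step] for i in range(0, len(ciphertext), step)]

def recover(fragments, key: bytes) -> bytes:
    """Reassemble the fragments in order and decrypt."""
    return Fernet(key).decrypt(b"".join(fragments))

key = Fernet.generate_key()                            # key stays inside the hospital
fragments = protect(b"patient-id:123; lactate:3.1 mmol/L", 3, key)
assert recover(fragments, key) == b"patient-id:123; lactate:3.1 mmol/L"
```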
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. To better understand and predict the distribution, intensity and structure of dust storm, a series of dust storm models have been developed, such as Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. It seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor which may impact the feasibility of the parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically, 1) in order to get optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with a small number of computing tasks. However, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate for the performance decrease on large-scale tasks, a K-Means clustering based algorithm is introduced. Instead of seeking optimal solutions, this method can get relatively good feasible solutions within acceptable time. However, it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain a better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
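As a hedged illustration of the K-Means-based allocation idea (not the authors' algorithm), the sketch below clusters hypothetical subdomain centroids with scikit-learn so that geographically adjacent subdomains tend to be assigned to the same computing node, which reduces cross-node communication.

```python
# Cluster subdomain centroids so adjacent subdomains share a node.
import numpy as np
from sklearn.cluster import KMeans

def allocate(subdomain_centroids, n_nodes, seed=0):
    """Return an array mapping each subdomain to a computing node."""
    km = KMeans(n_clusters=n_nodes, n_init=10, random_state=seed)
    return km.fit_predict(subdomain_centroids)

# 8 x 8 grid of subdomains over the study area, assigned to 4 nodes
xv, yv = np.meshgrid(np.arange(8), np.arange(8))
centroids = np.column_stack([xv.ravel(), yv.ravel()]).astype(float)
assignment = allocate(centroids, n_nodes=4)
print(assignment.reshape(8, 8))   # contiguous blocks of equal labels
```

Note that plain K-Means does not guarantee equal cluster sizes, which mirrors the load-imbalance caveat mentioned in the abstract.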
NASA Technical Reports Server (NTRS)
Raju, I. S.; Newman, J. C., Jr.
1993-01-01
A computer program, surf3d, that uses the 3D finite-element method to calculate the stress-intensity factors for surface, corner, and embedded cracks in finite-thickness plates with and without circular holes, was developed. The cracks are assumed to be either elliptic or part-elliptic in shape. The computer program uses eight-noded hexahedral elements to model the solid. The program uses a skyline storage and solver. The stress-intensity factors are evaluated using the force method, the crack-opening displacement method, and the 3D virtual crack closure method. The manual describes the input to and the output of the surf3d program, demonstrates the use of the program, and describes the calculation of the stress-intensity factors. Several examples with sample data files are included with the manual. To facilitate modeling of the user's crack configuration and loading, a companion (preprocessor) program called gensurf, which generates the data for surf3d, was also developed. The gensurf program is a three-dimensional mesh generator that requires minimal input and builds a complete data file for surf3d. The program surf3d is operational on Unix machines such as the CRAY Y-MP, CRAY-2, and Convex C-220.
NASA Astrophysics Data System (ADS)
Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu
2015-07-01
The numerical simulation of multiphase flow and reactive transport in porous media for complex subsurface problems is a computationally intensive application. To meet the increasing computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT, a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed THC-MP, a high performance computing code for massively parallel computers, which greatly extends the computational capability of the original code. The domain decomposition method was applied to the coupled numerical computing procedure in THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes, and implemented the core solving module using a hybrid parallel iterative and direct solver. Numerical accuracy of THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computation with those from the sequential computation (original code). Execution efficiency and code scalability were examined through field-scale carbon sequestration applications on a multicore cluster. The results demonstrate the enhanced performance achieved using THC-MP on parallel computing facilities.
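The following hedged sketch (mpi4py, not the actual THC-MP code) shows the basic domain-decomposition pattern such a solver relies on: each rank owns a slab of the grid and exchanges a layer of ghost cells with its neighbours before each local update. The grid size, initial condition, and update rule are placeholders.

```python
# 1-D domain decomposition with ghost-cell exchange between neighbouring ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100                                   # interior cells owned by this rank
u = np.zeros(n_local + 2)                       # +2 ghost cells
u[1:-1] = rank                                  # hypothetical initial condition

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange ghost layers: send right boundary / receive left ghost, and vice versa
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
comm.Sendrecv(sendbuf=u[1:2],  dest=left,  recvbuf=u[-1:],  source=right)

# A diffusion-like update can now use u[0] and u[-1] from the neighbouring ranks
u[1:-1] += 0.1 * (u[0:-2] - 2.0 * u[1:-1] + u[2:])
```

Run, for example, with `mpirun -n 4 python decomp.py`.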
Tudor-Locke, Catrine; Leonardi, Claudia; Johnson, William D; Katzmarzyk, Peter T
2011-12-01
The objective was to determine time spent during the working day in sleep, work, sedentary behaviors, and light-, moderate-, and vigorous-intensity behaviors, by occupation intensity. Data came from 30,758 working respondents to the 2003 to 2009 American Time Use Survey. Mean ± SEM time spent in work, sedentary behaviors, light-, moderate-, and vigorous-intensity activities, and sleep was computed by occupation, classified as sedentary, light, moderate, or vigorous intensity. On average, approximately 32% of the 24-hour day was spent sleeping and approximately 31% was spent at work. Time spent in sedentary behaviors outside of work was higher, and light-intensity time was lower, with higher levels of intensity-defined occupation. Those employed in sedentary occupations were sedentary for approximately 11 hours per day, leaving little time to achieve recommended levels of physical activity for overall health.
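A tabulation of this kind can be sketched as follows (hedged; the column names and toy values are hypothetical, not ATUS variables): group daily minutes per behaviour by occupation intensity and aggregate the mean and standard error.

```python
# Mean and SEM of daily minutes per behaviour, by occupation intensity.
import pandas as pd

atus = pd.DataFrame({
    "occupation_intensity": ["sedentary", "sedentary", "light", "moderate"],
    "sleep_min": [470, 455, 460, 450],
    "work_min": [510, 495, 480, 470],
    "sedentary_min": [660, 640, 520, 430],
})

summary = (atus.groupby("occupation_intensity")[["sleep_min", "work_min", "sedentary_min"]]
               .agg(["mean", "sem"]))
print(summary)
```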
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Neubert, Antje; Dormann, Harald; Prokosch, Hans-Ulrich; Bürkle, Thomas; Rascher, Wolfgang; Sojer, Reinhold; Brune, Kay; Criegee-Rieck, Manfred
2013-09-01
Computer-assisted signal generation is an important issue for the prevention of adverse drug reactions (ADRs). However, due to poor standardization of patients' medical data and a lack of computable medical drug knowledge the specificity of computerized decision support systems for early ADR detection is too low and thus those systems are not yet implemented in daily clinical practice. We report on a method to formalize knowledge about ADRs based on the Summary of Product Characteristics (SmPCs) and linking them with structured patient data to generate safety signals automatically and with high sensitivity and specificity. A computable ADR knowledge base (ADR-KB) that inherently contains standardized concepts for ADRs (WHO-ART), drugs (ATC) and laboratory test results (LOINC) was built. The system was evaluated in study populations of paediatric and internal medicine inpatients. A total of 262 different ADR concepts related to laboratory findings were linked to 212 LOINC terms. The ADR knowledge base was retrospectively applied to a study population of 970 admissions (474 internal and 496 paediatric patients), who underwent intensive ADR surveillance. The specificity increased from 7% without ADR-KB up to 73% in internal patients and from 19.6% up to 91% in paediatric inpatients, respectively. This study shows that contextual linkage of patients' medication data with laboratory test results is a useful and reasonable instrument for computer-assisted ADR detection and a valuable step towards a systematic drug safety process. The system enables automated detection of ADRs during clinical practice with a quality close to intensive chart review. © 2013 The Authors. British Journal of Clinical Pharmacology © 2013 The British Pharmacological Society.
Hu, Yu-Chen
2018-01-01
The emergence of smart Internet of Things (IoT) devices has greatly favored the realization of smart homes in the downstream sector of a smart grid. The underlying objective of Demand Response (DR) schemes is to actively engage customers to modify their energy consumption on domestic appliances in response to pricing signals. Domestic appliance scheduling is widely accepted as an effective mechanism to manage domestic energy consumption intelligently. For residential customers, however, maintaining a balance between energy consumption cost and comfort satisfaction is a challenge in DR implementation. Hence, in this paper, a constrained Particle Swarm Optimization (PSO)-based residential consumer-centric load-scheduling method is proposed. The method can further be featured with edge computing. In contrast with cloud computing, edge computing—a method of optimizing cloud computing technologies by driving computing capabilities to the IoT edge of the Internet, and one of the emerging trends in engineering technology—addresses bandwidth-intensive content and latency-sensitive applications among sensors and central data centers through data analytics at or near the source of the data. A non-intrusive load-monitoring technique proposed previously is utilized for automatic determination of the physical characteristics of power-intensive home appliances from users' life patterns. The swarm intelligence method, constrained PSO, is used to minimize the energy consumption cost while considering users' comfort satisfaction for DR implementation. The residential consumer-centric load-scheduling method proposed in this paper is evaluated under real-time pricing with inclining block rates and is demonstrated in a case study. The experimentation reported in this paper shows that the proposed residential consumer-centric load-scheduling method can re-shape loads of home appliances in response to DR signals. Moreover, a reduction in peak power consumption of 13.97% is achieved. PMID:29702607
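To make the constrained-PSO idea concrete, here is a hedged toy sketch (not the paper's formulation): particles encode hourly on/off states for one shiftable appliance, and the fitness combines energy cost under a hypothetical real-time tariff, a discomfort term for deviating from the user's preferred hours, and a penalty enforcing the required run time. All parameter values are made up.

```python
# Toy constrained PSO for scheduling one shiftable appliance over 24 hours.
import numpy as np

rng = np.random.default_rng(0)
price = rng.uniform(0.08, 0.30, 24)              # $/kWh, hypothetical real-time tariff
preferred = np.zeros(24); preferred[18:22] = 1   # user prefers 18:00-22:00
required_hours, power_kw, beta = 4, 1.5, 0.05    # run-time constraint, load, comfort weight

def fitness(x):
    on = x > 0.5
    cost = power_kw * np.sum(price * on)
    discomfort = beta * np.sum(on != preferred.astype(bool))
    penalty = 10.0 * abs(on.sum() - required_hours)   # enforce total run time
    return cost + discomfort + penalty

n_particles, n_iter = 30, 200
pos = rng.uniform(0, 1, (n_particles, 24))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.uniform(size=pos.shape), rng.uniform(size=pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 1)
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("scheduled hours:", np.where(gbest > 0.5)[0])
```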
The anomalous demagnetization behaviour of chondritic meteorites
NASA Astrophysics Data System (ADS)
Morden, S. J.
1992-06-01
Alternating field (AF) demagnetization of chondritic samples often shows anomalous results such as large directional and intensity changes; 'saw-tooth' intensity vs. demagnetizing field curves are also prevalent. An attempt to explain this behaviour is presented, using a computer model in which individual 'mineral grains' can be 'magnetized' in a variety of different ways. A simulated demagnetization can then be carried out to examine the results. It was found that the experimental behaviour of chondrites can be successfully mimicked by loading the computer model with a series of randomly orientated and sized vectors. The parameters of the model can be changed to reflect different trends seen in experimental data. Many published results can be modelled using this method. A known magnetic mineralogy can be modelled, and an unknown mineralogy deduced from AF demagnetization curves. Only by comparing data from mutually orientated samples can true stable regions for palaeointensity measurements be identified, calling into question some previous estimates of field strength from meteorites.
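A minimal sketch of such a random-vector model, under our own simplifying assumptions rather than the paper's exact code, is shown below: each 'grain' carries a randomly oriented moment of random size and a random coercivity, and grains are erased as the peak alternating field exceeds their coercivity.

```python
import numpy as np

# Minimal sketch of a random-vector demagnetization model (assumptions ours, not
# the paper's exact code): each "grain" carries a randomly oriented moment of
# random size and a random coercivity; at each AF step, grains whose coercivity
# is below the peak field are erased, and the resultant remanence (vector sum)
# is recorded.
rng = np.random.default_rng(1)
n_grains = 200
directions = rng.normal(size=(n_grains, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
moments = rng.lognormal(mean=0.0, sigma=1.0, size=n_grains)[:, None] * directions
coercivity = rng.uniform(0, 100, n_grains)            # mT, illustrative

for af_field in range(0, 101, 10):                    # peak AF field steps (mT)
    remaining = moments[coercivity > af_field]
    resultant = remaining.sum(axis=0) if len(remaining) else np.zeros(3)
    print(f"{af_field:3d} mT  intensity = {np.linalg.norm(resultant):7.2f}")
```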
Brain tissue segmentation in 4D CT using voxel classification
NASA Astrophysics Data System (ADS)
van den Boom, R.; Oei, M. T. H.; Lafebre, S.; Oostveen, L. J.; Meijer, F. J. A.; Steens, S. C. A.; Prokop, M.; van Ginneken, B.; Manniesing, R.
2012-02-01
A method is proposed to segment anatomical regions of the brain from 4D computer tomography (CT) patient data. The method consists of a three step voxel classification scheme, each step focusing on structures that are increasingly difficult to segment. The first step classifies air and bone, the second step classifies vessels and the third step classifies white matter, gray matter and cerebrospinal fluid. As features the time averaged intensity value and the temporal intensity change value were used. In each step, a k-Nearest-Neighbor classifier was used to classify the voxels. Training data was obtained by placing regions of interest in reconstructed 3D image data. The method has been applied to ten 4D CT cerebral patient data. A leave-one-out experiment showed consistent and accurate segmentation results.
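The following sketch illustrates one such classification step with synthetic data (it is not the authors' pipeline): each voxel is described by its time-averaged intensity and temporal intensity change, and a k-Nearest-Neighbor classifier trained on labelled region-of-interest voxels assigns a tissue class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Minimal sketch of one classification step (synthetic data, not the authors'
# pipeline): each voxel is described by its time-averaged intensity and its
# temporal intensity change, and a k-NN classifier trained on labelled ROI
# voxels assigns a tissue class.
rng = np.random.default_rng(0)
train_features = np.vstack([
    rng.normal([40, 2], 5, (100, 2)),    # e.g. cerebrospinal fluid (illustrative)
    rng.normal([80, 5], 5, (100, 2)),    # e.g. gray matter
    rng.normal([70, 1], 5, (100, 2)),    # e.g. white matter
])
train_labels = np.repeat([0, 1, 2], 100)

clf = KNeighborsClassifier(n_neighbors=15).fit(train_features, train_labels)
voxels = np.array([[42.0, 1.8], [78.0, 4.5]])   # [time-averaged value, temporal change]
print(clf.predict(voxels))
```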
NASA Astrophysics Data System (ADS)
Uijlenhoet, R.; Overeem, A.; Leijnse, H.; Rios Gaona, M. F.
2017-12-01
The basic principle of rainfall estimation using microwave links is as follows. Rainfall attenuates the electromagnetic signals transmitted from one telephone tower to another. By measuring the received power at one end of a microwave link as a function of time, the path-integrated attenuation due to rainfall can be calculated, which can be converted to average rainfall intensities over the length of a link. Microwave links from cellular communication networks have been proposed as a promising new rainfall measurement technique for about a decade. They are particularly interesting for those countries where few surface rainfall observations are available. Yet to date no operational (real-time) link-based rainfall products are available. To advance the process towards operational application and upscaling of this technique, there is a need for freely available, user-friendly computer code for microwave link data processing and rainfall mapping. Such software is now available as the R package "RAINLINK" on GitHub (https://github.com/overeem11/RAINLINK). It contains a working example to compute link-based 15-min rainfall maps for the entire surface area of The Netherlands for 40 hours from real microwave link data. For the first time, this provides a working example based on actual data from an extensive network of commercial microwave links, which will allow users to test their own algorithms and compare their results with ours. The package consists of modular functions, which facilitates running only part of the algorithm. The main processing steps are: 1) Preprocessing of link data (initial quality and consistency checks); 2) Wet-dry classification using link data; 3) Reference signal determination; 4) Removal of outliers; 5) Correction of received signal powers; 6) Computation of mean path-averaged rainfall intensities; 7) Interpolation of rainfall intensities; 8) Rainfall map visualisation. Some applications of RAINLINK will be shown based on microwave link data from a temperate climate (the Netherlands) and from a subtropical climate (Brazil). We hope that RAINLINK will promote the application of rainfall monitoring using microwave links in poorly gauged regions around the world. We invite researchers to contribute to RAINLINK to make the code more generally applicable to data from different networks and climates.
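RAINLINK itself is an R package; the following Python sketch only illustrates the core conversion from path-integrated attenuation to path-averaged rain rate via the standard power law A = a * R^b * L, with coefficients that are frequency-dependent and purely illustrative here.

```python
# Minimal Python sketch (RAINLINK itself is an R package) of the core conversion:
# path-integrated attenuation A (dB) over a link of length L (km) is related to
# the path-averaged rain rate R (mm/h) through the standard power law
# A = a * R**b * L. The coefficients a and b depend on frequency and
# polarization; the values here are illustrative only.
def rain_rate_from_attenuation(attenuation_db, length_km, a=0.33, b=1.0):
    specific_attenuation = attenuation_db / length_km          # dB/km
    return (specific_attenuation / a) ** (1.0 / b)             # mm/h

# Example: received power dropped 6 dB below the dry-weather reference on a 3 km link.
print(round(rain_rate_from_attenuation(6.0, 3.0), 1), "mm/h")
```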
Huang, Charles Lung-Cheng; Hsiao, Sigmund; Hwu, Hai-Gwo; Howng, Shen-Long
2012-12-30
The Chinese Facial Emotion Recognition Database (CFERD), a computer-generated three-dimensional (3D) paradigm, was developed to measure the recognition of facial emotional expressions at different intensities. The stimuli consisted of 3D colour photographic images of six basic facial emotional expressions (happiness, sadness, disgust, fear, anger and surprise) and neutral faces of the Chinese. The purpose of the present study is to describe the development and validation of CFERD with nonclinical healthy participants (N=100; 50 men; age ranging between 18 and 50 years), and to generate normative data set. The results showed that the sensitivity index d' [d'=Z(hit rate)-Z(false alarm rate), where function Z(p), p∈[0,1
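The sensitivity index quoted above can be computed directly from hit and false-alarm rates; a minimal sketch follows, with Z taken as the inverse of the standard normal cumulative distribution function and the rates invented for illustration.

```python
from scipy.stats import norm

# Sensitivity index from signal detection theory, as in the abstract:
# d' = Z(hit rate) - Z(false alarm rate), where Z is the inverse of the
# standard normal cumulative distribution function.
def d_prime(hit_rate, false_alarm_rate):
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

print(round(d_prime(0.85, 0.20), 2))   # illustrative rates, not study data
```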
Active Provenance in Data-intensive Research
NASA Astrophysics Data System (ADS)
Spinuso, Alessandro; Mihajlovski, Andrej; Filgueira, Rosa; Atkinson, Malcolm
2017-04-01
Scientific communities are building platforms where the usage of data-intensive workflows is crucial to conduct their research campaigns. However, managing and effectively supporting the understanding of the 'live' processes, fostering computational steering, and sharing and re-using data and methods present several bottlenecks. These are often caused by the poor level of documentation of the methods, the data, and how users interact with them. This work explores how, in such systems, flexibility in the management of provenance and its adaptation to different users and application contexts can lead to new opportunities for its exploitation, improving productivity. In particular, this work illustrates a conceptual and technical framework enabling tunable and actionable provenance in data-intensive workflow systems in support of reproducible science. It introduces the concept of Agile data-intensive systems to define the characteristics of our target platform. It shows a novel approach to the integration of provenance mechanisms, offering flexibility in the scale and in the precision of the provenance data collected, ensuring its relevance to the domain of the data-intensive task and fostering its rapid exploitation. The contributions address aspects of the scale of the provenance records, their usability and their active role in the research life-cycle. We will discuss the use of dynamically generated provenance types as the approach for the integration of provenance mechanisms into a data-intensive workflow system. Enabling provenance can be transparent to the workflow user and developer, as well as fully controllable and customisable, depending on their expertise and the application's reproducibility, monitoring and validation requirements. The API that allows the realisation and adoption of a provenance type is presented, especially with regard to the support of provenance profiling, contextualisation and precision. An actionable approach to provenance management will also be discussed, enabling provenance-driven operations at runtime, regardless of the enactment technologies and connectivity impediments. We propose a framework based on concepts such as provenance clusters and provenance sensors, envisaging new potential for exploiting large quantities of provenance traces at runtime. Finally, the work will also introduce how the underlying provenance model can be explored with big-data visualization techniques, aiming at producing comprehensive and interactive views on top of large and heterogeneous provenance data. We will demonstrate the adoption of alternative visualisation methods, from detailed and localised interactive graphs to radial views, serving different purposes and expertise. Combining provenance types, selective rules, and extensible metadata with reactive clustering opens a new and more versatile role for the lineage information in the research life-cycle, thanks to its improved usability. The flexible profiling of the proposed framework offers aid to the human analysis of the process, with the support of advanced and intuitive interactive graphical tools. The Active Provenance methods are discussed in the context of a real implementation for a data-intensive library (dispel4py) and its adoption within use cases for computational seismology, climate studies and generic correlation analysis.
Neylon, J; Min, Y; Kupelian, P; Low, D A; Santhanam, A
2017-04-01
In this paper, a multi-GPU cloud-based server (MGCS) framework is presented for dose calculations, exploring the feasibility of remote computing power for parallelization and acceleration of computationally and time intensive radiotherapy tasks in moving toward online adaptive therapies. An analytical model was developed to estimate theoretical MGCS performance acceleration and intelligently determine workload distribution. Numerical studies were performed with a computing setup of 14 GPUs distributed over 4 servers interconnected by a 1 Gigabits per second (Gbps) network. Inter-process communication methods were optimized to facilitate resource distribution and minimize data transfers over the server interconnect. The analytically predicted computation time predicted matched experimentally observations within 1-5 %. MGCS performance approached a theoretical limit of acceleration proportional to the number of GPUs utilized when computational tasks far outweighed memory operations. The MGCS implementation reproduced ground-truth dose computations with negligible differences, by distributing the work among several processes and implemented optimization strategies. The results showed that a cloud-based computation engine was a feasible solution for enabling clinics to make use of fast dose calculations for advanced treatment planning and adaptive radiotherapy. The cloud-based system was able to exceed the performance of a local machine even for optimized calculations, and provided significant acceleration for computationally intensive tasks. Such a framework can provide access to advanced technology and computational methods to many clinics, providing an avenue for standardization across institutions without the requirements of purchasing, maintaining, and continually updating hardware.
NASA Astrophysics Data System (ADS)
Viswanath, Satish; Tiwari, Pallavi; Rosen, Mark; Madabhushi, Anant
2008-03-01
Recently, in vivo Magnetic Resonance Imaging (MRI) and Magnetic Resonance Spectroscopy (MRS) have emerged as promising new modalities to aid in prostate cancer (CaP) detection. MRI provides anatomic and structural information of the prostate while MRS provides functional data pertaining to biochemical concentrations of metabolites such as creatine, choline and citrate. We have previously presented a hierarchical clustering scheme for CaP detection on in vivo prostate MRS and have recently developed a computer-aided method for CaP detection on in vivo prostate MRI. In this paper we present a novel scheme to develop a meta-classifier to detect CaP in vivo via quantitative integration of multimodal prostate MRS and MRI by use of non-linear dimensionality reduction (NLDR) methods including spectral clustering and locally linear embedding (LLE). Quantitative integration of multimodal image data (MRI and PET) involves the concatenation of image intensities following image registration. However multimodal data integration is non-trivial when the individual modalities include spectral and image intensity data. We propose a data combination solution wherein we project the feature spaces (image intensities and spectral data) associated with each of the modalities into a lower dimensional embedding space via NLDR. NLDR methods preserve the relationships between the objects in the original high dimensional space when projecting them into the reduced low dimensional space. Since the original spectral and image intensity data are divorced from their original physical meaning in the reduced dimensional space, data at the same spatial location can be integrated by concatenating the respective embedding vectors. Unsupervised consensus clustering is then used to partition objects into different classes in the combined MRS and MRI embedding space. Quantitative results of our multimodal computer-aided diagnosis scheme on 16 sets of patient data obtained from the ACRIN trial, for which corresponding histological ground truth for spatial extent of CaP is known, show a marginally higher sensitivity, specificity, and positive predictive value compared to corresponding CAD results with the individual modalities.
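The integration idea can be sketched as follows with synthetic data (k-means stands in for the consensus clustering used in the paper): each modality is projected into a low-dimensional embedding with LLE, the embedding vectors at the same spatial location are concatenated, and the combined space is clustered.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

# Minimal sketch of the integration idea (synthetic data; k-means stands in for
# the consensus clustering used in the paper): each modality's features are
# projected into a low-dimensional embedding with LLE, the embedding vectors at
# the same spatial location are concatenated, and the combined space is clustered.
rng = np.random.default_rng(0)
n_voxels = 300
mri_features = rng.normal(size=(n_voxels, 10))      # image-intensity features (synthetic)
mrs_features = rng.normal(size=(n_voxels, 64))      # spectral features (synthetic)

embed = lambda X: LocallyLinearEmbedding(n_neighbors=12, n_components=3).fit_transform(X)
combined = np.hstack([embed(mri_features), embed(mrs_features)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(combined)
print(np.bincount(labels))
```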
Optimizing Interactive Development of Data-Intensive Applications
Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd
2017-01-01
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637
Abstracting ICU Nursing Care Quality Data From the Electronic Health Record.
Seaman, Jennifer B; Evans, Anna C; Sciulli, Andrea M; Barnato, Amber E; Sereika, Susan M; Happ, Mary Beth
2017-09-01
The electronic health record is a potentially rich source of data for clinical research in the intensive care unit setting. We describe the iterative, multi-step process used to develop and test a data abstraction tool, used for collection of nursing care quality indicators from the electronic health record, for a pragmatic trial. We computed Cohen's kappa coefficient (κ) to assess interrater agreement or reliability of data abstracted using preliminary and finalized tools. In assessing the reliability of study data ( n = 1,440 cases) using the finalized tool, 108 randomly selected cases (10% of first half sample; 5% of last half sample) were independently abstracted by a second rater. We demonstrated mean κ values ranging from 0.61 to 0.99 for all indicators. Nursing care quality data can be accurately and reliably abstracted from the electronic health records of intensive care unit patients using a well-developed data collection tool and detailed training.
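A minimal sketch of the interrater-agreement computation follows; the ratings are invented for illustration and are not study data.

```python
from sklearn.metrics import cohen_kappa_score

# Interrater agreement for one dichotomous quality indicator abstracted by two
# raters from the same charts (illustrative values, not study data).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(round(cohen_kappa_score(rater_1, rater_2), 2))
```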
Advances and Limitations of Modern Macroseismic Data Gathering
NASA Astrophysics Data System (ADS)
Wald, D. J.; Dewey, J. W.; Quitoriano, V. P. R.
2016-12-01
All macroseismic data are not created equal. At about the time that the European Macroseismic Scale 1998 (EMS-98; itself a revision of EMS-92) formalized a procedure to account for building vulnerability and damage grade statistics in assigning intensities from traditional field observations, a parallel universe of internet-based intensity reporting was coming online. The divergence of intensities assigned by field reconnaissance and intensities based on volunteered reports poses unique challenges. U.S. Geological Survey's Did You Feel It? (DYFI) and its Italian (National Institute of Geophysics and Volcanology) counterpart use questionnaires based on the traditional format, submitted by volunteers. The Italian strategy uses fuzzy logic to assign integer values of intensity from questionnaire responses, whereas DYFI assigns weights to macroseismic effects and computes real-valued intensities to a 0.1 MMI unit precision. DYFI responses may be grouped together by postal code, or by smaller latitude-longitude boxes; calculated intensities may vary depending on how observations are grouped. New smartphone-based procedures depart further from tradition by asking respondents to select from cartoons corresponding to various intensity levels that best fit their experience. While nearly instantaneous, these thumbnail-based intensities are strictly integer values and do not record specific macroseismic effects. Finally, a recent variation on traditional intensity assignments derives intensities not from field surveys or questionnaires sent to target audiences but rather from media reports, photojournalism, and internet posts that may or may not constitute the representative observations needed for consistent EMS-98 assignments. We review these issues and suggest due-diligence strategies for utilizing varied macroseismic data sets within real-time applications and in quantitative hazard and engineering analyses.
Computation of transmitted and received B1 fields in magnetic resonance imaging.
Milles, Julien; Zhu, Yue Min; Chen, Nan-Kuei; Panych, Lawrence P; Gimenez, Gérard; Guttmann, Charles R G
2006-05-01
Computation of B1 fields is a key issue for determination and correction of intensity nonuniformity in magnetic resonance images. This paper presents a new method for computing transmitted and received B1 fields. Our method combines a modified MRI acquisition protocol and an estimation technique based on the Levenberg-Marquardt algorithm and spatial filtering. It enables accurate estimation of transmitted and received B1 fields for both homogeneous and heterogeneous objects. The method is validated using numerical simulations and experimental data from phantom and human scans. The experimental results are in agreement with theoretical expectations.
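A generic Levenberg-Marquardt least-squares fit of this kind can be sketched as follows; the simplified signal model and data below are our own assumptions, not the paper's acquisition protocol.

```python
import numpy as np
from scipy.optimize import least_squares

# Generic Levenberg-Marquardt fit with SciPy (a sketch under our own simplified
# model, not the paper's protocol): measured intensities at several nominal flip
# angles are modelled as S(alpha) = A * sin(b1 * alpha), where b1 scales the
# transmitted field and A lumps receive sensitivity and proton density.
alphas = np.deg2rad([20, 40, 60, 80, 100, 120])
true_A, true_b1 = 100.0, 0.85
rng = np.random.default_rng(0)
measured = true_A * np.sin(true_b1 * alphas) + rng.normal(0, 1.0, alphas.size)

residuals = lambda p: p[0] * np.sin(p[1] * alphas) - measured
fit = least_squares(residuals, x0=[50.0, 1.0], method="lm")   # Levenberg-Marquardt
print(fit.x)   # estimated [A, b1]
```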
An Ab Initio Based Potential Energy Surface for Water
NASA Technical Reports Server (NTRS)
Partridge, Harry; Schwenke, David W.; Langhoff, Stephen R. (Technical Monitor)
1996-01-01
We report a new determination of the water potential energy surface. A high quality ab initio potential energy surface (PES) and dipole moment function of water have been computed. This PES is empirically adjusted to improve the agreement between the computed line positions and those from the HITRAN 92 data base. The adjustment is small; nonetheless, including an estimate of core (oxygen 1s) electron correlation greatly improves the agreement with experiment. Of the 27,245 assigned transitions in the HITRAN 92 data base for H2(O-16), the overall root mean square (rms) deviation between the computed and observed line positions is 0.125/cm. However, the deviations do not correspond to a normal distribution: 69% of the lines have errors less than 0.05/cm. Overall, the agreement between the line intensities computed in the present work and those contained in the data base is quite good; however, a significant number of line strengths differ greatly.
Applications of Phase-Based Motion Processing
NASA Technical Reports Server (NTRS)
Branch, Nicholas A.; Stewart, Eric C.
2018-01-01
Image pyramids provide useful information in determining structural response at low cost using commercially available cameras. The current effort applies previous work on the complex steerable pyramid to analyze and identify imperceptible linear motions in video. Instead of implicitly computing motion spectra through phase analysis of the complex steerable pyramid and magnifying the associated motions, we present a visual technique and the necessary software to display the phase changes of high-frequency signals within video. The present technique quickly identifies regions of largest motion within a video with a single phase visualization and without the artifacts of motion magnification, but requires use of the computationally intensive Fourier transform. While Riesz pyramids present an alternative to the computationally intensive complex steerable pyramid for motion magnification, the Riesz formulation contains significant noise, and motion magnification still presents large amounts of data that cannot be quickly assessed by the human eye. Thus, user-friendly software is presented for quickly identifying structural response through optical flow and phase visualization in both Python and MATLAB.
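A minimal per-pixel temporal-FFT phase sketch follows (NumPy only, on synthetic frames); it is not the complex-steerable-pyramid implementation described above.

```python
import numpy as np

# Minimal sketch of a phase visualisation via the temporal Fourier transform
# (NumPy only; this is not the complex-steerable-pyramid implementation from the
# paper): for each pixel, take the FFT over time and keep the phase of the
# strongest non-DC frequency component.
rng = np.random.default_rng(0)
frames = rng.normal(size=(128, 32, 32))          # synthetic video: (time, height, width)

spectrum = np.fft.rfft(frames, axis=0)           # per-pixel temporal spectrum
dominant = np.abs(spectrum[1:]).argmax(axis=0) + 1          # skip the DC bin
rows, cols = np.indices(dominant.shape)
phase_map = np.angle(spectrum[dominant, rows, cols])        # radians, per pixel

print(phase_map.shape, phase_map.min(), phase_map.max())
```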
Use of application containers and workflows for genomic data analysis.
Schulz, Wade L; Durant, Thomas J S; Siddon, Alexa J; Torres, Richard
2016-01-01
The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.
Lyashevska, Olga; Brus, Dick J; van der Meer, Jaap
2016-01-01
The objective of the study was to provide a general procedure for mapping species abundance when data are zero-inflated and spatially correlated counts. The bivalve species Macoma balthica was observed on a 500×500 m grid in the Dutch part of the Wadden Sea. In total, 66% of the 3451 counts were zeros. A zero-inflated Poisson mixture model was used to relate counts to environmental covariates. Two models were considered, one with relatively fewer covariates (model "small") than the other (model "large"). The models contained two processes: a Bernoulli (species prevalence) and a Poisson (species intensity, when the Bernoulli process predicts presence). The model was used to make predictions for sites where only environmental data are available. Predicted prevalences and intensities show that the model "small" predicts lower mean prevalence and higher mean intensity, than the model "large". Yet, the product of prevalence and intensity, which might be called the unconditional intensity, is very similar. Cross-validation showed that the model "small" performed slightly better, but the difference was small. The proposed methodology might be generally applicable, but is computer intensive.
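The two-process structure can be sketched as follows with invented parameter values: a Bernoulli process governs prevalence and a Poisson process governs intensity where the species is present, so the unconditional intensity is their product.

```python
import numpy as np

# Minimal zero-inflated Poisson sketch (illustrative values, not the fitted model):
# a Bernoulli process decides presence (prevalence p), and, where present, counts
# follow a Poisson with mean lambda. The unconditional intensity is p * lambda.
rng = np.random.default_rng(0)
prevalence, intensity = 0.34, 12.0            # p and lambda at one site (made up)

present = rng.random(10_000) < prevalence
counts = np.where(present, rng.poisson(intensity, 10_000), 0)

print("share of zeros:", round(np.mean(counts == 0), 2))
print("unconditional intensity:", prevalence * intensity,
      "simulated mean:", round(counts.mean(), 2))
```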
TU-AB-303-08: GPU-Based Software Platform for Efficient Image-Guided Adaptive Radiation Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, S; Robinson, A; McNutt, T
2015-06-15
Purpose: In this study, we develop an integrated software platform for adaptive radiation therapy (ART) that combines fast and accurate image registration, segmentation, and dose computation/accumulation methods. Methods: The proposed system consists of three key components: 1) deformable image registration (DIR), 2) automatic segmentation, and 3) dose computation/accumulation. The computationally intensive modules including DIR and dose computation have been implemented on a graphics processing unit (GPU). All required patient-specific data including the planning CT (pCT) with contours, daily cone-beam CTs, and treatment plan are automatically queried and retrieved from their own databases. To improve the accuracy of DIR between pCT and CBCTs, we use the double force demons DIR algorithm in combination with iterative CBCT intensity correction by local intensity histogram matching. Segmentation of daily CBCT is then obtained by propagating contours from the pCT. Daily dose delivered to the patient is computed on the registered pCT by a GPU-accelerated superposition/convolution algorithm. Finally, computed daily doses are accumulated to show the total delivered dose to date. Results: Since the accuracy of DIR critically affects the quality of the other processes, we first evaluated our DIR method on eight head-and-neck cancer cases and compared its performance with conventional methods. Normalized mutual information (NMI) and normalized cross-correlation (NCC) were computed as similarity measures, and our method produced an overall NMI of 0.663 and NCC of 0.987, outperforming conventional methods by 3.8% and 1.9%, respectively. Experimental results show that our registration method is more consistent and robust than existing algorithms, and also computationally efficient. Computation time at each fraction took around one minute (30–50 seconds for registration and 15–25 seconds for dose computation). Conclusion: We developed an integrated GPU-accelerated software platform that enables accurate and efficient DIR, auto-segmentation, and dose computation, thus supporting an efficient ART workflow. This work was supported by NIH/NCI under grant R42CA137886.
Aggregating Data for Computational Toxicology Applications ...
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built usi
Turbulence in planetary occultations. IV - Power spectra of phase and intensity fluctuations
NASA Technical Reports Server (NTRS)
Haugstad, B. S.
1979-01-01
Power spectra of phase and intensity scintillations during occultation by turbulent planetary atmospheres are significantly affected by the inhomogeneous background upon which the turbulence is superimposed. Such coupling is particularly pronounced in the intensity, where there is also a marked difference in spectral shape between a central and grazing occultation. While the former has its structural features smoothed by coupling to the inhomogeneous background, such features are enhanced in the latter. Indeed, the latter power spectrum peaks around the characteristic frequency that is determined by the size of the free-space Fresnel zone and the ray velocity in the atmosphere; at higher frequencies strong fringes develop in the power spectrum. A confrontation between the theoretical scintillation spectra computed here and those calculated from the Mariner 5 Venus mission by Woo et al. (1974) is inconclusive, mainly because of insufficient statistical resolution. Phase and/or intensity power spectra computed from occultation data may be used to deduce characteristics of the turbulence and to distinguish turbulence from other perturbations in the refractive index. Such determinations are facilitated if observations are made at two or more frequencies (radio occultation) or in two or more colors (stellar occultation).
Dakua, Sarada Prasad; Abinahed, Julien; Al-Ansari, Abdulla
2015-04-01
Liver segmentation remains a major challenge, largely due to the liver's complex relationship with surrounding anatomical structures (stomach, kidney, and heart), the high noise level, and the lack of contrast in pathological computed tomography (CT) data. We present an approach to reconstructing the liver surface in low contrast CT. The main contributions are: (1) a stochastic resonance-based methodology in the discrete cosine transform domain is developed to enhance the contrast of pathological liver images, (2) a new formulation is proposed to prevent the object boundary, resulting from the cellular automata method, from leaking into the surrounding areas of similar intensity, and (3) a level-set method is suggested to generate intermediate segmentation contours from two segmented slices distantly located in a subject sequence. We have tested the algorithm on real datasets obtained from two sources, Hamad General Hospital and the medical image computing and computer-assisted interventions grand challenge workshop. Several parameters of the algorithm (denoted by symbols not reproduced here) play imperative roles, and thus their values are precisely selected. Both qualitative and quantitative evaluation performed on liver data show promising segmentation accuracy when compared with ground truth data, reflecting the potential of the proposed method.
Trivariate characteristics of intensity fluctuations for heavily saturated optical systems.
Das, Biman; Drake, Eli; Jack, John
2004-02-01
Trivariate cumulants of intensity fluctuations have been computed starting from a trivariate intensity probability distribution function, which rests on the assumption that the variation of intensity has a maximum entropy distribution with the constraint that the total intensity is constant. The assumption holds for optical systems such as a thin, long, mirrorless gas laser amplifier where, under heavy gain saturation, the total output approaches a constant intensity, although the intensity of any individual mode fluctuates rapidly about the average intensity. The relations between trivariate cumulants and central moments needed for this computation were derived. The results of the computation show that the cumulants have characteristic values that depend on the number of interacting modes in the system. The cumulant values approach zero when the number of modes is infinite, as expected. The results will be useful for comparison with the experimental trivariate statistics of heavily saturated optical systems such as the output from a thin, long, bidirectional gas laser amplifier.
A Bioinformatics Module for Use in an Introductory Biology Laboratory
ERIC Educational Resources Information Center
Alaie, Adrienne; Teller, Virginia; Qiu, Wei-gang
2012-01-01
Since biomedical science has become increasingly data-intensive, acquisition of computational and quantitative skills by science students has become more important. For non-science students, an introduction to biomedical databases and their applications promotes the development of a scientifically literate population. Because typical college…
ERIC Educational Resources Information Center
Liu, Dennis
2010-01-01
Biology is well suited for mathematical description, from the perfect geometry of viruses, to equations that describe the flux of ions across cellular membranes, to computationally intensive models for protein folding. For this short Web review, however, the author focuses on how mathematics helps biologists sort, evaluate, and draw conclusions…
Center for Technology for Advanced Scientific Componet Software (TASCS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Govindaraju, Madhusudhan
Advanced Scientific Computing Research Computer Science FY 2010 Report. Center for Technology for Advanced Scientific Component Software: Distributed CCA. State University of New York, Binghamton, NY, 13902. Summary: The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and on a model coupling interface for biogeochemical computations coded in MATLAB. We also worked on the design and implementation of modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details: We worked on the following research projects, which we are applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-hydrostatic hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a maximum speedup of about 26 times. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: Answering pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB. This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 3. Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond its use with data-intensive applications and disk-based systems, and can also be brought to bear in processing small but CPU-intensive distributed applications. MapReduce, however, carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and to compute-intensive programs processing much smaller data but requiring intensive computations.
In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications and on the processing of smaller but demanding input sizes primarily used in diskless, memory-resident I/O systems. We designed LEMO-MR [1], a low-overhead, elastic, optimized implementation of MapReduce that is configurable for in-memory applications and offers on-demand fault tolerance, for both on-disk and in-memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade-offs and factors to be considered. We have initial results showing the efficacy of our implementation in terms of the potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment. 4. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analyses and made recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data. In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, ``MapReduce Implementation for Memory-Based and Processing Intensive Applications'', accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, ``Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors'', in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju, Binghamton University, State University of New York (SUNY), mgovinda@cs.binghamton.edu, Phone: 607-777-4904
User's manual for SEDCALC, a computer program for computation of suspended-sediment discharge
Koltun, G.F.; Gray, John R.; McElhone, T.J.
1994-01-01
Sediment-Record Calculations (SEDCALC), a menu-driven set of interactive computer programs, was developed to facilitate computation of suspended-sediment records. The programs comprising SEDCALC were developed independently in several District offices of the U.S. Geological Survey (USGS) to minimize the intensive labor associated with various aspects of sediment-record computations. SEDCALC operates on suspended-sediment-concentration data stored in American Standard Code for Information Interchange (ASCII) files in a predefined card-image format. Program options within SEDCALC can be used to assist in creating and editing the card-image files, as well as to reformat card-image files to and from formats used by the USGS Water-Quality System. SEDCALC provides options for creating card-image files containing time series of equal-interval suspended-sediment concentrations from 1. digitized suspended-sediment-concentration traces, 2. linear interpolation between log-transformed instantaneous suspended-sediment-concentration data stored at unequal time intervals, and 3. nonlinear interpolation between log-transformed instantaneous suspended-sediment-concentration data stored at unequal time intervals. Suspended-sediment discharge can be computed from the streamflow and suspended-sediment-concentration data or by application of transport relations derived by regressing log-transformed instantaneous streamflows on log-transformed instantaneous suspended-sediment concentrations or discharges. The computed suspended-sediment discharge data are stored in card-image files that can be either directly imported to the USGS Automated Data Processing System or used to generate plots by means of other SEDCALC options.
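A minimal sketch of the log-log transport relation of the kind SEDCALC derives is shown below, fitted to synthetic data; it is an illustration, not SEDCALC code.

```python
import numpy as np

# Minimal sketch of a sediment transport (rating-curve) relation of the kind
# SEDCALC derives: regress log-transformed instantaneous suspended-sediment
# concentrations on log-transformed instantaneous streamflows, i.e. fit
# C = a * Q**b. Data are synthetic.
rng = np.random.default_rng(0)
streamflow = rng.uniform(5, 500, 50)                                 # ft^3/s
concentration = 0.8 * streamflow**1.3 * rng.lognormal(0, 0.2, 50)    # mg/L

b, log_a = np.polyfit(np.log(streamflow), np.log(concentration), 1)
a = np.exp(log_a)
print(f"C = {a:.2f} * Q^{b:.2f}")

# Predicted concentration for an unsampled flow:
print(round(a * 120.0**b, 1), "mg/L at Q = 120 ft^3/s")
```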
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jun
Our group has been working with ANL collaborators on the topic of bridging the gap between parallel file systems and local file systems during the course of this project period. We visited Argonne National Lab (Dr. Robert Ross's group) for one week in the summer of 2007. We reviewed our current project progress and planned the activities for the coming years 2008-09. The PI met Dr. Robert Ross several times, such as at the HEC FSIO workshop 2008, SC08, and SC10. We explored opportunities to develop a production system by leveraging our current prototype (SOGP+PVFS) toward a new PVFS version. We delivered SOGP+PVFS codes to the ANL PVFS2 group in 2008. We also talked about exploring a potential project on developing new parallel programming models and runtime systems for data-intensive scalable computing (DISC). The methodology is to evolve MPI towards DISC by incorporating some functions of the Google MapReduce parallel programming model. More recently, we are together exploring how to leverage existing work to perform (1) coordination/aggregation of local I/O operations prior to movement over the WAN, (2) efficient bulk data movement over the WAN, and (3) latency hiding techniques for latency-intensive operations. Since 2009, we have been applying Hadoop/MapReduce to some HEC applications with LANL scientists John Bent and Salman Habib. Another ongoing effort is to improve checkpoint performance at the I/O forwarding layer for the Roadrunner supercomputer with James Nunez and Gary Grider at LANL. Two senior undergraduates from our research group did summer internships on high-performance file and storage system projects at LANL for three consecutive years starting in 2008. Both are now pursuing Ph.D. degrees in our group, will be in their 4th year of the PhD program in Fall 2011, and will go to LANL to advance the two above-mentioned efforts during this winter break. Since 2009, we have been collaborating with several computer scientists (Gary Grider, John Bent, Parks Fields, James Nunez, Hsing-Bung Chen, etc.) from HPC5 and James Ahrens from the Advanced Computing Laboratory at Los Alamos National Laboratory. We hold a weekly conference and/or video meeting on advancing work on two fronts: the hardware/software infrastructure for building large-scale data-intensive clusters, and research publications. Our group members assist in constructing several onsite LANL data-intensive clusters. The two parties have been developing software codes and research papers together using both sides' resources.
Computer-aided diagnosis of leukoencephalopathy in children treated for acute lymphoblastic leukemia
NASA Astrophysics Data System (ADS)
Glass, John O.; Li, Chin-Shang; Helton, Kathleen J.; Reddick, Wilburn E.
2005-04-01
The purpose of this study was to use objective quantitative MR imaging methods to develop a computer-aided diagnosis tool to differentiate white matter (WM) hyperintensities as either leukoencephalopathy (LE) or normal maturational processes in children treated for acute lymphoblastic leukemia with intravenous high-dose methotrexate. A combined imaging set consisting of T1, T2, PD, and FLAIR MR images and WM, gray matter, and cerebrospinal fluid a priori maps from a spatially normalized atlas was analyzed with a neural network segmentation based on a Kohonen Self-Organizing Map. Segmented regions were manually classified to identify the most hyperintense WM region and the normal-appearing genu region. Signal intensity differences normalized to the genu within each examination were generated for two time points in 203 children. An unsupervised hierarchical clustering algorithm with the agglomeration method of McQuitty was used to divide data from the first examination into normal-appearing or LE groups. A C-support vector machine (C-SVM) was then trained on the first examination data and used to classify the data from the second examination. The overall accuracy of the computer-aided detection tool was 83.5% (299/358) with sensitivity to normal WM of 86.9% (199/229) and specificity to LE of 77.5% (100/129) when compared to the readings of two expert observers. These results suggest that subtle therapy-induced leukoencephalopathy can be objectively and reproducibly detected in children treated for cancer using this computer-aided detection approach based on relative differences in quantitative signal intensity measures normalized within each examination.
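A minimal sketch of the final classification stage follows, with synthetic genu-normalised intensity differences standing in for the study features; it is not the authors' trained model.

```python
import numpy as np
from sklearn.svm import SVC

# Minimal C-SVM sketch (synthetic features, not the study data): each examination
# is summarised by genu-normalised signal-intensity differences of the most
# hyperintense white-matter region on the four MR contrasts, and a C-SVM trained
# on first-examination data classifies later examinations as
# normal-appearing WM (0) or LE (1).
rng = np.random.default_rng(0)
normal = rng.normal(0.05, 0.03, (80, 4))      # [T1, T2, PD, FLAIR] differences, made up
leuko = rng.normal(0.25, 0.05, (40, 4))
X_train = np.vstack([normal, leuko])
y_train = np.repeat([0, 1], [80, 40])

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
second_exam = np.array([[0.06, 0.04, 0.07, 0.05], [0.22, 0.27, 0.30, 0.24]])
print(clf.predict(second_exam))
```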
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-05-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today's data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
Security Risks of Cloud Computing and Its Emergence as 5th Utility Service
NASA Astrophysics Data System (ADS)
Ahmad, Mushtaq
Cloud Computing is being projected by major cloud service provider IT companies such as IBM, Google, Yahoo, Amazon and others as the fifth utility, where clients will have access to processing for applications and/or software projects that need very high processing speed and huge data capacity for compute-intensive scientific and engineering research problems, as well as e-business and data content network applications. These services for different types of clients are provided under DASM (Direct Access Service Management), based on virtualization of hardware, software and very high bandwidth Internet (Web 2.0) communication. The paper reviews these developments for Cloud Computing and the hardware/software configuration of the cloud paradigm. The paper also examines the vital aspects of security risks projected by IT industry experts and cloud clients. The paper also highlights the cloud providers' response to cloud security risks.
Age-related differences in muscle fatigue vary by contraction type: a meta-analysis.
Avin, Keith G; Law, Laura A Frey
2011-08-01
During senescence, despite the loss of strength (force-generating capability) associated with sarcopenia, muscle endurance may improve for isometric contractions. The purpose of this study was to perform a systematic meta-analysis of young versus older adults, considering likely moderators (ie, contraction type, joint, sex, activity level, and task intensity). A 2-stage systematic review identified potential studies from PubMed, CINAHL, PEDro, EBSCOhost: ERIC, EBSCOhost: Sportdiscus, and The Cochrane Library. Studies reporting fatigue tasks (voluntary activation) performed at a relative intensity in both young (18-45 years of age) and old (≥ 55 years of age) adults who were healthy were considered. Sample size, mean and variance outcome data (ie, fatigue index or endurance time), joint, contraction type, task intensity (percentage of maximum), sex, and activity levels were extracted. Effect sizes were (1) computed for all data points; (2) subgrouped by contraction type, sex, joint or muscle group, intensity, or activity level; and (3) further subgrouped between contraction type and the remaining moderators. Out of 3,457 potential studies, 46 publications (with 78 distinct effect size data points) met all inclusion criteria. A lack of available data limited subgroup analyses (ie, sex, intensity, joint), as did a disproportionate spread of data (most intensities ≥ 50% of maximum voluntary contraction). Overall, older adults were able to sustain relative-intensity tasks significantly longer or with less force decay than younger adults (effect size=0.49). However, this age-related difference was present only for sustained and intermittent isometric contractions, whereas this age-related advantage was lost for dynamic tasks. When controlling for contraction type, the additional modifiers played minor roles. Identifying muscle endurance capabilities in the older adult may provide an avenue to improve functional capabilities, despite a clearly established decrement in peak torque.
Age-Related Differences in Muscle Fatigue Vary by Contraction Type: A Meta-analysis
Avin, Keith G.
2011-01-01
Background During senescence, despite the loss of strength (force-generating capability) associated with sarcopenia, muscle endurance may improve for isometric contractions. Purpose The purpose of this study was to perform a systematic meta-analysis of young versus older adults, considering likely moderators (ie, contraction type, joint, sex, activity level, and task intensity). Data Sources A 2-stage systematic review identified potential studies from PubMed, CINAHL, PEDro, EBSCOhost: ERIC, EBSCOhost: Sportdiscus, and The Cochrane Library. Study Selection Studies reporting fatigue tasks (voluntary activation) performed at a relative intensity in both young (18–45 years of age) and old (≥55 years of age) adults who were healthy were considered. Data Extraction Sample size, mean and variance outcome data (ie, fatigue index or endurance time), joint, contraction type, task intensity (percentage of maximum), sex, and activity levels were extracted. Data Synthesis Effect sizes were (1) computed for all data points; (2) subgrouped by contraction type, sex, joint or muscle group, intensity, or activity level; and (3) further subgrouped between contraction type and the remaining moderators. Out of 3,457 potential studies, 46 publications (with 78 distinct effect size data points) met all inclusion criteria. Limitations A lack of available data limited subgroup analyses (ie, sex, intensity, joint), as did a disproportionate spread of data (most intensities ≥50% of maximum voluntary contraction). Conclusions Overall, older adults were able to sustain relative-intensity tasks significantly longer or with less force decay than younger adults (effect size=0.49). However, this age-related difference was present only for sustained and intermittent isometric contractions, whereas this age-related advantage was lost for dynamic tasks. When controlling for contraction type, the additional modifiers played minor roles. Identifying muscle endurance capabilities in the older adult may provide an avenue to improve functional capabilities, despite a clearly established decrement in peak torque. PMID:21616932
NASA Astrophysics Data System (ADS)
Makatun, Dzmitry; Lauret, Jérôme; Rudová, Hana; Šumbera, Michal
2015-05-01
When running data-intensive applications on distributed computational resources, long I/O overheads may be observed as remotely stored data is accessed. Latencies and bandwidth can become the major limiting factor for the overall computation performance and can reduce the CPU/WallTime ratio owing to excessive I/O wait. Building on our previous research, we propose a constraint programming based planner that schedules computational jobs and data placements (transfers) in a distributed environment in order to optimize resource utilization and reduce the overall processing completion time. The optimization is achieved by ensuring that none of the resources (network links, data storage and CPUs) are oversaturated at any moment of time and either (a) that the data is pre-placed at the site where the job runs or (b) that the jobs are scheduled where the data is already present. Such an approach eliminates the idle CPU cycles occurring when the job is waiting for I/O from a remote site and would have wide application in the community. Our planner was evaluated and simulated based on data extracted from log files of the batch and data management systems of the STAR experiment. The results of the evaluation and the estimation of performance improvements are discussed in this paper.
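A greedy sketch of the underlying data-aware placement idea is shown below; it is a simplification for illustration, not the constraint-programming planner itself, and the capacities and data catalogue are invented.

```python
# Greedy sketch of data-aware job placement (a simplification, not the
# constraint-programming planner described above): prefer the site that already
# holds a job's input data and still has a free CPU slot; otherwise place the
# job on the least-loaded site and record the required transfer.
# Capacities and the data catalogue below are invented.
cpu_slots = {"site_A": 2, "site_B": 2, "site_C": 1}
data_location = {"ds1": "site_A", "ds2": "site_B", "ds3": "site_A", "ds4": "site_C"}
jobs = [("job1", "ds1"), ("job2", "ds1"), ("job3", "ds2"), ("job4", "ds3"), ("job5", "ds4")]

plan, transfers = [], []
for job, dataset in jobs:
    home = data_location[dataset]
    if cpu_slots[home] > 0:            # (b) schedule the job where its data already is
        site = home
    else:                              # (a) pre-place the data at the chosen site
        site = max(cpu_slots, key=cpu_slots.get)
        transfers.append((dataset, home, site))
    cpu_slots[site] -= 1
    plan.append((job, site))

print(plan)        # job -> site assignments
print(transfers)   # (dataset, from, to) transfers that must be scheduled
```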
Online learning in optical tomography: a stochastic approach
NASA Astrophysics Data System (ADS)
Chen, Ke; Li, Qin; Liu, Jian-Guo
2018-07-01
We study the inverse problem of the radiative transfer equation (RTE) using the stochastic gradient descent (SGD) method in this paper. Mathematically, optical tomography amounts to recovering the optical parameters in the RTE using the incoming–outgoing pairs of light intensity. We formulate it as a PDE-constrained optimization problem, where the mismatch between computed and measured outgoing data is minimized with the same initial data and the RTE constraint. The memory and computation cost this requires, however, is typically prohibitive, especially in high-dimensional spaces. Smart iterative solvers that use only partial information in each step are therefore called for. The stochastic gradient descent method is an online learning algorithm that randomly selects data for minimizing the mismatch. It requires minimal memory and computation, advances fast, and therefore perfectly serves the purpose. In this paper we formulate the problem in both the nonlinear and the linearized setting, apply the SGD algorithm, and analyze the convergence performance.
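A generic SGD sketch follows, in which a linear forward model stands in for the RTE-constrained parameter-to-measurement map; it only illustrates the random per-step data selection described above.

```python
import numpy as np

# Generic SGD sketch (a linear forward model stands in for the RTE-constrained
# parameter-to-measurement map): at each step one randomly selected
# incoming/outgoing data pair is used to update the unknown parameters along the
# negative gradient of that single pair's squared mismatch.
rng = np.random.default_rng(0)
n_data, n_params = 200, 20
A = rng.normal(size=(n_data, n_params))       # one row per source-detector pair
x_true = rng.normal(size=n_params)            # "optical parameters" to recover
y = A @ x_true + rng.normal(0, 0.01, n_data)  # measured outgoing data

x = np.zeros(n_params)
step = 0.01
for it in range(20_000):
    i = rng.integers(n_data)                  # randomly selected measurement
    residual = A[i] @ x - y[i]
    x -= step * residual * A[i]               # gradient of 0.5 * residual**2

print("relative error:", round(np.linalg.norm(x - x_true) / np.linalg.norm(x_true), 3))
```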
Some experiences and opportunities for big data in translational research.
Chute, Christopher G; Ullman-Cullere, Mollie; Wood, Grant M; Lin, Simon M; He, Min; Pathak, Jyotishman
2013-10-01
Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.
The emerging role of cloud computing in molecular modelling.
Ebejer, Jean-Paul; Fulle, Simone; Morris, Garrett M; Finn, Paul W
2013-07-01
There is a growing recognition of the importance of cloud computing for large-scale and data-intensive applications. The distinguishing features of cloud computing and their relationship to other distributed computing paradigms are described, as are the strengths and weaknesses of the approach. We review the use made to date of cloud computing for molecular modelling projects and the availability of front ends for molecular modelling applications. Although the use of cloud computing technologies for molecular modelling is still in its infancy, we demonstrate its potential by presenting several case studies. Rapid growth can be expected as more applications become available and costs continue to fall; cloud computing can make a major contribution not just in terms of the availability of on-demand computing power, but could also spur innovation in the development of novel approaches that utilize that capacity in more effective ways. Copyright © 2013 Elsevier Inc. All rights reserved.
Lin, Yu-Hsiu; Hu, Yu-Chen
2018-04-27
The emergence of smart Internet of Things (IoT) devices has strongly favored the realization of smart homes in the downstream sector of a smart grid. The underlying objective of Demand Response (DR) schemes is to actively engage customers in modifying their energy consumption on domestic appliances in response to pricing signals. Domestic appliance scheduling is widely accepted as an effective mechanism to manage domestic energy consumption intelligently. For residential customers implementing DR, however, maintaining a balance between energy consumption cost and users' comfort satisfaction is a challenge. Hence, in this paper, a constrained Particle Swarm Optimization (PSO)-based residential consumer-centric load-scheduling method is proposed. The method can further be combined with edge computing. In contrast with cloud computing, edge computing, which drives computing capabilities to the IoT edge of the Internet, addresses bandwidth-intensive content and latency-sensitive applications between sensors and central data centers through data analytics at or near the source of the data. A non-intrusive load-monitoring technique proposed previously is utilized for the automatic determination of the physical characteristics of power-intensive home appliances from users' life patterns. The constrained PSO is used to minimize the energy consumption cost while considering users' comfort satisfaction for DR implementation. The proposed residential consumer-centric load-scheduling method is evaluated under real-time pricing with inclining block rates and is demonstrated in a case study. The experiments reported in this paper show that the proposed method can reshape the loads of home appliances in response to DR signals. Moreover, a 13.97% reduction in peak power consumption is achieved.
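A minimal sketch of the constrained PSO idea, scheduling a single shiftable appliance under a time-of-use price with a comfort penalty; the prices, load and penalty weight are assumptions for illustration and do not reproduce the authors' model.

    import numpy as np

    rng = np.random.default_rng(1)

    price = np.array([0.10, 0.10, 0.12, 0.20, 0.30, 0.22, 0.15, 0.10])  # $/kWh per slot
    load_kwh = 1.5          # energy drawn by the shiftable appliance per slot
    preferred_slot = 4      # user's preferred start slot (comfort reference)
    comfort_weight = 0.05   # assumed penalty per slot of deviation

    def cost(start_slot):
        slot = int(np.clip(round(start_slot), 0, len(price) - 1))
        return price[slot] * load_kwh + comfort_weight * abs(slot - preferred_slot)

    # Plain global-best PSO over the single decision variable (start slot).
    n_particles, n_iter = 20, 50
    x = rng.uniform(0, len(price) - 1, n_particles)
    v = np.zeros(n_particles)
    pbest, pbest_f = x.copy(), np.array([cost(xi) for xi in x])
    gbest = pbest[np.argmin(pbest_f)]

    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, 0, len(price) - 1)     # constraint: stay inside the horizon
        f = np.array([cost(xi) for xi in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]

    print("scheduled start slot:", int(round(gbest)), "cost:", round(cost(gbest), 4))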
NASA Astrophysics Data System (ADS)
Hagan, Aaron; Sawant, Amit; Folkerts, Michael; Modiri, Arezoo
2018-01-01
We report on the design, implementation and characterization of a multi-graphic processing unit (GPU) computational platform for higher-order optimization in radiotherapy treatment planning. In collaboration with a commercial vendor (Varian Medical Systems, Palo Alto, CA), a research prototype GPU-enabled Eclipse (V13.6) workstation was configured. The hardware consisted of dual 8-core Xeon processors, 256 GB RAM and four NVIDIA Tesla K80 general purpose GPUs. We demonstrate the utility of this platform for large radiotherapy optimization problems through the development and characterization of a parallelized particle swarm optimization (PSO) four dimensional (4D) intensity modulated radiation therapy (IMRT) technique. The PSO engine was coupled to the Eclipse treatment planning system via a vendor-provided scripting interface. Specific challenges addressed in this implementation were (i) data management and (ii) non-uniform memory access (NUMA). For the former, we alternated between parameters over which the computation process was parallelized. For the latter, we reduced the amount of data required to be transferred over the NUMA bridge. The datasets examined in this study were approximately 300 GB in size, including 4D computed tomography images, anatomical structure contours and dose deposition matrices. For evaluation, we created a 4D-IMRT treatment plan for one lung cancer patient and analyzed computation speed while varying several parameters (number of respiratory phases, GPUs, PSO particles, and data matrix sizes). The optimized 4D-IMRT plan enhanced sparing of organs at risk by an average reduction of 26% in maximum dose, compared to the clinical optimized IMRT plan, where the internal target volume was used. We validated our computation time analyses in two additional cases. The computation speed in our implementation did not monotonically increase with the number of GPUs. The optimal number of GPUs (five, in our study) is directly related to the hardware specifications. The optimization process took 35 min using 50 PSO particles, 25 iterations and 5 GPUs.
Data Scientists ARE coming of age: but WHERE are they coming from?
NASA Astrophysics Data System (ADS)
Evans, N.; Bastrakova, I.; Connor, N.; Raymond, O.; Wyborn, L. A.
2013-12-01
The fourth paradigm of data-intensive science is upon us: a new fundamental scientific methodology has emerged which is underpinned by the capability to analyse large volumes of data using advanced computational capacities. This combination is enabling earth and space scientists to respond to decadal challenges on issues such as the sustainable development of our natural resources, impacts of climate change and protection from natural hazards. Fundamental to the data-intensive paradigm is data that are readily accessible and capable of being integrated and amalgamated with other data, often from multiple sources. For many years Earth and Space science practitioners have been drowning in a data deluge. In many cases, either lacking confidence in their capability or not having the time or capacity to manage these data assets, they have called in the data professionals. However, such people rarely had domain knowledge of the data they were dealing with, and before long it emerged that although the 'containers' of data were now much better managed and documented, in reality the content was locked up and difficult to access, particularly for HPC environments where national- to global-scale problems were being addressed. Geoscience Australia (GA) is the custodian of over 4 PB of geoscientific data and is a key provider of evidence-based, scientific advice to government on national issues. Since 2011, in collaboration with the CSIRO Minerals Down Under Program and the National Computational Infrastructure, GA has begun a series of data-intensive scientific research pilots that focussed on applying advanced ICT tools and technologies to enhance scientific outcomes for the agency, in particular national-scale analysis of data sets that can be up to 500 TB in size. As in any change program, a small group of innovators and early adopters took up the challenge of data-intensive science and quickly showed that GA was able to use new ICT technologies to exploit an information-rich world, to undertake applied research and to deliver new business outcomes in ways that current technologies do not allow. The innovators clearly had the necessary skills to rapidly adapt to data-intensive techniques. However, if we were to scale out to the rest of the organisation, we needed to quantify these skills. The Strategic People Development Section of GA agreed to: * Conduct a capability analysis of the scientific staff that participated in the pilot projects, including a review of university training and postgraduate training; and * Conduct a capability analysis of the technical groups involved in the pilot projects. The analysis identified the need for multi-disciplinary teams across the spectrum from pure scientists to pure ICT staff, along with a key hybrid role, the Data Scientist, who has a greater capacity in mathematics, numerical modelling, statistics, computational skills, software engineering and spatial skills, and the ability to integrate data across multiple domains. To fill the emerging gap, GA is asking the questions: how do we find or develop this capability, can we successfully transform the Scientist or the ICT Professional, and are our educational facilities modifying their training? It is certainly leading GA to acknowledge, formalise, and promote a continuum of skills and roles, changing our recruitment, re-assignment and Learning and Development strategic decisions.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As the NSF indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is that third pillar of GIScience and the geosciences. With the exponential growth of geodata, scalable and high-performance computing for big data analytics has become an urgent challenge, because many research activities are constrained by software or tools that cannot even complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as shown by our prior work, while the potential of such advanced infrastructure remains unexplored in this domain. In this presentation, our prior and ongoing initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
Judson, Richard S.; Martin, Matthew T.; Egeghy, Peter; Gangwal, Sumit; Reif, David M.; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A.; Richard, Ann M.
2012-01-01
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases. PMID:22408426
Provenance Challenges for Earth Science Dataset Publication
NASA Technical Reports Server (NTRS)
Tilmes, Curt
2011-01-01
Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence and reproducibility in the domain of data intensive scientific processing. It will use the example of Earth Science satellite data, but the challenges also apply to other domains.
Interactive signal analysis and ultrasonic data collection system user's manual
NASA Technical Reports Server (NTRS)
Smith, G. R.
1978-01-01
The interactive signal analysis and ultrasonic data collection system (ECHO1) is a real time data acquisition and display system. ECHO1 executed on a PDP-11/45 computer under the RT11 real time operating system. Extensive operator interaction provided the requisite parameters to the data collection, calculation, and data modules. Data were acquired in real time from a pulse echo ultrasonic system using a Biomation Model 8100 transient recorder. The data consisted of 2084 intensity values representing the amplitude of pulses transmitted and received by the ultrasonic unit.
Phase retrieval from intensity-only data by relative entropy minimization.
Deming, Ross W
2007-11-01
A recursive algorithm, which appears to be new, is presented for estimating the amplitude and phase of a wave field from intensity-only measurements on two or more scan planes at different axial positions. The problem is framed as a nonlinear optimization, in which the angular spectrum of the complex field model is adjusted in order to minimize the relative entropy, or Kullback-Leibler divergence, between the measured and reconstructed intensities. The most common approach to this so-called phase retrieval problem is a variation of the well-known Gerchberg-Saxton algorithm devised by Misell (J. Phys. D6, L6, 1973), which is efficient and extremely simple to implement. The new algorithm has a computational structure that is very similar to Misell's approach, despite the fundamental difference in the optimization criteria used for each. Based upon results from noisy simulated data, the new algorithm appears to be more robust than Misell's approach and to produce better results from low signal-to-noise ratio data. The convergence of the new algorithm is examined.
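For context, the Misell-style two-plane iteration that the paper compares against can be sketched as below: the field is propagated between two planes with the angular spectrum method, and its amplitude is replaced by the measured one at each plane. The wavelength, sampling, propagation distance and test object are assumptions; this is not the relative-entropy algorithm itself.

    import numpy as np

    def propagate(field, dz, wavelength=0.5e-6, dx=1e-6):
        """Angular-spectrum propagation of a sampled complex field by distance dz."""
        n = field.shape[0]
        fx = np.fft.fftfreq(n, d=dx)
        FX, FY = np.meshgrid(fx, fx)
        kz = 2 * np.pi * np.sqrt(np.maximum(1 / wavelength**2 - FX**2 - FY**2, 0))
        return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * dz))

    # Synthetic "truth" and the two intensity-only measurements.
    n, dz = 128, 2e-3
    x = np.linspace(-1, 1, n)
    truth = np.exp(1j * np.pi * np.outer(np.exp(-x**2), np.exp(-x**2)))
    I1 = np.abs(truth) ** 2                       # plane 1 intensity
    I2 = np.abs(propagate(truth, dz)) ** 2        # plane 2 intensity

    # Misell-type alternating projections between the two planes.
    est = np.sqrt(I1).astype(complex)
    for _ in range(200):
        est = propagate(est, dz)
        est = np.sqrt(I2) * np.exp(1j * np.angle(est))   # enforce plane-2 amplitude
        est = propagate(est, -dz)
        est = np.sqrt(I1) * np.exp(1j * np.angle(est))   # enforce plane-1 amplitude

    print("intensity mismatch at plane 2:",
          np.mean((np.abs(propagate(est, dz))**2 - I2)**2))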
Molecular structures and intramolecular dynamics of pentahalides
NASA Astrophysics Data System (ADS)
Ischenko, A. A.
2017-03-01
This paper reviews advances of the modern gas electron diffraction (GED) method, combined with high-resolution spectroscopy and quantum chemical calculations, in studies of the impact of intramolecular dynamics in free molecules of pentahalides. Some recently developed approaches to electron diffraction data interpretation, based on direct incorporation of the adiabatic potential energy surface parameters into the diffraction intensity, are described. In this way, complementary data from different experimental and computational methods can be combined directly to solve problems of molecular structure and dynamics. The possibility of evaluating some important parameters of the adiabatic potential energy surface, namely barriers to pseudorotation and the saddle point of the intermediate configuration, from diffraction intensities when solving the inverse GED problem is demonstrated on several examples. With increasing accuracy of the electron diffraction intensities and the development of the theoretical background of electron scattering and data interpretation, it has become possible to investigate complex nuclear dynamics in fluxional systems by the GED method. Results of other research groups are also included in the discussion.
NASA Astrophysics Data System (ADS)
McElroy, Kenneth L., Jr.
1992-12-01
A method is presented for the determination of neutral gas densities in the ionosphere from rocket-borne measurements of UV atmospheric emissions. Computer models were used to calculate an initial guess for the neutral atmosphere. Using this neutral atmosphere, intensity profiles for the N2 (0,5) Vegard-Kaplan band, the N2 Lyman-Birge-Hopfield band system, and the OI2972 A line were calculated and compared with the March 1990 NPS MUSTANG data. The neutral atmospheric model was modified and the intensity profiles recalculated until a fit with the data was obtained. The neutral atmosphere corresponding to the intensity profile that fit the data was assumed to be the atmospheric composition prevailing at the time of the observation. The ion densities were then calculated from the neutral atmosphere using a photochemical model. The electron density profile calculated by this model was compared with the electron density profile measured by the U.S. Air Force Geophysics Laboratory at a nearby site.
The medical science DMZ: a network design pattern for data-intensive medical science.
Peisert, Sean; Dart, Eli; Barnett, William; Balas, Edward; Cuff, James; Grossman, Robert L; Berman, Ari; Shankar, Anurag; Tierney, Brian
2017-10-06
We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. High-end networking, packet-filter firewalls, network intrusion-detection systems. We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs. The exponentially increasing amounts of "omics" data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research "Big Data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows. By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
NASA Astrophysics Data System (ADS)
Khalil, A. A. I.; Younis, W. O.; Gandol, M. A.
2017-03-01
We built a collinear dual-pulse laser-induced breakdown spectroscopy (DP-LIBS) system to study aluminum (Al) plasma emission, using a pair of Nd:YAG lasers operating at 266 and 1064 nm. The spectral intensities of selected doubly ionized aluminum lines were employed to evaluate the optical emission spectra. The influence of the energy ratio of the two pulsed lasers on the LIBS intensity for different doubly ionized Al spectral lines was investigated. The de-excitation rate parameters of the excited ion and the electron-impact excitation were computed using the analytical formulas proposed by Smeets and Vriens. The transition probabilities and energy states were computed using Hibbert's configuration interaction computer package (CIV3). By solving the coupled rate equations including the 1s²2s²2p⁶ns (²S), 1s²2s²2p⁶np (²P), 1s²2s²2p⁶nd (²D) (n = 3-5) and 1s²2s²2p⁶nf (²F) (n = 4, 5) states, the level population densities were computed. We also propose a theoretical population model, based on the rate coefficients, to investigate the effectiveness of the various processes that might affect the population of the upper levels in the Al plasma. In addition, the population densities of the 19 upper levels were computed. Good agreement between the experimental data and the theoretical model was observed. Our results may serve as reference data for the optimization of DP-LIBS spectrometry and the diagnostics of laser-produced plasmas.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and their ratio of computation to memory operations.
Fermilab computing at the Intensity Frontier
Group, Craig; Fuess, S.; Gutsche, O.; ...
2015-12-23
The Intensity Frontier refers to a diverse set of particle physics experiments using high-intensity beams. In this paper I will focus the discussion on the computing requirements and solutions of a set of neutrino and muon experiments in progress or planned to take place at the Fermi National Accelerator Laboratory located near Chicago, Illinois. The experiments face unique challenges, but also have overlapping computational needs. In principle, by exploiting the commonality and utilizing centralized computing tools and resources, requirements can be satisfied efficiently and scientists of individual experiments can focus more on the science and less on the development of tools and infrastructure.
Parallel algorithm of VLBI software correlator under multiprocessor environment
NASA Astrophysics Data System (ADS)
Zheng, Weimin; Zhang, Dong
2007-11-01
The correlator is the key signal-processing equipment of a Very Long Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass of data collected by the VLBI observatories and produces the visibility function of the target, which can be used for spacecraft positioning, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is both data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators for multiprocessor environments. A near real-time correlator for spacecraft tracking adopts pipelining and thread-level parallelism and runs on SMP (Symmetric Multi-Processor) servers. Another high-speed prototype correlator, using a mixed Pthreads and MPI (Message Passing Interface) parallel algorithm, is realized on a small Beowulf cluster platform. Both correlators are characterized by a flexible structure, scalability, and the ability to correlate data from 10 stations.
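The FX-style core of such a software correlator can be sketched as below: each station's time series is channelized with an FFT, cross-multiplied and accumulated into a visibility; in production each frame would be farmed out to threads or MPI ranks. The simulated delay and noise levels are assumptions.

    import numpy as np

    rng = np.random.default_rng(2)

    fft_len, n_frames = 256, 64
    common = rng.normal(size=fft_len * n_frames + 8)
    station1 = common[:-8] + 0.1 * rng.normal(size=fft_len * n_frames)
    station2 = common[8:] + 0.1 * rng.normal(size=fft_len * n_frames)   # 8-sample delay

    visibility = np.zeros(fft_len, dtype=complex)
    for k in range(n_frames):                       # in production, frames are farmed
        s = slice(k * fft_len, (k + 1) * fft_len)   # out to threads or MPI ranks
        f1 = np.fft.fft(station1[s])                # "F": channelize
        f2 = np.fft.fft(station2[s])
        visibility += f1 * np.conj(f2)              # "X": cross-multiply, accumulate
    visibility /= n_frames

    lag_spectrum = np.abs(np.fft.ifft(visibility))
    print("recovered delay (samples):", int(np.argmax(lag_spectrum)))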
Erasing the Milky Way: new cleaning technique applied to GBT intensity mapping data
NASA Astrophysics Data System (ADS)
Wolz, L.; Blake, C.; Abdalla, F. B.; Anderson, C. J.; Chang, T.-C.; Li, Y.-C.; Masui, K. W.; Switzer, E.; Pen, U.-L.; Voytek, T. C.; Yadav, J.
2017-02-01
We present the first application of a new foreground removal pipeline to the current leading H I intensity mapping data set, obtained by the Green Bank Telescope (GBT). We study the 15- and 1-h-field data of the GBT observations previously presented in Masui et al. and Switzer et al., covering about 41 deg² at 0.6 < z < 1.0, for which cross-correlations may be measured with the galaxy distribution of the WiggleZ Dark Energy Survey. In the presented pipeline, we subtract the Galactic foreground continuum and the point-source contamination using an independent component analysis technique (FASTICA), and develop a Fourier-based optimal estimator to compute the temperature power spectrum of the intensity maps and cross-correlation with the galaxy survey data. We show that FASTICA is a reliable tool to subtract diffuse and point-source emission through the non-Gaussian nature of their probability distributions. The temperature power spectra of the intensity maps are dominated by instrumental noise on small scales, which FASTICA, as a conservative subtraction technique of non-Gaussian signals, cannot mitigate. However, we determine similar GBT-WiggleZ cross-correlation measurements to those obtained by the singular value decomposition (SVD) method, and confirm that foreground subtraction with FASTICA is robust against 21 cm signal loss, as seen by the converged amplitude of these cross-correlation measurements. We conclude that SVD and FASTICA are complementary methods to investigate the foregrounds and noise systematics present in intensity mapping data sets.
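A minimal sketch of the ICA-based cleaning step, using scikit-learn's FastICA on simulated frequency-by-pixel maps: the dominant components are estimated and their reconstruction subtracted. The simulated foregrounds, signal amplitude and the choice of four components are assumptions and are far simpler than the GBT data.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(3)

    n_freq, n_pix = 64, 2000
    # Smooth, bright foregrounds (power laws across frequency) plus a faint signal.
    freqs = np.linspace(0.7, 0.9, n_freq)
    foreground = np.outer(freqs ** -2.7, rng.normal(size=n_pix)) \
               + np.outer(freqs ** -2.1, rng.normal(size=n_pix))
    signal = 1e-3 * rng.normal(size=(n_freq, n_pix))
    maps = foreground + signal

    # Estimate dominant components along the frequency direction and subtract
    # their reconstruction from the maps (four components is an assumed choice).
    ica = FastICA(n_components=4, random_state=0)
    sources = ica.fit_transform(maps.T)              # (n_pix, n_components)
    reconstruction = ica.inverse_transform(sources).T
    residual = maps - reconstruction

    print("rms before/after cleaning:", maps.std(), residual.std())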
Fast MPEG-CDVS Encoder With GPU-CPU Hybrid Computing.
Duan, Ling-Yu; Sun, Wei; Zhang, Xinfeng; Wang, Shiqi; Chen, Jie; Yin, Jianxiong; See, Simon; Huang, Tiejun; Kot, Alex C; Gao, Wen
2018-05-01
The compact descriptors for visual search (CDVS) standard from the ISO/IEC Moving Picture Experts Group has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of a CDVS encoder unfortunately hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of the CDVS core techniques and present a very fast CDVS encoder that leverages the massive parallel execution resources of the graphics processing unit (GPU). We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, where the thread block allocation and the memory access mechanism are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU, removing an unnecessary computation burden from the GPU. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which makes it possible to leverage the advantages of GPU platforms harmoniously and yields significant performance improvements. Comprehensive experimental results over benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures
NASA Astrophysics Data System (ADS)
Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.
2016-12-01
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, such as an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
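The distance computation that dominates accelerated k-means can be written so that it reduces to a single matrix multiply, which is what maps well onto wide SIMD lanes and accelerators. A minimal NumPy sketch on random data follows; the data sizes are assumptions, not the phenology data sets.

    import numpy as np

    rng = np.random.default_rng(4)
    n_obs, n_feat, k = 100_000, 16, 8           # e.g. grid cells x features (assumed sizes)
    X = rng.normal(size=(n_obs, n_feat)).astype(np.float32)
    centroids = X[rng.choice(n_obs, k, replace=False)].copy()

    for _ in range(10):
        # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 ; the x.c term is a single GEMM,
        # which vectorizes well and dominates the runtime.
        cross = X @ centroids.T                                  # (n_obs, k)
        d2 = (X * X).sum(1, keepdims=True) - 2 * cross + (centroids * centroids).sum(1)
        labels = d2.argmin(axis=1)
        for j in range(k):                                       # centroid update
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

    print("cluster sizes:", np.bincount(labels, minlength=k))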
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.
Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-04-13
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
Causal Structure of Brain Physiology after Brain Injury from Subarachnoid Hemorrhage.
Claassen, Jan; Rahman, Shah Atiqur; Huang, Yuxiao; Frey, Hans-Peter; Schmidt, J Michael; Albers, David; Falo, Cristina Maria; Park, Soojin; Agarwal, Sachin; Connolly, E Sander; Kleinberg, Samantha
2016-01-01
High frequency physiologic data are routinely generated for intensive care patients. While massive amounts of data make it difficult for clinicians to extract meaningful signals, these data could provide insight into the state of critically ill patients and guide interventions. We develop uniquely customized computational methods to uncover the causal structure within systemic and brain physiologic measures recorded in a neurological intensive care unit after subarachnoid hemorrhage. While the data have many missing values, poor signal-to-noise ratio, and are composed from a heterogeneous patient population, our advanced imputation and causal inference techniques enable physiologic models to be learned for individuals. Our analyses confirm that complex physiologic relationships including demand and supply of oxygen underlie brain oxygen measurements and that mechanisms for brain swelling early after injury may differ from those that develop in a delayed fashion. These inference methods will enable wider use of ICU data to understand patient physiology.
The measurement of boundary layers on a compressor blade in cascade. Volume 2: Data tables
NASA Technical Reports Server (NTRS)
Zierke, William C.; Deutsch, Steven
1989-01-01
Measurements were made of the boundary layers and wakes about a highly loaded, double-circular-arc compressor blade in cascade. These laser Doppler velocimetry measurements have yielded a very detailed and precise data base with which to test the application of viscous computational codes to turbomachinery. In order to test the computational codes at off-design conditions, the data have been acquired at a chord Reynolds number of 500,000 and at three incidence angles. Average values and 95 percent confidence bands were tabulated for the velocity, local turbulence intensity, skewness, kurtosis, and percent backflow. Tables also exist for the blade static-pressure distributions and boundary layer velocity profiles reconstructed to account for the normal pressure gradient.
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly target compute-intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread-level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.
Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy; Westwater, Dave
2011-01-01
The BT-Nurse system uses data-to-text technology to automatically generate a natural language nursing shift summary in a neonatal intensive care unit (NICU). The summary is solely based on data held in an electronic patient record system, no additional data-entry is required. BT-Nurse was tested for two months in the Royal Infirmary of Edinburgh NICU. Nurses were asked to rate the understandability, accuracy, and helpfulness of the computer-generated summaries; they were also asked for free-text comments about the summaries. The nurses found the majority of the summaries to be understandable, accurate, and helpful (p<0.001 for all measures). However, nurses also pointed out many deficiencies, especially with regard to extra content they wanted to see in the computer-generated summaries. In conclusion, natural language NICU shift summaries can be automatically generated from an electronic patient record, but our proof-of-concept software needs considerable additional development work before it can be deployed.
A pen-based system to support pre-operative data collection within an anaesthesia department.
Sanz, M. F.; Gómez, E. J.; Trueba, I.; Cano, P.; Arredondo, M. T.; del Pozo, F.
1993-01-01
This paper describes the design and implementation of a pen-based computer system for remote pre-operative data collection. The system is intended to be used by anaesthesia staff in the different hospital settings where pre-operative data are generated. Pen-based technology offers important advantages in terms of portability and human-computer interaction, such as direct manipulation interfaces through direct pointing and "notebook" user interface metaphors. Since human factors analysis and user interface design are vital to achieving appropriate user acceptability, a methodology that integrates usability evaluation from the earliest development stages was used. Additionally, selecting a pen-based computer system as a portable device to be used by health care personnel makes it possible to evaluate the appropriateness of this new technology for remote data collection within the hospital environment. The work presented is currently being realised under the Research Project "TANIT: Telematics in Anaesthesia and Intensive Care", within the "A.I.M.--Telematics in Health Care" European Research Program. PMID:8130488
Dose-response relationships using brain–computer interface technology impact stroke rehabilitation
Young, Brittany M.; Nigogosyan, Zack; Walton, Léo M.; Remsik, Alexander; Song, Jie; Nair, Veena A.; Tyler, Mitchell E.; Edwards, Dorothy F.; Caldera, Kristin; Sattin, Justin A.; Williams, Justin C.; Prabhakaran, Vivek
2015-01-01
Brain–computer interfaces (BCIs) are an emerging novel technology for stroke rehabilitation. Little is known about how dose-response relationships for BCI therapies affect brain and behavior changes. We report preliminary results on stroke patients (n = 16, 11 M) with persistent upper extremity motor impairment who received therapy using a BCI system with functional electrical stimulation of the hand and tongue stimulation. We collected MRI scans and behavioral data using the Action Research Arm Test (ARAT), 9-Hole Peg Test (9-HPT), and Stroke Impact Scale (SIS) before, during, and after the therapy period. Using anatomical and functional MRI, we computed Laterality Index (LI) for brain activity in the motor network during impaired hand finger tapping. Changes from baseline LI and behavioral scores were assessed for relationships with dose, intensity, and frequency of BCI therapy. We found that gains in SIS Strength were directly responsive to BCI therapy: therapy dose and intensity correlated positively with increased SIS Strength (p ≤ 0.05), although no direct relationships were identified with ARAT or 9-HPT scores. We found behavioral measures that were not directly sensitive to differences in BCI therapy administration but were associated with concurrent brain changes correlated with BCI therapy administration parameters: therapy dose and intensity showed significant (p ≤ 0.05) or trending (0.05 < p < 0.1) negative correlations with LI changes, while therapy frequency did not affect LI. Reductions in LI were then correlated (p ≤ 0.05) with increased SIS Activities of Daily Living scores and improved 9-HPT performance. Therefore, some behavioral changes may be reflected by brain changes sensitive to differences in BCI therapy administration, while others such as SIS Strength may be directly responsive to BCI therapy administration. Data preliminarily suggest that when using BCI in stroke rehabilitation, therapy frequency may be less important than dose and intensity. PMID:26157378
2014-01-01
Background Various computer-based methods exist for the detection and quantification of protein spots in two-dimensional gel electrophoresis images. Area-based methods are commonly used for spot quantification: an area is assigned to each spot and the sum of the pixel intensities in that area, the so-called volume, is used as a measure of the spot signal. Other methods use the optical density, i.e. the intensity of the most intense pixel of a spot, or calculate the volume from the parameters of a fitted function. Results In this study we compare the performance of different spot quantification methods using synthetic and real data. We propose a ready-to-use algorithm for spot detection and quantification that uses fitting of two-dimensional Gaussian function curves for the extraction of data from two-dimensional gel electrophoresis (2-DE) images. The algorithm implements fitting using logical compounds and is computationally efficient. The applicability of the compound fitting algorithm was evaluated for various simulated data and compared with other quantification approaches. We provide evidence that even if an incorrect bell-shaped function is used, the fitting method is superior to other approaches, especially when spots overlap. Finally, we validated the method with experimental data of urea-based 2-DE of Aβ peptides and re-analyzed published data sets. Our methods showed higher precision and accuracy than other approaches when applied to exposure time series and standard gels. Conclusion Compound fitting as a quantification method for 2-DE spots shows several advantages over other approaches and could be combined with various spot detection methods. The algorithm was scripted in MATLAB (Mathworks) and is available as a supplemental file. PMID:24915860
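A minimal sketch of fitting a single two-dimensional Gaussian to a spot and reporting the fitted volume, using a synthetic noisy spot; the spot parameters and noise level are assumptions, and the compound-fitting logic of the paper is not reproduced here.

    import numpy as np
    from scipy.optimize import curve_fit

    def gauss2d(coords, amp, x0, y0, sx, sy, offset):
        x, y = coords
        return (amp * np.exp(-((x - x0)**2 / (2 * sx**2) + (y - y0)**2 / (2 * sy**2)))
                + offset).ravel()

    # Synthetic spot with noise (stand-in for a cropped 2-DE spot region).
    yy, xx = np.mgrid[0:40, 0:40]
    rng = np.random.default_rng(5)
    truth = gauss2d((xx, yy), 800, 21.0, 18.0, 3.0, 4.5, 50).reshape(40, 40)
    img = truth + rng.normal(scale=10, size=truth.shape)

    p0 = [img.max() - img.min(), 20, 20, 3, 3, img.min()]
    popt, _ = curve_fit(gauss2d, (xx, yy), img.ravel(), p0=p0)
    amp, x0, y0, sx, sy, offset = popt
    volume = 2 * np.pi * amp * abs(sx * sy)       # analytic volume of the fitted Gaussian
    print(f"fitted centre ({x0:.1f}, {y0:.1f}), volume {volume:.0f}")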
Information content of household-stratified epidemics.
Kinyanjui, T M; Pellis, L; House, T
2016-09-01
Household structure is a key driver of many infectious diseases, as well as a natural target for interventions such as vaccination programs. Many theoretical and conceptual advances on household-stratified epidemic models are relatively recent, but have successfully managed to increase the applicability of such models to practical problems. To be of maximum realism and hence benefit, they require parameterisation from epidemiological data, and while household-stratified final size data has been the traditional source, increasingly time-series infection data from households are becoming available. This paper is concerned with the design of studies aimed at collecting time-series epidemic data in order to maximize the amount of information available to calibrate household models. A design decision involves a trade-off between the number of households to enrol and the sampling frequency. Two commonly used epidemiological study designs are considered: cross-sectional, where different households are sampled at every time point, and cohort, where the same households are followed over the course of the study period. The search for an optimal design uses Bayesian computationally intensive methods to explore the joint parameter-design space combined with the Shannon entropy of the posteriors to estimate the amount of information in each design. For the cross-sectional design, the amount of information increases with the sampling intensity, i.e., the designs with the highest number of time points have the most information. On the other hand, the cohort design often exhibits a trade-off between the number of households sampled and the intensity of follow-up. Our results broadly support the choices made in existing epidemiological data collection studies. Prospective problem-specific use of our computational methods can bring significant benefits in guiding future study designs. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
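The entropy-based comparison of designs can be illustrated with a deliberately simple stand-in model: the posterior entropy over a single infection probability is computed on a grid for designs that trade the number of households against sampling frequency. Under this iid toy model the designs carry similar information, which the entropy comparison makes visible; the household dependence that creates the real trade-off is not modelled here, and the parameter values are assumptions.

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(6)
    theta_true, grid = 0.3, np.linspace(0.01, 0.99, 99)

    def posterior_entropy(n_households, n_visits):
        """Simulate one data set under a toy per-visit infection model and return
        the Shannon entropy of the gridded posterior over the infection probability."""
        data = rng.binomial(n_visits, theta_true, size=n_households)
        log_post = np.array([binom.logpmf(data, n_visits, th).sum() for th in grid])
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        return -(post * np.log(post + 1e-300)).sum()

    # Same total sampling effort, split differently between households and visits.
    for households, visits in [(100, 4), (200, 2), (400, 1)]:
        ents = [posterior_entropy(households, visits) for _ in range(20)]
        print(households, "households x", visits, "visits -> mean entropy",
              round(float(np.mean(ents)), 3))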
Architectural Aspects of Grid Computing and its Global Prospects for E-Science Community
NASA Astrophysics Data System (ADS)
Ahmad, Mushtaq
2008-05-01
The paper reviews the imminent architectural aspects of Grid Computing for the e-Science community, for scientific research and business/commercial collaboration beyond physical boundaries. Grid Computing provides all the needed facilities: hardware, software, communication interfaces, high-speed internet, safe authentication and a secure environment for collaboration on research projects around the globe. It provides a highly fast compute engine for those scientific and engineering research projects and business/commercial applications which are heavily compute intensive and/or require humongous amounts of data. It also makes possible the use of very advanced methodologies, simulation models, expert systems and the treasure of knowledge available around the globe under the umbrella of knowledge sharing. Thus it makes possible the dream of a global village for the benefit of the e-Science community across the globe.
3D robust Chan-Vese model for industrial computed tomography volume data segmentation
NASA Astrophysics Data System (ADS)
Liu, Linghui; Zeng, Li; Luan, Xiao
2013-11-01
Industrial computed tomography (CT) has been widely applied in many areas of non-destructive testing (NDT) and non-destructive evaluation (NDE). In practice, CT volume data to be dealt with may be corrupted by noise. This paper addresses the segmentation of noisy industrial CT volume data. Motivated by the research on the Chan-Vese (CV) model, we present a region-based active contour model that draws upon intensity information in local regions with a controllable scale. In the presence of noise, a local energy is firstly defined according to the intensity difference within a local neighborhood. Then a global energy is defined to integrate local energy with respect to all image points. In a level set formulation, this energy is represented by a variational level set function, where a surface evolution equation is derived for energy minimization. Comparative analysis with the CV model indicates the comparable performance of the 3D robust Chan-Vese (RCV) model. The quantitative evaluation also shows the segmentation accuracy of 3D RCV. In addition, the efficiency of our approach is validated under several types of noise, such as Poisson noise, Gaussian noise, salt-and-pepper noise and speckle noise.
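As a point of reference for the piecewise-constant energy underlying the model, the sketch below runs scikit-image's morphological Chan-Vese (a related, simplified active-contours-without-edges implementation, not the 3D RCV model of the paper) directly on a noisy synthetic volume; the sphere phantom, noise level and parameter values are assumptions.

    import numpy as np
    from skimage.segmentation import morphological_chan_vese

    rng = np.random.default_rng(7)

    # Synthetic noisy CT-like volume: a bright sphere inside a darker background.
    z, y, x = np.mgrid[0:64, 0:64, 0:64]
    true_mask = (z - 32)**2 + (y - 32)**2 + (x - 32)**2 < 15**2
    volume = np.where(true_mask, 1.0, 0.2) + rng.normal(scale=0.3, size=true_mask.shape)

    # Two-phase active contour without edges on the 3D array; the smoothing
    # parameter plays the role of the regularization term.
    segmentation = morphological_chan_vese(volume, 35, smoothing=2)
    if segmentation.mean() > 0.5:                 # phase labels are arbitrary; flip if needed
        segmentation = 1 - segmentation

    dice = 2 * np.logical_and(segmentation, true_mask).sum() / (segmentation.sum() + true_mask.sum())
    print("Dice vs ground truth:", round(float(dice), 3))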
NASA Astrophysics Data System (ADS)
Guo, Yanhui; Zhou, Chuan; Chan, Heang-Ping; Wei, Jun; Chughtai, Aamer; Sundaram, Baskaran; Hadjiiski, Lubomir M.; Patel, Smita; Kazerooni, Ella A.
2013-04-01
A 3D multiscale intensity homogeneity transformation (MIHT) method was developed to reduce false positives (FPs) in our previously developed CAD system for pulmonary embolism (PE) detection. In MIHT, the voxel intensity of a PE candidate region was transformed to an intensity homogeneity value (IHV) with respect to the local median intensity. The IHVs were calculated in multiscales (MIHVs) to measure the intensity homogeneity, taking into account vessels of different sizes and different degrees of occlusion. Seven new features including the entropy, gradient, and moments that characterized the intensity distributions of the candidate regions were derived from the MIHVs and combined with the previously designed features that described the shape and intensity of PE candidates for the training of a linear classifier to reduce the FPs. 59 CTPA PE cases were collected from our patient files (UM set) with IRB approval and 69 cases from the PIOPED II data set with access permission. 595 and 800 PEs were identified as reference standard by experienced thoracic radiologists in the UM and PIOPED set, respectively. FROC analysis was used for performance evaluation. Compared with our previous CAD system, at a test sensitivity of 80%, the new method reduced the FP rate from 18.9 to 14.1/scan for the PIOPED set when the classifier was trained with the UM set and from 22.6 to 16.0/scan vice versa. The improvement was statistically significant (p<0.05) by JAFROC analysis. This study demonstrated that the MIHT method is effective in reducing FPs and improving the performance of the CAD system.
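One plausible reading of the transformation described above, voxel intensity relative to the local median at several scales, can be sketched with SciPy as follows; the window sizes, the summary statistics and the synthetic patch are assumptions and may differ from the authors' exact definition.

    import numpy as np
    from scipy.ndimage import median_filter

    rng = np.random.default_rng(8)

    # Synthetic candidate region: a small 3D CT-like patch with a dark "filling defect".
    patch = 300 + 20 * rng.normal(size=(32, 32, 32))
    patch[12:20, 12:20, 12:20] -= 150                    # embolus-like intensity drop

    def mihv_features(region, scales=(3, 5, 9)):
        """Per-scale homogeneity value = voxel intensity minus the local median,
        summarized by a few simple statistics usable as candidate features."""
        feats = []
        for w in scales:
            ihv = region - median_filter(region, size=w)
            feats += [ihv.mean(), ihv.std(), np.abs(ihv).max()]
        return np.array(feats)

    print(mihv_features(patch).round(1))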
Automating ATLAS Computing Operations using the Site Status Board
NASA Astrophysics Data System (ADS)
Andreeva, J.; Borrego Iglesias, C.; Campana, S.; Di Girolamo, A.; Dzhunov, I.; Espinal Curull, X.; Gayazov, S.; Magradze, E.; Nowotka, M. M.; Rinaldi, L.; Saiz, P.; Schovancova, J.; Stewart, G. A.; Wright, M.
2012-12-01
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities, in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will overview future plans.
Use of application containers and workflows for genomic data analysis
Schulz, Wade L.; Durant, Thomas J. S.; Siddon, Alexa J.; Torres, Richard
2016-01-01
Background: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Methods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. Results: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. Conclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia. PMID:28163975
2nd Generation QUATARA Flight Computer Project
NASA Technical Reports Server (NTRS)
Falker, Jay; Keys, Andrew; Fraticelli, Jose Molina; Capo-Lugo, Pedro; Peeples, Steven
2015-01-01
Single-core flight computer boards have been designed, developed, and tested (DD&T) to be flown in small satellites for the last few years. In this project, a prototype flight computer will be designed as a distributed multi-core system containing four microprocessors running code in parallel. This flight computer will be capable of performing multiple computationally intensive tasks such as processing digital and/or analog data, controlling actuator systems, managing cameras, operating robotic manipulators and transmitting/receiving from/to a ground station. In addition, this flight computer will be designed to be fault tolerant, both by creating a robust physical hardware connection and by using a software voting scheme to determine the processor's performance. This voting scheme will leverage the work done for the Space Launch System (SLS) flight software. The prototype flight computer will be constructed with Commercial Off-The-Shelf (COTS) components which are estimated to survive for two years in a low-Earth orbit.
NASA Astrophysics Data System (ADS)
Jejčič, S.; Susino, R.; Heinzel, P.; Dzifčáková, E.; Bemporad, A.; Anzer, U.
2017-11-01
Context. We study the physics of erupting prominences in the core of coronal mass ejections (CMEs) and present a continuation of a previous analysis. Aims: We determine the kinetic temperature and microturbulent velocity of an erupting prominence embedded in the core of a CME that occurred on August 2, 2000 using the Ultraviolet Coronagraph and Spectrometer observations (UVCS) on board the Solar and Heliospheric Observatory (SOHO) simultaneously in the hydrogen Lα and C III lines. We develop the non-LTE (departures from the local thermodynamic equilibrium - LTE) spectral diagnostics based on Lα and Lβ measured integrated intensities to derive other physical quantities of the hot erupting prominence. Based on this, we synthesize the C III line intensity to compare it with observations. Methods: Our method is based on non-LTE modeling of eruptive prominences. We used a general non-LTE radiative-transfer code only for optically thin prominence points because optically thick points do not allow the direct determination of the kinetic temperature and microturbulence from the line profiles. The input parameters of the code were the kinetic temperature and microturbulent velocity derived from the Lα and C III line widths, as well as the integrated intensity of the Lα and Lβ lines. The code runs in three loops to compute the radial flow velocity, electron density, and effective thickness as the best fit to the Lα and Lβ integrated intensities within the accuracy defined by the absolute radiometric calibration of UVCS data. Results: We analyzed 39 observational points along the whole erupting prominence because for these points we found a solution for the kinetic temperature and microturbulent velocity. For these points we ran the non-LTE code to determine best-fit models. All models with τ0(Lα) ≤ 0.3 and τ0(C III) ≤ 0.3 were analyzed further, for which we computed the integrated intensity of the C III line using a two-level atom. The best agreement between computed and observed integrated intensity led to 30 optically thin points along the prominence. The results are presented as histograms of the kinetic temperature, microturbulent velocity, effective thickness, radial flow velocity, electron density, and gas pressure. We also show the relation between the microturbulence and kinetic temperature together with a scatter plot of computed versus observed C III integrated intensities and the ratio of the computed to observed C III integrated intensities versus kinetic temperature. Conclusions: The erupting prominence embedded in the CME is relatively hot with a low electron density, a wide range of effective thicknesses, a rather narrow range of radial flow velocities, and a microturbulence of about 25 km s-1. This analysis shows a disagreement between observed and synthetic intensities of the C III line, the reason for which most probably is that photoionization is neglected in calculations of the ionization equilibrium. Alternatively, the disagreement might be due to non-equilibrium processes.
NASA Astrophysics Data System (ADS)
Boldi, Robert; Williams, Earle; Guha, Anirban
2018-01-01
In this paper, we use (1) the 20 year record of Schumann resonance (SR) signals measured at West Greenwich Rhode Island, USA, (2) the 19 year Lightning Imaging Sensor (LIS)/Optical Transient Detector (OTD) lightning data, and (3) the normal mode equations for a uniform cavity model to quantify the relationship between the observed Schumann resonance modal intensity and the global-average vertical charge moment change M (C km) per lightning flash. This work, by integrating SR measurements with satellite-based optical measurements of global flash rate, accomplishes this quantification for the first time. To do this, we first fit the intensity spectra of the observed SR signals to an eight-mode, three parameter per mode, (symmetric) Lorentzian line shape model. Next, using the LIS/OTD lightning data and the normal mode equations for a uniform cavity model, we computed the expected climatological-daily-average intensity spectra. We then regressed the observed modal intensity values against the expected modal intensity values to find the best fit value of the global-average vertical charge moment change of a lightning flash (M) to be 41 C km per flash with a 99% confidence interval of ±3.9 C km per flash, independent of mode. Mode independence argues that the model adequately captured the modal intensity, the most important fit parameter herein considered. We also tested this relationship for the presence of residual modal intensity at zero lightning flashes per second and found no evidence that modal intensity is significantly different than zero at zero lightning flashes per second, setting an upper limit to the amount of nonlightning contributions to the observed modal intensity.
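As a concrete illustration of the spectral fit described above, a minimal sketch of an eight-mode symmetric Lorentzian fit (three parameters per mode) is given below. It assumes NumPy/SciPy; the nominal mode frequencies and the particular parameterisation are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Nominal centre frequencies of the first eight Schumann resonance modes (Hz),
# used only as starting guesses for the fit.
F0 = np.array([7.8, 14.1, 20.3, 26.4, 32.5, 39.0, 45.0, 51.0])

def lorentzian_sum(f, *params):
    """Sum of symmetric Lorentzians, three parameters (A, f0, w) per mode."""
    p = np.asarray(params).reshape(-1, 3)
    out = np.zeros_like(f, dtype=float)
    for A, f0, w in p:
        out += A * (0.5 * w) ** 2 / ((f - f0) ** 2 + (0.5 * w) ** 2)
    return out

def fit_spectrum(freq, intensity):
    """Fit an observed SR intensity spectrum to the eight-mode model."""
    p0 = np.column_stack([np.full(8, intensity.max()),  # amplitudes
                          F0,                           # centre frequencies
                          np.full(8, 2.0)]).ravel()     # widths (Hz)
    popt, _ = curve_fit(lorentzian_sum, freq, intensity, p0=p0, maxfev=20000)
    return popt.reshape(8, 3)   # rows: (amplitude, centre frequency, width)
```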
García-Sancho, Miguel
2012-03-01
This paper argues that the history of the computer, of the practice of computation and of the notions of 'data' and 'programme' are essential for a critical account of the emergence and implications of data-driven research. In order to show this, I focus on the transition that the investigations on the worm C. elegans experienced in the Laboratory of Molecular Biology of Cambridge (UK). Throughout the 1980s, this research programme evolved from a study of the genetic basis of the worm's development and behaviour to a DNA mapping and sequencing initiative. By examining the changing computing technologies which were used at the Laboratory, I demonstrate that by the time of this transition researchers shifted from modelling the worm's genetic programme on a mainframe apparatus to writing minicomputer programs aimed at providing map and sequence data which was then circulated to other groups working on the genetics of C. elegans. The shift in the worm research should thus not be simply explained in the application of computers which transformed the project from hypothesis-driven to a data-intensive endeavour. The key factor was rather a historically specific technology-in-house and easy programmable minicomputers-which redefined the way of achieving the project's long-standing goal, leading the genetic programme to co-evolve with the practices of data production and distribution. Copyright © 2011 Elsevier Ltd. All rights reserved.
Meng, Bo; Cong, Wenxiang; Xi, Yan; De Man, Bruno; Yang, Jian; Wang, Ge
2017-01-01
Contrast-enhanced computed tomography (CECT) enhances visibility in tumor imaging. When a high-Z contrast agent interacts with X-rays across its K-edge, the X-ray photoelectric absorption undergoes a sudden increase, resulting in a significant difference in the X-ray transmission intensity between the left and right energy windows of the K-edge. Using photon-counting detectors, the X-ray intensity data in the left and right windows of the K-edge can be measured simultaneously. The differential information of the two kinds of intensity data reflects the contrast-agent concentration distribution. K-edge differences between different materials offer opportunities for the identification of contrast agents in biomedical applications. In this paper, a general Radon transform is established to link the contrast-agent concentration to X-ray intensity measurement data. An iterative algorithm is proposed to reconstruct a contrast-agent distribution and the tissue attenuation background simultaneously. Comprehensive numerical simulations are performed to demonstrate the merits of the proposed method over existing K-edge imaging methods. Our results show that the proposed method accurately quantifies the distribution of a contrast agent, optimizing the contrast-to-noise ratio at high dose efficiency. PMID:28437900
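The K-edge subtraction idea underlying this measurement can be sketched with the Beer-Lambert law applied to the two energy windows. This is a generic formulation for context only, not the paper's general Radon transform model or its iterative reconstruction.

```latex
% Beer-Lambert transmission in the windows just below (L) and above (R) the K-edge,
% with background attenuation \mu_b and contrast-agent attenuation \mu_a:
I_{L} = I_{0,L}\, e^{-\int_{\ell} \left[\mu_{b}(E_{L}) + c(\mathbf{x})\,\mu_{a}(E_{L})\right] dl},
\qquad
I_{R} = I_{0,R}\, e^{-\int_{\ell} \left[\mu_{b}(E_{R}) + c(\mathbf{x})\,\mu_{a}(E_{R})\right] dl}
% \mu_b varies slowly across the K-edge while \mu_a jumps, so the log-difference
% is approximately a line integral of the agent concentration alone:
\ln\frac{I_{0,R}}{I_{R}} - \ln\frac{I_{0,L}}{I_{L}}
  \;\approx\; \left[\mu_{a}(E_{R}) - \mu_{a}(E_{L})\right] \int_{\ell} c(\mathbf{x})\, dl
```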
Laser Signature Prediction Using The VALUE Computer Program
NASA Astrophysics Data System (ADS)
Akerman, Alexander; Hoffman, George A.; Patton, Ronald
1989-09-01
A variety of enhancements are being made to the 1976-vintage LASERX computer code. These include: surface characterization with BRDF tabular data; specular reflection from transparent surfaces; generation of glint direction maps; generation of relative range imagery; an interface to the LOWTRAN atmospheric transmission code; an interface to the LEOPS laser sensor code; and user-friendly menu prompting for easy setup. Versions of VALUE have been written for both VAX/VMS and PC/DOS computer environments. Outputs have also been revised to be user friendly and include tables, plots, and images for (1) intensity, (2) cross section, (3) reflectance, (4) relative range, (5) region type, and (6) silhouette.
NASA Astrophysics Data System (ADS)
Glushkov, A. V.; Gurskaya, M. Yu; Ignatenko, A. V.; Smirnov, A. V.; Serga, I. N.; Svinarenko, A. A.; Ternovsky, E. V.
2017-10-01
A consistent relativistic energy approach to finite Fermi systems (atoms and nuclei) in a strong realistic laser field is presented and applied to computing multiphoton resonance parameters in some atoms and nuclei. The approach is based on the Gell-Mann and Low S-matrix formalism, the multiphoton resonance line moments technique, and the advanced Ivanov-Ivanova algorithm for calculating the Green's function of the Dirac equation. Data for the multiphoton resonance width and shift of the Cs atom and the 57Fe nucleus as functions of the laser intensity are listed.
NASA Astrophysics Data System (ADS)
Purss, Matthew; Lewis, Adam; Edberg, Roger; Ip, Alex; Sixsmith, Joshua; Frankish, Glenn; Chan, Tai; Evans, Ben; Hurst, Lachlan
2013-04-01
Australia's Earth Observation Program has downlinked and archived satellite data acquired under the NASA Landsat mission for the Australian Government since the establishment of the Australian Landsat Station in 1979. Geoscience Australia maintains this archive and produces image products to aid the delivery of government policy objectives. Due to the labor intensive nature of processing of this data there have been few national-scale datasets created to date. To compile any Earth Observation product the historical approach has been to select the required subset of data and process "scene by scene" on an as-needed basis. As data volumes have increased over time, and the demand for the processed data has also grown, it has become increasingly difficult to rapidly produce these products and achieve satisfactory policy outcomes using these historic processing methods. The result is that we have been "drowning in a sea of uncalibrated data" and scientists, policy makers and the public have not been able to realize the full potential of the Australian Landsat Archive and its value is therefore significantly diminished. To overcome this critical issue, the Australian Space Research Program has funded the "Unlocking the Landsat Archive" (ULA) Project from April 2011 to June 2013 to improve the access and utilization of Australia's archive of Landsat data. The ULA Project is a public-private consortium led by Lockheed Martin Australia (LMA) and involving Geoscience Australia (GA), the Victorian Partnership for Advanced Computing (VPAC), the National Computational Infrastructure (NCI) at the Australian National University (ANU) and the Cooperative Research Centre for Spatial Information (CRC-SI). The outputs from the ULA project will become a fundamental component of Australia's eResearch infrastructure, with the Australian Landsat Archive hosted on the NCI and made openly available under a creative commons license. NCI provides access to researchers through significant HPC supercomputers, cloud infrastructure and data resources along with a large catalogue of software tools that make it possible to fully explore the potential of this data. Under the ULA Project, Geoscience Australia has developed a data-intensive processing workflow on the NCI. This system has allowed us to successfully process 11 years of the Australian Landsat Archive (from 2000 to 2010 inclusive) to standardized well-calibrated and sensor independent data products at a rate that allows for both bulk data processing of the archive and near-realtime processing of newly acquired satellite data. These products are available as Optical Surface Reflectance 25m (OSR25) and other derived products, such as Fractional Cover.
Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degrees of freedom was obtained on the Cray Y-MP using an average of 3.63 processors.
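The three kernels named above map directly onto the basic Lanczos recurrence. The NumPy sketch below is a generic textbook formulation under my own naming, not NASA's optimized implementation; for the generalized structural eigenproblem K x = λ M x, the matvec callable would wrap the factorization and forward/backward solves that the abstract identifies as the dominant costs.

```python
import numpy as np

def lanczos(matvec, v0, m):
    """Plain Lanczos three-term recurrence for a symmetric operator.

    matvec -- callable returning A @ v (the matrix-vector multiply kernel)
    v0     -- starting vector
    m      -- number of Lanczos steps
    """
    V = np.zeros((m + 1, v0.size))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V[0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = matvec(V[j])
        if j > 0:
            w -= beta[j - 1] * V[j - 1]
        alpha[j] = V[j] @ w
        w -= alpha[j] * V[j]
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:          # invariant subspace found; stop early
            m = j + 1
            break
        V[j + 1] = w / beta[j]
    # Ritz values of the small tridiagonal matrix approximate eigenvalues of A.
    T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    return np.linalg.eigvalsh(T), V[:m]
```

For buckling or vibration problems like those in the abstract, matvec would typically apply a shift-inverted operator built from the factored stiffness matrix, so each Lanczos step performs one forward/backward solve.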
Recent evolution of the offline computing model of the NOvA experiment
Habig, Alec; Norman, A.; Group, Craig
2015-12-23
The NOvA experiment at Fermilab is a long-baseline neutrino experiment designed to study νe appearance in a νμ beam. Over the last few years there has been intense work to streamline the computing infrastructure in preparation for data, which started to flow in from the far detector in Fall 2013. Major accomplishments for this effort include migration to the use of off-site resources through the use of the Open Science Grid and upgrading the file-handling framework from simple disk storage to a tiered system using a comprehensive data management and delivery system to find and access files on either disk or tape storage. NOvA has already produced more than 6.5 million files and more than 1 PB of raw data and Monte Carlo simulation files which are managed under this model. In addition, the current system has demonstrated sustained rates of up to 1 TB/hour of file transfer by the data handling system. NOvA pioneered the use of new tools and this paved the way for their use by other Intensity Frontier experiments at Fermilab. Most importantly, the new framework places the experiment's infrastructure on a firm foundation, and is ready to produce the files needed for first physics.
Summary: Special Session SpS15: Data Intensive Astronomy
NASA Astrophysics Data System (ADS)
Montmerle, Thierry
2015-03-01
A new paradigm in astronomical research has been emerging: "Data Intensive Astronomy", which utilizes large amounts of data combined with statistical data analyses. The first research method in astronomy was observation by eye. It is well known that the invention of the telescope changed the human view of our Universe (although it was almost limited to the solar system) and led to Kepler's laws, which were later used by Newton to derive his mechanics. Newtonian mechanics then enabled astronomers to provide a theoretical explanation of the motion of the planets. Thus astronomers obtained the second paradigm, theoretical astronomy. Astronomers succeeded in applying various laws of physics to explain phenomena in the Universe; e.g., nuclear fusion was found to be the energy source of a star. Theoretical astronomy has been paired with observational astronomy to better understand the physics behind observed phenomena in the Universe. Although theoretical astronomy succeeded in providing good qualitative physical explanations, it was not easy to obtain quantitative agreement with observations. With the advent of high-performance computers, however, astronomers gained a third research method, simulation, to achieve better agreement with observations. Simulation astronomy developed rapidly along with the development of computer hardware (CPUs, GPUs, memories, storage systems, networks, and others) and simulation codes.
Approximate Bayesian Computation in the estimation of the parameters of the Forbush decrease model
NASA Astrophysics Data System (ADS)
Wawrzynczak, A.; Kopka, P.
2017-12-01
Realistic modeling of a complicated phenomenon such as the Forbush decrease of the galactic cosmic ray intensity is quite a challenging task. One aspect is the numerical solution of the Fokker-Planck equation in five-dimensional space (three spatial variables, time, and particle energy). The second difficulty arises from a lack of detailed knowledge about the spatial and time profiles of the parameters responsible for the creation of the Forbush decrease. Among these parameters, the diffusion coefficient plays the central role. The correctness of the proposed model can be assessed only by comparing the model output with experimental observations of the galactic cosmic ray intensity. We apply the Approximate Bayesian Computation (ABC) methodology to match the Forbush decrease model to experimental data. The ABC method is becoming increasingly exploited for dynamic complex problems in which the likelihood function is costly to compute. The main idea of all ABC methods is to accept a sample as an approximate posterior draw if its associated modeled data are close enough to the observed data. In this paper, we present an application of the Sequential Monte Carlo Approximate Bayesian Computation algorithm that scans the space of the diffusion coefficient parameters. The proposed algorithm is used to create a model of the Forbush decrease observed by neutron monitors at the Earth in March 2002. The model of the Forbush decrease is based on the stochastic approach to the solution of the Fokker-Planck equation.
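The accept/reject idea common to all ABC methods can be made concrete with the basic rejection sampler that SMC-ABC refines. The sketch below assumes a user-supplied Forbush-decrease simulator, prior, and distance function; all names are illustrative and not taken from the paper.

```python
import numpy as np

def abc_rejection(simulate, observed, prior_sample, distance, eps, n_accept):
    """Basic ABC rejection sampler (the building block refined by SMC-ABC).

    simulate     -- model(theta) -> simulated GCR intensity profile
    observed     -- observed intensity profile (e.g. neutron monitor data)
    prior_sample -- () -> draw of the diffusion-coefficient parameters theta
    distance     -- d(simulated, observed), e.g. an RMS difference
    eps          -- acceptance tolerance
    n_accept     -- number of accepted posterior draws required
    """
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()
        # Keep theta only if the simulated profile is close enough to the data.
        if distance(simulate(theta), observed) <= eps:
            accepted.append(theta)
    return np.array(accepted)
```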
Kinetic energy budgets in areas of convection
NASA Technical Reports Server (NTRS)
Fuelberg, H. E.
1979-01-01
Synoptic scale budgets of kinetic energy are computed using 3 and 6 h data from three of NASA's Atmospheric Variability Experiments (AVE's). Numerous areas of intense convection occurred during the three experiments. Large kinetic energy variability, with periods as short as 6 h, is observed in budgets computed over each entire experiment area and over limited volumes that barely enclose the convection and move with it. Kinetic energy generation and transport processes in the smaller volumes are often a maximum when the enclosed storms are near peak intensity, but the nature of the various energy processes differs between storm cases and seems closely related to the synoptic conditions. A commonly observed energy budget for peak storm intensity indicates that generation of kinetic energy by cross-contour flow is the major energy source while dissipation to subgrid scales is the major sink. Synoptic scale vertical motion transports kinetic energy from lower to upper levels of the atmosphere while low-level horizontal flux convergence and upper-level horizontal divergence also occur. Spatial fields of the energy budget terms show that the storm environment is a major center of energy activity for the entire area.
NASA Astrophysics Data System (ADS)
Wu, Yuanfeng; Gao, Lianru; Zhang, Bing; Zhao, Haina; Li, Jun
2014-01-01
We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach exploits the algorithm's data-level concurrency and optimizes the computing flow. We first defined a three-dimensional grid, in which each thread calculates a sub-block of data to facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and the basic linear algebra subroutines library. Through experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data-parallelizable and arithmetically intensive algorithm parts, such as noise estimation. In order to further evaluate the effectiveness of G-OMNF, we used two different applications, spectral unmixing and classification, for evaluation. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the on-board real-time feature extraction requirement.
Laboratory manual: mineral X-ray diffraction data retrieval/plot computer program
Hauff, Phoebe L.; VanTrump, George
1976-01-01
The Mineral X-Ray Diffraction Data Retrieval/Plot Computer Program--XRDPLT (VanTrump and Hauff, 1976a) is used to retrieve and plot mineral X-ray diffraction data. The program operates on a file of mineral powder diffraction data (VanTrump and Hauff, 1976b) which contains two-theta or 'd' values, intensities, chemical formula, mineral name, identification number, and mineral group code. XRDPLT is a machine-independent Fortran program which operates in time-sharing mode on a DEC System 10 computer and the Gerber plotter (Evenden, 1974). The program prompts the user to respond from a time-sharing terminal in a conversational format with the required input information. The program offers two major options: retrieval only; retrieval and plot. The first option retrieves mineral names, formulas, and groups from the file by identification number, by the mineral group code (a classification by chemistry or structure), or by searches based on the formula components. For example, it enables the user to search for minerals by major groups (i.e., feldspars, micas, amphiboles, oxides, phosphates, carbonates), by elemental composition (i.e., Fe, Cu, Al, Zn), or by a combination of these (i.e., all copper-bearing arsenates). The second option retrieves as the first, but also plots the retrieved two-theta and intensity values as diagrammatic X-ray powder patterns on mylar sheets or overlays. These plots can be made using scale combinations compatible with chart recorder diffractograms and 114.59 mm powder camera films. The overlays are then used to separate or sieve out unrelated minerals until unknowns are matched and identified.
Vision-Based UAV Flight Control and Obstacle Avoidance
2006-01-01
Fig. 2 of the report shows the block diagram of the proposed vision-based motion analysis and obstacle avoidance system; the relevant velocity vector is denoted Vb = (Vb1, Vb2, Vb3). Structure analysis often involves computation-intensive computer vision tasks, such as feature extraction and geometric modeling. The block comparison proceeds in two steps: first, a set of features is extracted from each block; second, the distance between the two sets of features is computed.
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-01-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today’s data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG’s simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact. PMID:27390389
Plasma wave excitation by intense microwave transmission from a space vehicle
NASA Astrophysics Data System (ADS)
Kimura, I.; Matsumoto, H.; Kaya, N.; Miyatake, S.
The impact of intense microwaves on the ionospheric plasma was investigated experimentally in an active rocket experiment (MINIX). The rocket carried two high-power (830 W) 2.45 GHz microwave transmitters on the mother section of the rocket. The ionospheric plasma response to the intense microwave was measured by a diagnostic package installed on both the mother and daughter sections. The daughter section was separated from the mother at a slow speed of 15 cm/s. The plasma wave analyzers revealed that various plasma waves are nonlinearly excited by the microwave. Among them, the most intense are electron cyclotron waves, followed by electron plasma waves. Extremely low frequency waves (several tens of Hz) are also found. The results of the data analysis as well as comparative computer simulations are given in this paper.
Quantitative Assay for Starch by Colorimetry Using a Desktop Scanner
ERIC Educational Resources Information Center
Matthews, Kurt R.; Landmark, James D.; Stickle, Douglas F.
2004-01-01
The procedure to produce a standard curve for starch concentration measurement by image analysis, using a color scanner and computer for data acquisition and color analysis, is described. Color analysis is performed by a Visual Basic program that measures red, green, and blue (RGB) color intensities for pixels within the scanner image.
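The per-spot colour measurement described above (done with a Visual Basic program in the original) can be sketched in a few lines. The Python version below is only an equivalent illustration; the image path and crop box are hypothetical.

```python
from PIL import Image
import numpy as np

def mean_rgb(scan_path, box):
    """Mean red, green and blue intensities over one sample spot of a scan.

    box -- (left, upper, right, lower) pixel coordinates of the spot.
    """
    spot = Image.open(scan_path).convert("RGB").crop(box)
    return np.asarray(spot, dtype=float).reshape(-1, 3).mean(axis=0)

# Example (hypothetical file and coordinates): one point of the standard curve
# pairs a known starch concentration with the measured colour of its spot.
# r, g, b = mean_rgb("starch_plate.png", (100, 100, 160, 160))
```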
van Der Laak, J A; Pahlplatz, M M; Hanselaar, A G; de Wilde, P C
2000-04-01
Transmitted light microscopy is used in pathology to examine stained tissues. Digital image analysis is gaining importance as a means to quantify alterations in tissues. A prerequisite for accurate and reproducible quantification is the possibility to recognise stains in a standardised manner, independently of variations in the staining density. The usefulness of three colour models was studied using data from computer simulations and experimental data from an immuno-doublestained tissue section. Direct use of the three intensities obtained by a colour camera results in the red-green-blue (RGB) model. By decoupling the intensity from the RGB data, the hue-saturation-intensity (HSI) model is obtained. However, the major part of the variation in perceived intensities in transmitted light microscopy is caused by variations in staining density. Therefore, the hue-saturation-density (HSD) transform was defined as the RGB to HSI transform, applied to optical density values rather than intensities for the individual RGB channels. In the RGB model, the mixture of chromatic and intensity information hampers standardisation of stain recognition. In the HSI model, mixtures of stains that could be distinguished from other stains in the RGB model could not be separated. The HSD model enabled all possible distinctions in a two-dimensional, standardised data space. In the RGB model, standardised recognition is only possible by using complex and time-consuming algorithms. The HSI model is not suitable for stain recognition in transmitted light microscopy. The newly derived HSD model was found superior to the existing models for this purpose. Copyright 2000 Wiley-Liss, Inc.
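One common formulation of the HSD transform described above is to compute a per-channel optical density and then apply the chromatic part of the RGB-to-HSI mapping to those densities. The sketch below follows that formulation; it is an illustration, not the authors' code, and the exact normalisation may differ from the paper.

```python
import numpy as np

def hsd_transform(rgb, i0=255.0):
    """Hue-saturation-density transform: RGB -> (hue, saturation, density).

    rgb -- array of shape (..., 3) with transmitted-light intensities.
    i0  -- intensity of the unstained (white) background.
    """
    od = -np.log10(np.clip(rgb, 1, None) / i0)          # per-channel optical density
    d = od.mean(axis=-1)                                  # overall density
    safe_d = np.where(d > 0, d, 1.0)
    cx = od[..., 0] / safe_d - 1.0                        # chromatic coordinates,
    cy = (od[..., 1] - od[..., 2]) / (np.sqrt(3.0) * safe_d)
    hue = np.arctan2(cy, cx)                              # stain colour, independent of density
    sat = np.hypot(cx, cy)                                # stain purity
    return hue, sat, d
```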
NASA Technical Reports Server (NTRS)
Flanagan, P. M.; Atherton, W. J.
1985-01-01
A robotic system to automate the detection, location, and quantification of gear noise using acoustic intensity measurement techniques has been successfully developed. Major system components fabricated under this grant include an instrumentation robot arm, a robot digital control unit and system software. A commercial, desktop computer, spectrum analyzer and two microphone probe complete the equipment required for the Robotic Acoustic Intensity Measurement System (RAIMS). Large-scale acoustic studies of gear noise in helicopter transmissions cannot be performed accurately and reliably using presently available instrumentation and techniques. Operator safety is a major concern in certain gear noise studies due to the operating environment. The man-hours needed to document a noise field in situ is another shortcoming of present techniques. RAIMS was designed to reduce the labor and hazard in collecting data and to improve the accuracy and repeatability of characterizing the acoustic field by automating the measurement process. Using RAIMS a system operator can remotely control the instrumentation robot to scan surface areas and volumes generating acoustic intensity information using the two microphone technique. Acoustic intensity studies requiring hours of scan time can be performed automatically without operator assistance. During a scan sequence, the acoustic intensity probe is positioned by the robot and acoustic intensity data is collected, processed, and stored.
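For context, the two-microphone (p-p) technique used by RAIMS estimates the acoustic intensity component along the probe axis from the cross-spectrum of the two pressure signals. A standard frequency-domain form is given below (a generic textbook relation, up to sign convention, not taken from the report).

```latex
% Acoustic intensity along the axis joining microphones 1 and 2,
% separated by \Delta r, in air of density \rho, with cross-spectral
% density G_{12}(\omega) between the two pressure signals:
I(\omega) \;=\; \frac{\operatorname{Im}\{ G_{12}(\omega) \}}{\rho\,\omega\,\Delta r}
```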
NASA Astrophysics Data System (ADS)
Evans, B. J. K.; Foster, C.; Minchin, S. A.; Pugh, T.; Lewis, A.; Wyborn, L. A.; Evans, B. J.; Uhlherr, A.
2014-12-01
The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high performance computing and data-intensive science across a wide spectrum of national environmental data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress in addressing harmonisation of the underlying data collections for future transdisciplinary research that enable accurate climate projections. NCI makes available 10+ PB major data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. The data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of data available and not be constrained to work within artificial disciplinary boundaries. Future challenges will involve the further integration and analysis of this data across the social sciences to facilitate the impacts across the societal domain, including timely analysis to more accurately predict and forecast future climate and environmental state.
Blood vessel-based liver segmentation through the portal phase of a CT dataset
NASA Astrophysics Data System (ADS)
Maklad, Ahmed S.; Matsuhiro, Mikio; Suzuki, Hidenobu; Kawata, Yoshiki; Niki, Noboru; Moriyama, Noriyuki; Utsunomiya, Toru; Shimada, Mitsuo
2013-02-01
Blood vessels are dispersed throughout the human body organs and carry unique information for each person. This information can be used to delineate organ boundaries. The proposed method relies on abdominal blood vessels (ABV) to segment the liver considering the potential presence of tumors through the portal phase of a CT dataset. ABV are extracted and classified into hepatic (HBV) and nonhepatic (non-HBV) with a small number of interactions. HBV and non-HBV are used to guide an automatic segmentation of the liver. HBV are used to individually segment the core region of the liver. This region and non-HBV are used to construct a boundary surface between the liver and other organs to separate them. The core region is classified based on extracted posterior distributions of its histogram into low intensity tumor (LIT) and non-LIT core regions. Non-LIT case includes normal part of liver, HBV, and high intensity tumors if exist. Each core region is extended based on its corresponding posterior distribution. Extension is completed when it reaches either a variation in intensity or the constructed boundary surface. The method was applied to 80 datasets (30 Medical Image Computing and Computer Assisted Intervention (MICCAI) and 50 non-MICCAI data) including 60 datasets with tumors. Our results for the MICCAI-test data were evaluated by sliver07 [1] with an overall score of 79.7, which ranks seventh best on the site (December 2013). This approach seems a promising method for extraction of liver volumetry of various shapes and sizes and low intensity hepatic tumors.
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluate, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...
2017-01-28
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as a mouse brain, may require hours of computation time with a medium-size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called the replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over a single-core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
Insomnia symptoms among Greek adolescent students with excessive computer use
Siomos, K E; Braimiotis, D; Floros, G D; Dafoulis, V; Angelopoulos, N V
2010-01-01
Background: The aim of the present study is to assess the intensity of computer use and insomnia epidemiology among Greek adolescents, to examine any possible age and gender differences and to investigate whether excessive computer use is a risk factor for developing insomnia symptoms. Patients and Methods: Cross-sectional study of a stratified sample of 2195 high school students. Demographic data were recorded and two specific questionnaires were used, the Adolescent Computer Addiction Test (ACAT) and the Athens Insomnia Scale (AIS). Results: Females scored higher than males on insomnia complaints but lower on computer use and addiction. A dose-mediated effect of computer use on insomnia complaints was recorded. Computer use had a larger effect size than sex on insomnia complaints. Duration of computer use was longer for those adolescents classified as suffering from insomnia compared to those who were not. Conclusions: Computer use can be a significant cause of insomnia complaints in an adolescent population regardless of whether the individual is classified as addicted or not. PMID:20981171
Anisotropic scattering of discrete particle arrays.
Paul, Joseph S; Fu, Wai Chong; Dokos, Socrates; Box, Michael
2010-05-01
Far-field intensities of light scattered from a linear centro-symmetric array illuminated by a plane wave of incident light are estimated at a series of detector angles. The intensities are computed from the superposition of E-fields scattered by the individual array elements. An average scattering phase function is used to model the scattered fields of individual array elements. The nature of scattering from the array is investigated using an image (theta-phi plot) of the far-field intensities computed at a series of locations obtained by rotating the detector angle from 0 degrees to 360 degrees, corresponding to each angle of incidence in the interval [0 degrees 360 degrees]. The diffraction patterns observed from the theta-Phi plot are compared with those for isotropic scattering. In the absence of prior information on the array geometry, the intensities corresponding to theta-Phi pairs satisfying the Bragg condition are used to estimate the phase function. An algorithmic procedure is presented for this purpose and tested using synthetic data. The relative error between estimated and theoretical values of the phase function is shown to be determined by the mean spacing factor, the number of elements, and the far-field distance. An empirical relationship is presented to calculate the optimal far-field distance for a given specification of the percentage error.
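A minimal sketch of the field superposition described above is given below, under the assumption that each array element contributes a phase set by its position along the array axis and an amplitude set by the average scattering phase function. The exact scattering model and geometry used in the paper are not reproduced; all names are illustrative.

```python
import numpy as np

def far_field_intensity(x, k, theta_inc, theta_det, phase_fn):
    """Far-field intensity of a linear array under plane-wave illumination.

    x         -- element positions along the array axis
    k         -- wavenumber of the incident light
    phase_fn  -- average scattering phase function p(scattering angle)
    """
    # Path-length phase of each element relative to the array origin.
    phases = k * x * (np.sin(theta_det) - np.sin(theta_inc))
    field = np.sqrt(phase_fn(theta_det - theta_inc)) * np.exp(1j * phases)
    return np.abs(field.sum()) ** 2

# Scanning theta_det from 0 to 360 degrees for each theta_inc in the same range
# produces the theta-phi intensity map discussed in the abstract.
```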
Nesvizhskii, Alexey I.
2013-01-01
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is a large number of false positive protein interactions present in unfiltered datasets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposite to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS datasets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data. PMID:22611043
Automatic registration of optical imagery with 3d lidar data using local combined mutual information
NASA Astrophysics Data System (ADS)
Parmehr, E. G.; Fraser, C. S.; Zhang, C.; Leach, J.
2013-10-01
Automatic registration of multi-sensor data is a basic step in data fusion for photogrammetric and remote sensing applications. The effectiveness of intensity-based methods such as Mutual Information (MI) for automated registration of multi-sensor imagery has been previously reported for medical and remote sensing applications. In this paper, a new multivariable MI approach is presented that exploits the complementary information of inherently registered LiDAR DSM and intensity data to improve the robustness of registering optical imagery to LiDAR point clouds. LiDAR DSM and intensity information is utilised in measuring the similarity of LiDAR and optical imagery via the Combined MI. An effective histogramming technique is adopted to facilitate estimation of a 3D probability density function (pdf). In addition, a local similarity measure is introduced to decrease the complexity and computational cost of optimisation in higher dimensions. The reliability of registration is thereby improved through the use of redundant observations of similarity. The performance of the proposed method for registration of satellite and aerial images with LiDAR data in urban and rural areas is experimentally evaluated and the results obtained are discussed.
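The pairwise mutual information that the Combined MI generalises can be estimated directly from a joint histogram. The sketch below shows that two-variable building block; the paper's 3D pdf over optical image, LiDAR intensity, and DSM is a straightforward extension and is not shown.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information of two aligned images, estimated from their joint histogram.

    a, b -- arrays of the same shape (e.g. an optical patch and a rendered
            LiDAR intensity patch at a candidate registration).
    """
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()                       # joint pdf estimate
    px = pxy.sum(axis=1, keepdims=True)           # marginal of a
    py = pxy.sum(axis=0, keepdims=True)           # marginal of b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

In an intensity-based registration loop, this score is maximised over the transformation parameters linking the optical image to the LiDAR data.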
Parallel In Situ Indexing for Data-intensive Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jinoh; Abbasi, Hasan; Chacon, Luis
2011-09-09
As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing data to disks and reading data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times depending on the fraction of data selected, and that using the in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique when using identical parallel settings.
Silberstein, M.; Tzemach, A.; Dovgolevsky, N.; Fishelson, M.; Schuster, A.; Geiger, D.
2006-01-01
Computation of LOD scores is a valuable tool for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, computation of exact multipoint likelihoods of large inbred pedigrees with extensive missing data is often beyond the capabilities of a single computer. We present a distributed system called “SUPERLINK-ONLINE,” for the computation of multipoint LOD scores of large inbred pedigrees. It achieves high performance via the efficient parallelization of the algorithms in SUPERLINK, a state-of-the-art serial program for these tasks, and through the use of the idle cycles of thousands of personal computers. The main algorithmic challenge has been to efficiently split a large task for distributed execution in a highly dynamic, nondedicated running environment. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either the installation of software or the maintenance of a complicated distributed environment. As the system was being developed, it was extensively tested by collaborating medical centers worldwide on a variety of real data sets, some of which are presented in this article. PMID:16685644
NASA Astrophysics Data System (ADS)
Wyborn, L. A.; Evans, B. J. K.
2015-12-01
The National Computational Infrastructure (NCI) at the Australian National University (ANU) has evolved to become Australia's peak computing centre for national computational and data-intensive Earth system science. More recently NCI collocated 10 Petabytes of 34 major national and international environmental, climate, earth system, geophysics and astronomy data collections to create the National Environmental Research Interoperability Data Platform (NERDIP). Spatial scales of the collections range from global to local ultra-high resolution, whilst sizes range from 3 PB down to a few GB. The data is highly connected to both NCI HPC and cloud resources via low-latency internal networks with massive bandwidth. Now that the collections are collocated on a single data platform, the 'Hype' and expectations around potential use cases for the NERDIP are high. Not unexpectedly, issues are emerging such as access, licensing, ownership, and incompatible data standards. Many communities are standardised within their domain, but achieving true interdisciplinary science will require all communities to move towards open interoperable data formats such as NetCDF4/HDF5. This transition will impact software using proprietary or non-open standards. But before we reach the 'Plateau of Productivity', there needs to be greater 'Enlightenment' of users: they must be encouraged to realise that this unprecedented Earth system science platform provides a rich mine of opportunities for discovery and innovation across a diverse range of domain-specific and interdisciplinary investigations, including climate and weather research, impact analysis, environment, remote sensing and geophysics, and to develop new and innovative interdisciplinary use cases that will guide those architecting the system, help minimise the amplitude of the 'Trough of Disillusionment', and ensure greater productivity and uptake of the collections that make NERDIP unique in the next generation of data-intensive science.
Smart, Luke R; Mangat, Halinder S; Issarow, Benson; McClelland, Paul; Mayaya, Gerald; Kanumba, Emmanuel; Gerber, Linda M; Wu, Xian; Peck, Robert N; Ngayomela, Isidore; Fakhar, Malik; Stieg, Philip E; Härtl, Roger
2017-09-01
Severe traumatic brain injury (TBI) is a major cause of death and disability worldwide. Prospective TBI data from sub-Saharan Africa are sparse. This study examines epidemiology and explores management of patients with severe TBI and adherence to Brain Trauma Foundation Guidelines at a tertiary care referral hospital in Tanzania. Patients with severe TBI hospitalized at Bugando Medical Centre were recorded in a prospective registry including epidemiologic, clinical, treatment, and outcome data. Between September 2013 and October 2015, 371 patients with TBI were admitted; 33% (115/371) had severe TBI. Mean age was 32.0 years ± 20.1, and most patients were male (80.0%). Vehicular injuries were the most common cause of injury (65.2%). Approximately half of the patients (47.8%) were hospitalized on the day of injury. Computed tomography of the brain was performed in 49.6% of patients, and 58.3% were admitted to the intensive care unit. Continuous arterial blood pressure monitoring and intracranial pressure monitoring were not performed in any patient. Of patients with severe TBI, 38.3% received hyperosmolar therapy, and 35.7% underwent craniotomy. The 2-week mortality was 34.8%. Mortality of patients with severe TBI at Bugando Medical Centre, Tanzania, is approximately twice that in high-income countries. Intensive care unit care, computed tomography imaging, and continuous arterial blood pressure and intracranial pressure monitoring are underused or unavailable in the tertiary referral hospital setting. Improving outcomes after severe TBI will require concerted investment in prehospital care and improvement in availability of intensive care unit resources, computed tomography, and expertise in multidisciplinary care. Copyright © 2017 Elsevier Inc. All rights reserved.
Semivariogram Analysis of Bone Images Implemented on FPGA Architectures.
Shirvaikar, Mukul; Lagadapati, Yamuna; Dong, Xuanliang
2017-03-01
Osteoporotic fractures are a major concern for the healthcare of elderly and female populations. Early diagnosis of patients with a high risk of osteoporotic fractures can be enhanced by introducing second-order statistical analysis of bone image data using techniques such as variogram analysis. Such analysis is computationally intensive, thereby creating an impediment for introduction into imaging machines found in common clinical settings. This paper investigates the fast implementation of the semivariogram algorithm, which has been proven to be effective in modeling bone strength, and should be of interest to readers in the areas of computer-aided diagnosis and quantitative image analysis. The semivariogram is a statistical measure of the spatial distribution of data, and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real-time implementation of the algorithm. A semi-variance, γ(h), is defined as half of the expected squared difference of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n²). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which utilizes the Virtex5 FPGA. Medical image data from DXA scans are utilized for the experiments. Implementation results show that a significant advantage in computational speed is attained by the architectures with respect to implementation on a personal computer with an Intel i7 multi-core processor.
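A minimal NumPy sketch of the empirical semivariogram for lags along the image rows is shown below; it illustrates the γ(h) definition above and is not the paper's FPGA architecture (the function name and the synthetic patch are hypothetical):

```python
import numpy as np

def semivariogram_row(img, max_lag):
    """Empirical semi-variance gamma(h) along the row direction:
    half the mean squared difference of pixel pairs separated by lag h."""
    img = img.astype(float)
    gamma = np.zeros(max_lag + 1)
    for h in range(1, max_lag + 1):
        diffs = img[:, h:] - img[:, :-h]      # all pixel pairs at lag h
        gamma[h] = 0.5 * np.mean(diffs ** 2)
    return gamma

# Example on a synthetic 64x64 texture patch
patch = np.random.rand(64, 64)
print(semivariogram_row(patch, max_lag=10))
```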
Semivariogram Analysis of Bone Images Implemented on FPGA Architectures
Shirvaikar, Mukul; Lagadapati, Yamuna; Dong, Xuanliang
2016-01-01
Osteoporotic fractures are a major concern for the healthcare of elderly and female populations. Early diagnosis of patients with a high risk of osteoporotic fractures can be enhanced by introducing second-order statistical analysis of bone image data using techniques such as variogram analysis. Such analysis is computationally intensive, thereby creating an impediment for introduction into imaging machines found in common clinical settings. This paper investigates the fast implementation of the semivariogram algorithm, which has been proven to be effective in modeling bone strength, and should be of interest to readers in the areas of computer-aided diagnosis and quantitative image analysis. The semivariogram is a statistical measure of the spatial distribution of data, and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real-time implementation of the algorithm. A semi-variance, γ(h), is defined as half of the expected squared difference of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n²). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which utilizes the Virtex5 FPGA. Medical image data from DXA scans are utilized for the experiments. Implementation results show that a significant advantage in computational speed is attained by the architectures with respect to implementation on a personal computer with an Intel i7 multi-core processor. PMID:28428829
Grid-Enabled High Energy Physics Research using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Mahmood, Akhtar
2005-04-01
At Edinboro University of Pennsylvania, we have built an 8-node 25 Gflops Beowulf Cluster with 2.5 TB of disk storage space to carry out grid-enabled, data-intensive high energy physics research for the ATLAS experiment via Grid3. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes. Once fully functional, the Cluster will be part of Grid3 [www.ivdgl.org/grid3]. The current ATLAS simulation grid application models the entire physical process, from the proton-proton collisions and the detector's response to the collision debris through the complete reconstruction of the event from analyses of these responses. The end result is a detailed set of data that simulates the real physical collision event inside a particle detector. The Grid is the new IT infrastructure for 21st-century science -- a new computing paradigm that is poised to transform the practice of large-scale data-intensive research in science and engineering. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
Orchestrating Distributed Resource Ensembles for Petascale Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baldin, Ilya; Mandal, Anirban; Ruth, Paul
2014-04-24
Distributed, data-intensive computational science applications of interest to DOE scientific communities move large amounts of data for experiment data management, distributed analysis steps, remote visualization, and accessing scientific instruments. These applications need to orchestrate ensembles of resources from multiple resource pools and interconnect them with high-capacity multi-layered networks across multiple domains. It is highly desirable to design mechanisms that provide this type of resource provisioning capability to a broad class of applications. It is also important to have coherent monitoring capabilities for such complex distributed environments. In this project, we addressed these problems by designing an abstract API, enabled by novel semantic resource descriptions, for provisioning complex and heterogeneous resources from multiple providers using their native provisioning mechanisms and control planes: computational, storage, and multi-layered high-speed network domains. We used an extensible resource representation based on semantic web technologies to afford maximum flexibility to applications in specifying their needs. We evaluated the effectiveness of provisioning using representative data-intensive applications. We also developed mechanisms for providing feedback about resource performance to the application, to enable closed-loop feedback control and dynamic adjustments to resource allocations (elasticity). This was enabled through development of a novel persistent query framework that consumes disparate sources of monitoring data, including perfSONAR, and provides scalable distribution of asynchronous notifications.
NASA Astrophysics Data System (ADS)
Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.
2015-12-01
The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run2). The need for simulation, data processing and analysis would overwhelm the expected capacity of the grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge, the integration of opportunistic resources into the LHC computing model is highly important. The Tier-1 facility at Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and will process, simulate and store up to 10% of the total data obtained from the ALICE, ATLAS and LHCb experiments. In addition, Kurchatov Institute has supercomputers with a peak performance of 0.12 PFLOPS. The delegation of even a fraction of supercomputing resources to LHC computing will notably increase the total capacity. In 2014 the development of a portal combining the Tier-1 and a supercomputer at Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences such as biology, with genome sequencing analysis, and astrophysics, with cosmic ray analysis and antimatter and dark matter searches, etc.
Seismic waveform modeling over cloud
NASA Astrophysics Data System (ADS)
Luo, Cong; Friederich, Wolfgang
2016-04-01
With fast-growing computational technologies, numerical simulation of seismic wave propagation has achieved huge success. Obtaining synthetic waveforms through numerical simulation is receiving an increasing amount of attention from seismologists. However, computational seismology is a data-intensive research field, and the numerical packages usually come with a steep learning curve. Users are expected to master a considerable amount of computer knowledge and data processing skills. Training users to use the numerical packages and to correctly access and utilize the computational resources is a troublesome task. In addition, access to HPC is a common difficulty for many users. To solve these problems, a cloud-based solution dedicated to shallow seismic waveform modeling has been developed with state-of-the-art web technologies. It is a web platform integrating both software and hardware with a multilayer architecture: a well-designed SQL database serves as the data layer, while HPC and a dedicated pipeline form the business layer. Through this platform, users no longer need to compile and manipulate various packages on a local machine within a local network to perform a simulation. By providing professional access to the computational code through its interfaces and delivering our computational resources to users over the cloud, the platform lets users customize simulations at expert level, and submit and run jobs through it.
The QuakeSim Project: Numerical Simulations for Active Tectonic Processes
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay; Lyzenga, Greg; Granat, Robert; Fox, Geoffrey; Pierce, Marlon; Rundle, John; McLeod, Dennis; Grant, Lisa; Tullis, Terry
2004-01-01
In order to develop a solid earth science framework for understanding and studying active tectonic and earthquake processes, this task develops simulation and analysis tools to study the physics of earthquakes using state-of-the-art modeling, data manipulation, and pattern recognition technologies. We develop clearly defined, accessible data formats and code protocols as inputs to the simulations. These are adapted to high-performance computers because the solid earth system is extremely complex and nonlinear, resulting in computationally intensive problems with millions of unknowns. With these tools it will be possible to construct the more complex models and simulations necessary to develop hazard assessment systems critical for reducing future losses from major earthquakes.
clearScience: Infrastructure for Communicating Data-Intensive Science.
Bot, Brian M; Burdick, David; Kellen, Michael; Huang, Erich S
2013-01-01
Progress in biomedical research requires effective scientific communication to one's peers and to the public. Current research routinely encompasses large datasets and complex analytic processes, and the constraints of traditional journal formats limit useful transmission of these elements. We are constructing a framework through which authors can provide not only the narrative of what was done, but also the primary and derivative data, the source code, the compute environment, and web-accessible virtual machines. This infrastructure allows authors to "hand their machine," prepopulated with libraries, data, and code, to those interested in reviewing or building on their work. This project, "clearScience," seeks to provide an integrated system that accommodates the ad hoc nature of discovery in the data-intensive sciences and seamless transitions from working to reporting. We demonstrate that rather than merely describing the science being reported, one can deliver the science itself.
NASA Astrophysics Data System (ADS)
Malik, T.; Foster, I.; Goodall, J. L.; Peckham, S. D.; Baker, J. B. H.; Gurnis, M.
2015-12-01
Research activities are iterative, collaborative, and now data- and compute-intensive. Such research activities mean that even the many researchers who work in small laboratories must often create, acquire, manage, and manipulate a great deal of diverse data and keep track of complex software. They face difficult data and software management challenges, and data sharing and reproducibility are neglected. There is significant federal investment in powerful cyberinfrastructure, in part to lessen the burden associated with modern data- and compute-intensive research. Similarly, geoscience communities are establishing research repositories to facilitate data preservation. Yet we observe that a large fraction of the geoscience community continues to struggle with data and software management. The reason, studies suggest, is not lack of awareness but rather that tools do not adequately support time-consuming data life cycle activities. Through the NSF/EarthCube-funded GeoDataspace project, we are building personalized, shareable dataspaces that help scientists connect their individual or research group efforts with the community at large. The dataspaces provide a lightweight, multiplatform research data management system with tools for recording research activities in what we call geounits, so that a geoscientist can at any time snapshot and preserve, both for their own use and to share with the community, all data and code required to understand and reproduce a study. A software-as-a-service (SaaS) deployment model enhances the usability of core components and integration with widely used software systems. In this talk we will present the open-source GeoDataspace project and demonstrate how it is enabling reproducibility across the geoscience domains of hydrology, space science, and modeling toolkits.
The M-Integral for Computing Stress Intensity Factors in Generally Anisotropic Materials
NASA Technical Reports Server (NTRS)
Warzynek, P. A.; Carter, B. J.; Banks-Sills, L.
2005-01-01
The objective of this project is to develop and demonstrate a capability for computing stress intensity factors in generally anisotropic materials. These objectives have been met. The primary deliverable of this project is this report and the information it contains. In addition, we have delivered the source code for a subroutine that will compute stress intensity factors for anisotropic materials, encoded in both the C and Python programming languages, and made available a version of the FRANC3D program that incorporates this subroutine. Single-crystal superalloys are commonly used for components in the hot sections of contemporary jet and rocket engines. Because these components have a uniform atomic lattice orientation throughout, they exhibit anisotropic material behavior. This means that stress intensity solutions developed for isotropic materials are not appropriate for the analysis of crack growth in these materials. Until now, a general numerical technique did not exist for computing stress intensity factors of cracks in anisotropic materials, and in cubic materials in particular. Such a capability was developed during the project and is described and demonstrated herein.
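For background, the generic contour form of the interaction (M-) integral commonly used to extract stress intensity factors is sketched below; sign and normalization conventions vary between references, and the report's anisotropic auxiliary fields are not reproduced here, so this is a general-purpose expression rather than the report's exact formulation. Superscripts (1) and (2) denote the actual and auxiliary fields:

```latex
M^{(1,2)} \;=\; \int_{\Gamma}
  \Big[\, W^{(1,2)}\,\delta_{1j}
        \;-\; \sigma_{ij}^{(1)}\,\frac{\partial u_i^{(2)}}{\partial x_1}
        \;-\; \sigma_{ij}^{(2)}\,\frac{\partial u_i^{(1)}}{\partial x_1} \Big]\, n_j \,\mathrm{d}\Gamma,
\qquad
W^{(1,2)} \;=\; \sigma_{ij}^{(1)}\,\varepsilon_{ij}^{(2)} .
```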
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzke, Melissa M.; Brown, Joseph N.; Gritsenko, Marina A.
2013-02-01
Liquid chromatography coupled with mass spectrometry (LC-MS) is widely used to identify and quantify peptides in complex biological samples. In particular, label-free shotgun proteomics is highly effective for the identification of peptides and subsequently obtaining a global protein profile of a sample. As a result, this approach is widely used for discovery studies. Typically, the objective of these discovery studies is to identify proteins that are affected by some condition of interest (e.g., disease, exposure). However, for complex biological samples, label-free LC-MS proteomics experiments measure peptides and do not directly yield protein quantities. Thus, protein quantification must be inferred from one or more measured peptides. In recent years, many computational approaches to relative protein quantification of label-free LC-MS data have been published. In this review, we examine the most commonly employed quantification approaches to relative protein abundance from peak intensity values, evaluate their individual merits, and discuss challenges in the use of the various computational approaches.
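As a toy illustration of one common roll-up strategy discussed in such reviews (a sketch only, not a method recommended by the authors; the function name and example rows are hypothetical), protein-level abundance can be summarized from the peak intensities of the peptides mapped to each protein, for example by taking the median of log-transformed intensities:

```python
import numpy as np
from collections import defaultdict

def rollup_protein_abundance(peptide_rows):
    """peptide_rows: iterable of (protein_id, peptide_peak_intensity) pairs.
    Returns protein_id -> median log2 intensity of its observed peptides."""
    by_protein = defaultdict(list)
    for protein, intensity in peptide_rows:
        if intensity > 0:                      # skip missing or zero peak intensities
            by_protein[protein].append(np.log2(intensity))
    return {p: float(np.median(v)) for p, v in by_protein.items()}

# Hypothetical example rows: (protein, peak intensity)
rows = [("P1", 1.2e6), ("P1", 8.0e5), ("P2", 3.5e5)]
print(rollup_protein_abundance(rows))
```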
ERIC Educational Resources Information Center
Everhart, Julie M.; Alber-Morgan, Sheila R.; Park, Ju Hee
2011-01-01
This study investigated the effects of computer-based practice on the acquisition and maintenance of basic academic skills for two children with moderate to intensive disabilities. The special education teacher created individualized computer games that enabled the participants to independently practice academic skills that corresponded with their…
GeoBrain Computational Cyber-laboratory for Earth Science Studies
NASA Astrophysics Data System (ADS)
Deng, M.; di, L.
2009-12-01
Computational approaches (e.g., computer-based data visualization, analysis and modeling) are critical for conducting increasingly data-intensive Earth science (ES) studies to understand functions and changes of the Earth system. However, Earth scientists, educators, and students currently face two major barriers that prevent them from effectively using computational approaches in their learning, research and application activities. The two barriers are: 1) difficulties in finding, obtaining, and using multi-source ES data; and 2) lack of analytic functions and computing resources (e.g., analysis software, computing models, and high performance computing systems) to analyze the data. Taking advantage of recent advances in cyberinfrastructure, Web service, and geospatial interoperability technologies, GeoBrain, a project funded by NASA, has developed a prototype computational cyber-laboratory to effectively remove the two barriers. The cyber-laboratory makes ES data and computational resources at large organizations in distributed locations available to and easily usable by the Earth science community through 1) enabling seamless discovery, access and retrieval of distributed data, 2) federating and enhancing data discovery with a catalogue federation service and a semantically-augmented catalogue service, 3) customizing data access and retrieval at user request with interoperable, personalized, and on-demand data access and services, 4) automating or semi-automating multi-source geospatial data integration, 5) developing a large number of analytic functions as value-added, interoperable, and dynamically chainable geospatial Web services and deploying them in high-performance computing facilities, 6) enabling online geospatial process modeling and execution, and 7) building a user-friendly extensible web portal for users to access the cyber-laboratory resources. Users can interactively discover the needed data and perform on-demand data analysis and modeling through the web portal. The GeoBrain cyber-laboratory provides solutions to meet common needs of ES research and education, such as distributed data access and analysis services, easy access to and use of ES data, and enhanced geoprocessing and geospatial modeling capability. It greatly facilitates ES research, education, and applications. The development of the cyber-laboratory provides insights, lessons learned, and technology readiness to build more capable computing infrastructure for ES studies, which can meet the wide-ranging needs of current and future generations of scientists, researchers, educators, and students for their formal or informal educational training, research projects, career development, and lifelong learning.
NASA Technical Reports Server (NTRS)
Meisel, D. D.
1976-01-01
Preliminary data required to extrapolate available meteor physics information (obtained in the photographic, visual and near ultraviolet spectral regions) into the middle and far ultraviolet are presented. Wavelength tables, telluric attenuation factors, meteor rates, and telluric airglow data are summarized in the context of near-earth observation vehicle parameters using moderate to low spectral resolution instrumentation. Considerable attention is given to the problem of meteor excitation temperatures since these are required to predict the strength of UV features. Relative line intensities are computed for an assumed chondritic composition. Features of greatest predicted intensities, the major problems in meteor physics, detectability of UV meteor events, complications of spacecraft motion, and UV instrumentation options are summarized.
Identification of features in indexed data and equipment therefore
Jarman, Kristin H [Richland, WA; Daly, Don Simone [Richland, WA; Anderson, Kevin K [Richland, WA; Wahl, Karen L [Richland, WA
2002-04-02
Embodiments of the present invention provide methods of identifying a feature in an indexed dataset. Such embodiments encompass selecting an initial subset of indices, the initial subset of indices being encompassed by an initial window-of-interest and comprising at least one beginning index and at least one ending index; computing an intensity weighted measure of dispersion for the subset of indices using a subset of responses corresponding to the subset of indices; and comparing the intensity weighted measure of dispersion to a dispersion critical value determined from an expected value of the intensity weighted measure of dispersion under a null hypothesis of no transient feature present. Embodiments of the present invention also encompass equipment configured to perform the methods of the present invention.
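A schematic reading of the claimed procedure is sketched below; it is illustrative only, the function names are hypothetical, the patent's exact statistic and critical-value calculation are not reproduced, and the direction of the comparison against the critical value is an assumption:

```python
import numpy as np

def intensity_weighted_dispersion(indices, responses):
    """Weighted variance of the indices within a window, using the responses
    (intensities) as weights."""
    x = np.asarray(indices, dtype=float)
    w = np.asarray(responses, dtype=float)
    w = w / w.sum()
    mean = np.sum(w * x)
    return float(np.sum(w * (x - mean) ** 2))

def feature_present(indices, responses, critical_value):
    """Flag a transient feature when the weighted dispersion differs from the
    value expected under the null hypothesis of no feature. The critical value
    is assumed to be supplied externally (e.g. from simulating the null), and
    the 'less than' direction here is an assumption for illustration."""
    return intensity_weighted_dispersion(indices, responses) < critical_value
```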
NASA Astrophysics Data System (ADS)
Wan, Junwei; Chen, Hongyan; Zhao, Jing
2017-08-01
To meet the requirements of real-time performance, reliability and safety for aerospace experiments, a single-center cloud computing technology application verification platform is constructed. At the IaaS level, the feasibility of applying cloud computing technology to the field of aerospace experiments is tested and verified. Based on the analysis of the test results, a preliminary conclusion is obtained: a cloud computing platform can be applied to compute-intensive aerospace experiment workloads, while for I/O-intensive workloads the traditional physical machine is recommended.
Analysis of computer images in the presence of metals
NASA Astrophysics Data System (ADS)
Buzmakov, Alexey; Ingacheva, Anastasia; Prun, Victor; Nikolaev, Dmitry; Chukalina, Marina; Ferrero, Claudio; Asadchikov, Victor
2018-04-01
Artifacts caused by intensely absorbing inclusions are encountered in computed tomography via polychromatic scanning and may obscure or simulate pathologies in medical applications. To improve the quality of reconstruction when high-Z inclusions are present, we previously proposed and tested with synthetic data an iterative technique with a soft penalty mimicking linear inequalities on the photon-starved rays. This note reports a test at the tomographic laboratory set-up of the Institute of Crystallography FSRC "Crystallography and Photonics" RAS, in which tomographic scans were successfully made of a temporary tooth without an inclusion and with a Pb inclusion.
Exploring the Earth Using Deep Learning Techniques
NASA Astrophysics Data System (ADS)
Larraondo, P. R.; Evans, B. J. K.; Antony, J.
2016-12-01
Research using deep neural networks has significantly matured in recent times, and there is now a surge in interest to apply such methods to Earth systems science and the geosciences. When combined with Big Data, we believe there are opportunities for significantly transforming a number of areas relevant to researchers and policy makers. In particular, by using a combination of data from a range of satellite Earth observations as well as computer simulations from climate models and reanalysis, we can gain new insights into the information that is locked within the data. Global geospatial datasets describe a wide range of physical and chemical parameters, which are mostly available using regular grids covering large spatial and temporal extents. This makes them perfect candidates for applying deep learning methods. So far, these techniques have been successfully applied to image analysis through the use of convolutional neural networks. However, this is only one field of interest, and there is potential for many more use cases to be explored. Deep learning algorithms require fast access to large amounts of data in the form of tensors and make intensive use of CPU resources in order to train their models. The Australian National Computational Infrastructure (NCI) has recently augmented its Raijin 1.2 PFlop supercomputer with hardware accelerators. Together with NCI's 3000-core high-performance OpenStack cloud, these computational systems have direct access to NCI's 10+ PBytes of datasets and associated Big Data software technologies (see http://geonetwork.nci.org.au/ and http://nci.org.au/systems-services/national-facility/nerdip/). To effectively use these computing infrastructures requires that both the data and software are organised in a way that readily supports the deep learning software ecosystem. Deep learning software, such as the open source TensorFlow library, has allowed us to demonstrate the possibility of generating geospatial models by combining information from our different data sources. This opens the door to an exciting new way of generating products and extracting features that has previously been labour intensive. In this paper, we will explore some of these geospatial use cases and share some of the lessons learned from this experience.
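A minimal, hypothetical TensorFlow/Keras sketch of the kind of convolutional model referred to above is shown below; the grid size, number of channels, and regression target are placeholders and this is not the authors' actual network:

```python
import tensorflow as tf

# Input: a gridded geospatial field, e.g. 64x64 cells with 4 variables (channels).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           input_shape=(64, 64, 4)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),        # e.g. regress one derived geophysical quantity
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```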
A Big Data Approach to Analyzing Market Volatility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Bethel, E. Wes; Gu, Ming
2013-06-05
Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. Analyzing such a large volume of data requires tremendous computing power that is not easily available to financial academics and regulators. Fortunately, publicly funded High Performance Computing (HPC) power is widely available at the National Laboratories in the US. In this paper we demonstrate that the HPC resource and the techniques for data-intensive sciences can be used to greatly accelerate the computation of an early warning indicator called Volume-synchronized Probability of Informed Trading (VPIN). The test data used in this study contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and takes 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelizing the computations, we are able to explore 16,000 different ways of computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. Our test demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time – an ability that could be valuable to regulators. Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
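As a simplified sketch of how a VPIN value is typically computed from volume buckets (the bucket construction, buy/sell classification, and the example numbers are assumptions for illustration, not the paper's exact configuration):

```python
import numpy as np

def vpin(buy_volumes, sell_volumes):
    """VPIN over n volume buckets: mean of |V_buy - V_sell| divided by the
    bucket volume (assumed equal across buckets by construction)."""
    vb = np.asarray(buy_volumes, dtype=float)
    vs = np.asarray(sell_volumes, dtype=float)
    bucket_volume = vb + vs
    return float(np.mean(np.abs(vb - vs) / bucket_volume))

# Hypothetical buy/sell imbalance across 5 buckets of equal total volume (1000)
print(vpin([600, 450, 700, 500, 520], [400, 550, 300, 500, 480]))
```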
Retkute, Renata; Townsend, Alexandra J; Murchie, Erik H; Jensen, Oliver E; Preston, Simon P
2018-05-25
Diurnal changes in solar position and intensity combined with the structural complexity of plant architecture result in highly variable and dynamic light patterns within the plant canopy. This affects productivity through the complex ways that photosynthesis responds to changes in light intensity. Current methods to characterize light dynamics, such as ray-tracing, are able to produce data with excellent spatio-temporal resolution but are computationally intensive and the resulting data are complex and high-dimensional. This necessitates development of more economical models for summarizing the data and for simulating realistic light patterns over the course of a day. High-resolution reconstructions of field-grown plants are assembled in various configurations to form canopies, and a forward ray-tracing algorithm is applied to the canopies to compute light dynamics at high (1 min) temporal resolution. From the ray-tracer output, the sunlit or shaded state for each patch on the plants is determined, and these data are used to develop a novel stochastic model for the sunlit-shaded patterns. The model is designed to be straightforward to fit to data using maximum likelihood estimation, and fast to simulate from. For a wide range of contrasting 3-D canopies, the stochastic model is able to summarize, and replicate in simulations, key features of the light dynamics. When light patterns simulated from the stochastic model are used as input to a model of photoinhibition, the predicted reduction in carbon gain is similar to that from calculations based on the (extremely costly) ray-tracer data. The model provides a way to summarize highly complex data in a small number of parameters, and a cost-effective way to simulate realistic light patterns. Simulations from the model will be particularly useful for feeding into larger-scale photosynthesis models for calculating how light dynamics affects the photosynthetic productivity of canopies.
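An illustrative two-state (sunlit/shaded) Markov-chain simulator in the spirit of such stochastic models is sketched below; the per-minute transition probabilities are hypothetical placeholders, not the fitted parameters of the study:

```python
import numpy as np

def simulate_sun_shade(n_minutes, p_stay_sun=0.95, p_stay_shade=0.90, seed=None):
    """Simulate a 1-minute-resolution sunlit(1)/shaded(0) sequence for one patch."""
    rng = np.random.default_rng(seed)
    state = 1                                   # start sunlit
    trace = np.empty(n_minutes, dtype=int)
    for t in range(n_minutes):
        trace[t] = state
        stay = p_stay_sun if state == 1 else p_stay_shade
        if rng.random() >= stay:                # switch state with probability 1 - stay
            state = 1 - state
    return trace

pattern = simulate_sun_shade(12 * 60)           # a 12-hour day at 1-minute resolution
print(pattern[:30], pattern.mean())             # fraction of time sunlit
```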
Analysis of intensity variability in multislice and cone beam computed tomography.
Nackaerts, Olivia; Maes, Frederik; Yan, Hua; Couto Souza, Paulo; Pauwels, Ruben; Jacobs, Reinhilde
2011-08-01
The aim of this study was to evaluate the variability of intensity values in cone beam computed tomography (CBCT) imaging compared with multislice computed tomography Hounsfield units (MSCT HU) in order to assess the reliability of density assessments using CBCT images. A quality control phantom was scanned with an MSCT scanner and five CBCT scanners. In one CBCT scanner, the phantom was scanned repeatedly in the same and in different positions. Images were analyzed using registration to a mathematical model. MSCT images were used as a reference. Density profiles of MSCT showed stable HU values, whereas in CBCT imaging the intensity values were variable over the profile. Repositioning of the phantom resulted in large fluctuations in intensity values. The use of intensity values in CBCT images is not reliable, because the values are influenced by device, imaging parameters and positioning. © 2011 John Wiley & Sons A/S.
Numerical Analysis of Crack Tip Plasticity and History Effects under Mixed Mode Conditions
NASA Astrophysics Data System (ADS)
Lopez-Crespo, Pablo; Pommier, Sylvie
The plastic behaviour in the crack tip region has a strong influence on the fatigue life of engineering components. In general, residual stresses developed as a consequence of the plasticity being constrained around the crack tip have a significant role in both the direction of crack propagation and the propagation rate. Finite element methods (FEM) are commonly employed in order to model plasticity. However, if millions of cycles need to be modelled to predict the fatigue behaviour of a component, the method becomes computationally too expensive. By employing a multiscale approach, very precise analyses computed by FEM can be brought to a global scale. The data generated using the FEM enable us to identify a global cyclic elastic-plastic model for the crack tip region. Once this model is identified, it can be employed directly, with no need for additional FEM computations, resulting in fast computations. This is done by partitioning local displacement fields computed by FEM into intensity factors (global data) and spatial fields. A Karhunen-Loeve algorithm developed for image processing was employed for this purpose. In addition, the partitioning is done so as to distinguish elastic and plastic components. Each of them is further divided into opening-mode and shear-mode parts. The plastic flow direction was determined with the above approach on a centre-cracked panel subjected to a wide range of mixed-mode loading conditions. It was found to agree well with the maximum tangential stress criterion developed by Erdogan and Sih, provided that the loading direction is corrected for residual stresses. In this approach, residual stresses are measured at the global scale through internal intensity factors.
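A minimal sketch of this kind of field partitioning, using a Karhunen-Loeve (proper orthogonal) decomposition of displacement-field snapshots via the SVD, is given below; it is a generic illustration with hypothetical names and random data, not the authors' identified crack-tip model or their elastic/plastic split:

```python
import numpy as np

def partition_displacement_fields(snapshots, n_modes=4):
    """snapshots: (n_steps, n_dof) array of FEM displacement fields near the tip.
    Returns fixed spatial modes and their time-varying intensity factors."""
    mean = snapshots.mean(axis=0)
    u, s, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    spatial_modes = vt[:n_modes]                 # fixed spatial fields
    intensities = u[:, :n_modes] * s[:n_modes]   # one 'intensity factor' per mode and step
    return spatial_modes, intensities, mean

# Hypothetical example: 200 load steps, 5000 degrees of freedom
fields = np.random.rand(200, 5000)
modes, ks, mean = partition_displacement_fields(fields)
print(modes.shape, ks.shape)                     # (4, 5000) (200, 4)
```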
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Sandstrom, Timothy A.; Henze, Chris; Levit, Creon
2003-01-01
This paper presents the hyperwall, a visualization cluster that uses coordinated visualizations for interactive exploration of multidimensional data and simulations. The system strongly leverages the human eye-brain system with a generous 7x7 array of flat panel LCD screens powered by a Beowulf cluster. With each screen backed by a workstation-class PC, graphics- and compute-intensive applications can be applied to a broad range of data. Navigational tools are presented that allow for investigation of high-dimensional spaces.
Terascale direct numerical simulations of turbulent combustion using S3D
NASA Astrophysics Data System (ADS)
Chen, J. H.; Choudhary, A.; de Supinski, B.; DeVries, M.; Hawkes, E. R.; Klasky, S.; Liao, W. K.; Ma, K. L.; Mellor-Crummey, J.; Podhorszki, N.; Sankaran, R.; Shende, S.; Yoo, C. S.
2009-01-01
Computational science is paramount to the understanding of underlying processes in internal combustion engines of the future that will utilize non-petroleum-based alternative fuels, including carbon-neutral biofuels, and burn in new combustion regimes that will attain high efficiency while minimizing emissions of particulates and nitrogen oxides. Next-generation engines will likely operate at higher pressures, with greater amounts of dilution and utilize alternative fuels that exhibit a wide range of chemical and physical properties. Therefore, there is a significant role for high-fidelity simulations, direct numerical simulations (DNS), specifically designed to capture key turbulence-chemistry interactions in these relatively uncharted combustion regimes, and in particular, that can discriminate the effects of differences in fuel properties. In DNS, all of the relevant turbulence and flame scales are resolved numerically using high-order accurate numerical algorithms. As a consequence terascale DNS are computationally intensive, require massive amounts of computing power and generate tens of terabytes of data. Recent results from terascale DNS of turbulent flames are presented here, illustrating its role in elucidating flame stabilization mechanisms in a lifted turbulent hydrogen/air jet flame in a hot air coflow, and the flame structure of a fuel-lean turbulent premixed jet flame. Computing at this scale requires close collaborations between computer and combustion scientists to provide optimized scaleable algorithms and software for terascale simulations, efficient collective parallel I/O, tools for volume visualization of multiscale, multivariate data and automating the combustion workflow. The enabling computer science, applied to combustion science, is also required in many other terascale physics and engineering simulations. In particular, performance monitoring is used to identify the performance of key kernels in the DNS code, S3D and especially memory intensive loops in the code. Through the careful application of loop transformations, data reuse in cache is exploited thereby reducing memory bandwidth needs, and hence, improving S3D's nodal performance. To enhance collective parallel I/O in S3D, an MPI-I/O caching design is used to construct a two-stage write-behind method for improving the performance of write-only operations. The simulations generate tens of terabytes of data requiring analysis. Interactive exploration of the simulation data is enabled by multivariate time-varying volume visualization. The visualization highlights spatial and temporal correlations between multiple reactive scalar fields using an intuitive user interface based on parallel coordinates and time histogram. Finally, an automated combustion workflow is designed using Kepler to manage large-scale data movement, data morphing, and archival and to provide a graphical display of run-time diagnostics.
Fragmentation of care and the use of head computed tomography in patients with ischemic stroke.
Bekelis, Kimon; Roberts, David W; Zhou, Weiping; Skinner, Jonathan S
2014-05-01
Computed tomographic (CT) scans are central diagnostic tests for ischemic stroke. Their inefficient use is a negative quality measure tracked by the Centers for Medicare and Medicaid Services. We performed a retrospective analysis of Medicare fee-for-service claims data for adults admitted for ischemic stroke from 2008 to 2009, with 1-year follow-up. The outcome measures were risk-adjusted rates of high-intensity CT use (≥4 head CT scans) and risk- and price-adjusted Medicare expenditures in the year after admission. The average number of head CT scans in the year after admission, for the 327 521 study patients, was 1.94, whereas 11.9% had ≥4. Risk-adjusted rates of high-intensity CT use ranged from 4.6% (Napa, CA) to 20.0% (East Long Island, NY). These rates were 2.6% higher for blacks than for whites (95% confidence interval, 2.1%-3.1%), with considerable regional variation. Higher fragmentation of care (number of different doctors seen) was associated with high-intensity CT use. Patients living in the top quintile regions of fragmentation experienced a 5.9% higher rate of high-intensity CT use, with the lowest quintile as reference; the corresponding odds ratio was 1.77 (95% confidence interval, 1.71-1.83). Similarly, 1-year risk- and price-adjusted expenditures exhibited considerable regional variation, ranging from $31 175 (Salem, MA) to $61 895 (McAllen, TX). Regional rates of high-intensity CT scans were positively associated with 1-year expenditures (r=0.56; P<0.01). Rates of high-intensity CT use for patients with ischemic stroke reflect wide practice patterns across regions and races. Medicare expenditures parallel these disparities. Fragmentation of care is associated with high-intensity CT use. © 2014 American Heart Association, Inc.
The Generation Challenge Programme Platform: Semantic Standards and Workbench for Crop Science
Bruskiewich, Richard; Senger, Martin; Davenport, Guy; Ruiz, Manuel; Rouard, Mathieu; Hazekamp, Tom; Takeya, Masaru; Doi, Koji; Satoh, Kouji; Costa, Marcos; Simon, Reinhard; Balaji, Jayashree; Akintunde, Akinnola; Mauleon, Ramil; Wanchana, Samart; Shah, Trushar; Anacleto, Mylah; Portugal, Arllet; Ulat, Victor Jun; Thongjuea, Supat; Braak, Kyle; Ritter, Sebastian; Dereeper, Alexis; Skofic, Milko; Rojas, Edwin; Martins, Natalia; Pappas, Georgios; Alamban, Ryan; Almodiel, Roque; Barboza, Lord Hendrix; Detras, Jeffrey; Manansala, Kevin; Mendoza, Michael Jonathan; Morales, Jeffrey; Peralta, Barry; Valerio, Rowena; Zhang, Yi; Gregorio, Sergio; Hermocilla, Joseph; Echavez, Michael; Yap, Jan Michael; Farmer, Andrew; Schiltz, Gary; Lee, Jennifer; Casstevens, Terry; Jaiswal, Pankaj; Meintjes, Ayton; Wilkinson, Mark; Good, Benjamin; Wagner, James; Morris, Jane; Marshall, David; Collins, Anthony; Kikuchi, Shoshi; Metz, Thomas; McLaren, Graham; van Hintum, Theo
2008-01-01
The Generation Challenge programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making. PMID:18483570
OpenWebGlobe 2: Visualization of Complex 3D-Geodata in the (Mobile) Web Browser
NASA Astrophysics Data System (ADS)
Christen, M.
2016-06-01
Providing worldwide high-resolution data for virtual globes involves compute- and storage-intensive tasks for processing data. Furthermore, rendering complex 3D geodata, such as 3D city models with an extremely high polygon count and a vast amount of textures, at interactive framerates is still a very challenging task, especially on mobile devices. This paper presents an approach for processing, caching and serving massive geospatial data in a cloud-based environment for large-scale, out-of-core, highly scalable 3D scene rendering on a web-based virtual globe. Cloud computing is used for processing large amounts of geospatial data and also for providing 2D and 3D map data to a large number of (mobile) web clients. In this paper the approach for processing, rendering and caching very large datasets in the currently developed virtual globe "OpenWebGlobe 2", which displays 3D geodata on nearly every device, is shown.
Integration of scheduling and discrete event simulation systems to improve production flow planning
NASA Astrophysics Data System (ADS)
Krenczyk, D.; Paprocka, I.; Kempa, W. M.; Grabowik, C.; Kalinowski, K.
2016-08-01
The increased availability of data and of computer-aided technologies such as MRP I/II, ERP and MES systems allows producers to be more adaptive to market dynamics and to improve production scheduling. Integration of production scheduling with computer modelling, simulation and visualization systems can be useful in the analysis of production system constraints related to the efficiency of manufacturing systems. An integration methodology based on a semi-automatic model generation method is proposed to eliminate problems associated with the complexity of the model and the labour-intensive, time-consuming process of simulation model creation. Data mapping and data transformation techniques for the proposed method have been applied. This approach has been illustrated through examples of practical implementation of the proposed method using the KbRS scheduling system and the Enterprise Dynamics simulation system.
NASA Astrophysics Data System (ADS)
Zhou, Jianfeng; Xu, Benda; Peng, Chuan; Yang, Yang; Huo, Zhuoxi
2015-08-01
AIRE-Linux is a dedicated Linux system for astronomers. Modern astronomy faces two big challenges: massive volumes of observed raw data covering the whole electromagnetic spectrum, and the amount of professional data-processing skill required, which exceeds the abilities of an individual or even a small team. AIRE-Linux, a specially designed Linux distribution that will be delivered to users as Virtual Machine (VM) images in the Open Virtualization Format (OVF), is intended to help astronomers confront these challenges. Most astronomical software packages, such as IRAF, MIDAS, CASA, Heasoft, etc., will be integrated into AIRE-Linux. It is easy for astronomers to configure and customize the system and use just what they need. When incorporated into cloud computing platforms, AIRE-Linux will be able to handle data-intensive and compute-intensive tasks for astronomers. Currently, a beta version of AIRE-Linux is ready for download and testing.
CBESW: sequence alignment on the Playstation 3.
Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil
2008-09-17
The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.
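For reference, a plain single-threaded Python version of the Smith-Waterman local-alignment scoring recurrence that the paper accelerates is sketched below; the scoring parameters are illustrative, and this scalar code shares nothing with the vectorized Cell Broadband Engine implementation described above:

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```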
CBESW: Sequence Alignment on the Playstation 3
Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil
2008-01-01
Background The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications. PMID:18798993
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
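A schematic version of the three-stage execution-time model described above is sketched below; the additive form, the ideal-parallelism assumption, and the example numbers are assumptions for illustration, not the fitted models from the paper:

```python
def workflow_time(data_gb, bandwidth_gbps, queue_wait_s, n_tasks, task_time_s, n_cores):
    """Estimated end-to-end time = data transfer + queue wait + parallel reconstruction."""
    transfer = data_gb * 8.0 / bandwidth_gbps            # seconds to move the data
    compute = (n_tasks * task_time_s) / n_cores          # ideal parallel compute time
    return transfer + queue_wait_s + compute

# Hypothetical workflow: 200 GB dataset, 10 Gb/s link, 15 min queue wait, 2048 slices
print(workflow_time(200, 10, 900, n_tasks=2048, task_time_s=120, n_cores=512))
```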
Dynamic Collaboration Infrastructure for Hydrologic Science
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.
2016-12-01
Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure that can be accessed from environments like HydroShare are increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast array of data and computing infrastructure without having to learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges, a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype, which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.
NASA Astrophysics Data System (ADS)
Wyborn, L. A.
2013-12-01
The emergence of the fourth paradigm of data intensive science in 2007 showed great promise: it offered a new fundamental methodology in scientific exploration in which researchers would be able to harness the huge increase in data volumes coming from new and more powerful instruments that were collecting data at unprecedented rates and at ever increasing resolutions. Given the potential this new methodology offered, decadal challenges were issued to the Earth and Space Science community to come together and work on problems such as impacts of climate change; sustainably exploiting scarce water, mineral and petroleum resources; and protecting our communities through better prediction of the behaviour of natural hazards. Such challenges require the capability to integrate heterogeneous data sets, from multiple sources, across multiple domains and at low transactional cost. To help realise these visions, significant investments were made globally in cyberinfrastructures (computer centres, research clouds, data stores, high speed networks, etc.). Combined, these infrastructures are now capable of analysing petabyte size chunks of data, and the climate community is close to operating at exascale. But have we actually realised the vision of data intensive science? The simple reality is that data intensive science requires the capability to find and analyse large volumes of data in real time via machine to machine interactions. It is not necessarily just about 'Big Data' sets collected from remote instruments such as satellites or sensor networks. 'Long Tail' data sets, traditionally the output of small science campaigns, are vital to calibrating large data sets and need to be stored so that they can be reused and repurposed in ways beyond what the original collector of the data intended they be used for. Particularly for meaningful time series analysis in environmental sciences, there is the additional challenge to store and manage data through decades of multiple evolutions of both hardware and software. The move to data intensive science has driven the realisation that we need to put more effort and resources into rescuing, curating and preserving data, and properly preserved data sets are now being used to resolve the real-world issues of today. However, as the capacity of computational systems increases relentlessly, we need to question if our current efforts in data curation and preservation will scale to these ever growing systems. For Earth and Space Sciences to come out of the digital dark ages and into the renaissance of multi-source science, it is time to take stock and question our current data rescue, curation and preservation initiatives. Will the data store I am using be around in 50 years' time? What measures is this data store taking to avoid bit-rot and/or deal with software and hardware obsolescence? Is my data self-describing? Have I paid enough attention to cross domain data standards so my data can be reused and repurposed for the current decadal challenges? More importantly, as the capacity of computational systems scales beyond exascale to zettascale and yottascale, will my data sets that I have rescued, curated and preserved in my lifetime, no matter whether they are small or large, be able to contribute to addressing the decadal challenges that are as yet undefined?
NASA Astrophysics Data System (ADS)
Sellars, S. L.; Nguyen, P.; Tatar, J.; Graham, J.; Kawsenuk, B.; DeFanti, T.; Smarr, L.; Sorooshian, S.; Ralph, M.
2017-12-01
A new era in computational earth sciences is within our grasps with the availability of ever-increasing earth observational data, enhanced computational capabilities, and innovative computation approaches that allow for the assimilation, analysis and ability to model the complex earth science phenomena. The Pacific Research Platform (PRP), CENIC and associated technologies such as the Flash I/O Network Appliance (FIONA) provide scientists a unique capability for advancing towards this new era. This presentation reports on the development of multi-institutional rapid data access capabilities and data pipeline for applying a novel image characterization and segmentation approach, CONNected objECT (CONNECT) algorithm to study Atmospheric River (AR) events impacting the Western United States. ARs are often associated with torrential rains, swollen rivers, flash flooding, and mudslides. CONNECT is computationally intensive, reliant on very large data transfers, storage and data mining techniques. The ability to apply the method to multiple variables and datasets located at different University of California campuses has previously been challenged by inadequate network bandwidth and computational constraints. The presentation will highlight how the inter-campus CONNECT data mining framework improved from our prior download speeds of 10MB/s to 500MB/s using the PRP and the FIONAs. We present a worked example using the NASA MERRA data to describe how the PRP and FIONA have provided researchers with the capability for advancing knowledge about ARs. Finally, we will discuss future efforts to expand the scope to additional variables in earth sciences.
NASA Astrophysics Data System (ADS)
Vilotte, J. P.; Atkinson, M.; Spinuso, A.; Rietbrock, A.; Michelini, A.; Igel, H.; Frank, A.; Carpené, M.; Schwichtenberg, H.; Casarotti, E.; Filgueira, R.; Garth, T.; Germünd, A.; Klampanos, I.; Krause, A.; Krischer, L.; Leong, S. H.; Magnoni, F.; Matser, J.; Moguilny, G.
2015-12-01
Seismology addresses both fundamental problems in understanding the Earth's internal wave sources and structures and augmented societal applications, like earthquake and tsunami hazard assessment and risk mitigation; and puts a premium on open-data accessible by the Federated Digital Seismological Networks. The VERCE project, "Virtual Earthquake and seismology Research Community e-science environment in Europe", has initiated a virtual research environment to support complex orchestrated workflows combining state-of-the-art wave simulation codes and data analysis tools on distributed computing and data infrastructures (DCIs) along with multiple sources of observational data and new capabilities to combine simulation results with observational data. The VERCE Science Gateway provides a view of all the available resources, supporting collaboration with shared data and methods, with data access controls. The mapping to DCIs handles identity management, authority controls, transformations between representations and controls, and access to resources. The framework for computational science that provides simulation codes, like SPECFEM3D, democratizes their use by getting data from multiple sources, managing Earth models and meshes, distilling them as input data, and capturing results with meta-data. The dispel4py data-intensive framework allows for developing data-analysis applications using Python and the ObsPy library, which can be executed on different DCIs. A set of tools allows coupling with seismology and external data services. Provenance driven tools validate results and show relationships between data to facilitate method improvement. Lessons learned from VERCE training lead us to conclude that solid-Earth scientists could make significant progress by using the VERCE e-science environment. VERCE has already contributed to the European Plate Observation System (EPOS), and is part of the EPOS implementation phase. Its cross-disciplinary capabilities are being extended for the EPOS implementation phase.
Ray, Nilanjan
2011-10-01
Fluid motion estimation from time-sequenced images is a significant image analysis task. Its application is widespread in experimental fluidics research and many related areas like biomedical engineering and atmospheric sciences. In this paper, we present a novel flow computation framework to estimate the flow velocity vectors from two consecutive image frames. In an energy minimization-based flow computation, we propose a novel data fidelity term, which: 1) can accommodate various measures, such as cross-correlation or sum of absolute or squared differences of pixel intensities between image patches; 2) has a global mechanism to control the adverse effect of outliers arising out of motion discontinuities and proximity of image borders; and 3) can go hand-in-hand with various spatial smoothness terms. Further, the proposed data term and related regularization schemes are applicable to both dense and sparse flow vector estimations. We validate these claims by numerical experiments on benchmark flow data sets. © 2011 IEEE
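For orientation, a generic form of such an energy (the notation and the robust weighting here are illustrative, not the paper's exact formulation) combines a data-fidelity term between the two frames with a spatial smoothness penalty on the velocity field u:

E(\mathbf{u}) = \int_{\Omega} \rho\!\left( D(I_1, I_2, \mathbf{u}) \right) d\mathbf{x} \;+\; \lambda \int_{\Omega} \lVert \nabla \mathbf{u} \rVert^{2}\, d\mathbf{x},

where D measures patch similarity (cross-correlation or a sum of absolute or squared differences), \rho is a robust weighting that limits the influence of outliers, and \lambda balances fidelity against smoothness.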
spMC: an R-package for 3D lithological reconstructions based on spatial Markov chains
NASA Astrophysics Data System (ADS)
Sartore, Luca; Fabbri, Paolo; Gaetan, Carlo
2016-09-01
The paper presents the spatial Markov Chains (spMC) R-package and a case study of subsoil simulation/prediction located in a plain site of Northeastern Italy. spMC is a quite complete collection of advanced methods for data inspection, besides spMC implements Markov Chain models to estimate experimental transition probabilities of categorical lithological data. Furthermore, simulation methods based on most known prediction methods (as indicator Kriging and CoKriging) were implemented in spMC package. Moreover, other more advanced methods are available for simulations, e.g. path methods and Bayesian procedures, that exploit the maximum entropy. Since the spMC package was developed for intensive geostatistical computations, part of the code is implemented for parallel computations via the OpenMP constructs. A final analysis of this computational efficiency compares the simulation/prediction algorithms by using different numbers of CPU cores, and considering the example data set of the case study included in the package.
Registration of 2D to 3D joint images using phase-based mutual information
NASA Astrophysics Data System (ADS)
Dalvi, Rupin; Abugharbieh, Rafeef; Pickering, Mark; Scarvell, Jennie; Smith, Paul
2007-03-01
Registration of two dimensional to three dimensional orthopaedic medical image data has important applications particularly in the area of image guided surgery and sports medicine. Fluoroscopy to computer tomography (CT) registration is an important case, wherein digitally reconstructed radiographs derived from the CT data are registered to the fluoroscopy data. Traditional registration metrics such as intensity-based mutual information (MI) typically work well but often suffer from gross misregistration errors when the image to be registered contains a partial view of the anatomy visible in the target image. Phase-based MI provides a robust alternative similarity measure which, in addition to possessing the general robustness and noise immunity that MI provides, also employs local phase information in the registration process which makes it less susceptible to the aforementioned errors. In this paper, we propose using the complex wavelet transform for computing image phase information and incorporating that into a phase-based MI measure for image registration. Tests on a CT volume and 6 fluoroscopy images of the knee are presented. The femur and the tibia in the CT volume were individually registered to the fluoroscopy images using intensity-based MI, gradient-based MI and phase-based MI. Errors in the coordinates of fiducials present in the bone structures were used to assess the accuracy of the different registration schemes. Quantitative results demonstrate that the performance of intensity-based MI was the worst. Gradient-based MI performed slightly better, while phase-based MI results were the best consistently producing the lowest errors.
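A minimal sketch of the mutual-information computation underlying all three registration measures compared above: MI is estimated from the joint histogram of two equally sized images, and the phase-based variant applies the same estimator to local-phase images rather than raw intensities. Function and array names are illustrative.

import numpy as np

def mutual_information(img_a, img_b, bins=64):
    # Joint histogram of the two images, normalized to a joint probability table.
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of image A
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of image B
    nz = p_ab > 0                           # avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))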
Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook
2014-01-01
Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially in remote monitoring health-services. However, during the WCE process, the large amount of captured video data demands a significant amount of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing tasks, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data. PMID:25225874
Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook
2014-09-15
Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially in remote monitoring health-services. However, during the WCE process, the large amount of captured video data demands a significant amount of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing tasks, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data.
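A minimal sketch of the redundancy-elimination test described above, assuming per-channel color histograms of consecutive frames. The Jeffrey divergence shown is the symmetrized Kullback-Leibler form commonly used for histogram comparison; the bin count and threshold are placeholders, not the paper's settings.

import numpy as np

def jeffrey_divergence(h1, h2, eps=1e-12):
    # Symmetrized KL divergence between two histograms, each normalized to sum to one.
    p = h1 / (h1.sum() + eps)
    q = h2 / (h2.sum() + eps)
    m = p + q
    return float(np.sum(p * np.log((2 * p + eps) / (m + eps)) +
                        q * np.log((2 * q + eps) / (m + eps))))

def is_redundant(frame_a, frame_b, bins=32, threshold=0.05):
    # Compare per-channel histograms of two RGB frames (H x W x 3 uint8 arrays).
    div = 0.0
    for c in range(3):
        ha, _ = np.histogram(frame_a[..., c], bins=bins, range=(0, 255))
        hb, _ = np.histogram(frame_b[..., c], bins=bins, range=(0, 255))
        div += jeffrey_divergence(ha.astype(float), hb.astype(float))
    return div < threshold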
ERIC Educational Resources Information Center
Larraz, Beatriz
2015-01-01
The aim of this article is to propose a new breakdown of the Gini inequality ratio into three components ("within-group" inequality, "between-group" inequality, and the contribution of the intensity of "transvariation" between groups to the total inequality index). The between-group inequality concept computes all the differences in salaries…
Moored rainfall measurements during COARE
NASA Technical Reports Server (NTRS)
Mcphaden, Michael J.
1994-01-01
This presentation discusses mini-ORG rainfall estimates collected from an array of six moornings in the western equatorial Pacific during the TOGA-COARE experiment. The moorings were clustered in the vicinity of the COARE intensive flux array (IFA) centered near 2 deg S, 156 deg E. The basic data set consisted of hourly means computed from 5-second samples.
A maximum entropy reconstruction technique for tomographic particle image velocimetry
NASA Astrophysics Data System (ADS)
Bilsky, A. V.; Lozhkin, V. A.; Markovich, D. M.; Tokarev, M. P.
2013-04-01
This paper studies a novel approach for reducing tomographic PIV computational complexity. The proposed approach is an algebraic reconstruction technique, termed MENT (maximum entropy). This technique computes the three-dimensional light intensity distribution several times faster than SMART, using at least ten times less memory. Additionally, the reconstruction quality remains nearly the same as with SMART. This paper presents the theoretical computation performance comparison for MENT, SMART and MART, followed by validation using synthetic particle images. Both the theoretical assessment and validation of synthetic images demonstrate significant computational time reduction. The data processing accuracy of MENT was compared to that of SMART in a slot jet experiment. A comparison of the average velocity profiles shows a high level of agreement between the results obtained with MENT and those obtained with SMART.
Mobile computing device configured to compute irradiance, glint, and glare of the sun
Gupta, Vipin P; Ho, Clifford K; Khalsa, Siri Sahib
2014-03-11
Described herein are technologies pertaining to computing the solar irradiance distribution on a surface of a receiver in a concentrating solar power system or glint/glare emitted from a reflective entity. A mobile computing device includes at least one camera that captures images of the Sun and the entity of interest, wherein the images have pluralities of pixels having respective pluralities of intensity values. Based upon the intensity values of the pixels in the respective images, the solar irradiance distribution on the surface of the entity or glint/glare corresponding to the entity is computed by the mobile computing device.
Parallel computer processing and modeling: applications for the ICU
NASA Astrophysics Data System (ADS)
Baxter, Grant; Pranger, L. Alex; Draghic, Nicole; Sims, Nathaniel M.; Wiesmann, William P.
2003-07-01
Current patient monitoring procedures in hospital intensive care units (ICUs) generate vast quantities of medical data, much of which is considered extemporaneous and not evaluated. Although sophisticated monitors to analyze individual types of patient data are routinely used in the hospital setting, this equipment lacks high order signal analysis tools for detecting long-term trends and correlations between different signals within a patient data set. Without the ability to continuously analyze disjoint sets of patient data, it is difficult to detect slow-forming complications. As a result, the early onset of conditions such as pneumonia or sepsis may not be apparent until the advanced stages. We report here on the development of a distributed software architecture test bed and software medical models to analyze both asynchronous and continuous patient data in real time. Hardware and software have been developed to support a multi-node distributed computer cluster capable of amassing data from multiple patient monitors and projecting near and long-term outcomes based upon the application of physiologic models to the incoming patient data stream. One computer acts as a central coordinating node; additional computers accommodate processing needs. A simple, non-clinical model for sepsis detection was implemented on the system for demonstration purposes. This work shows exceptional promise as a highly effective means to rapidly predict and thereby mitigate the effect of nosocomial infections.
NASA Technical Reports Server (NTRS)
Belcastro, C. M.
1984-01-01
Advanced composite aircraft designs include fault-tolerant computer-based digital control systems with high reliability requirements for adverse as well as optimum operating environments. Since aircraft penetrate intense electromagnetic fields during thunderstorms, onboard computer systems may be subjected to field-induced transient voltages and currents resulting in functional error modes which are collectively referred to as digital system upset. A methodology was developed for assessing the upset susceptibility of a computer system onboard an aircraft flying through a lightning environment. Upset error modes in a general-purpose microprocessor were studied via tests which involved the random input of analog transients which model lightning-induced signals onto interface lines of an 8080-based microcomputer from which upset error data were recorded. The application of Markov modeling to upset susceptibility estimation is discussed and a stochastic model is developed.
Williams, Eric
2004-11-15
The total energy and fossil fuels used in producing a desktop computer with 17-in. CRT monitor are estimated at 6400 megajoules (MJ) and 260 kg, respectively. This indicates that computer manufacturing is energy intensive: the ratio of fossil fuel use to product weight is 11, an order of magnitude larger than the factor of 1-2 for many other manufactured goods. This high energy intensity of manufacturing, combined with rapid turnover in computers, results in an annual life cycle energy burden that is surprisingly high: about 2600 MJ per year, 1.3 times that of a refrigerator. In contrast with many home appliances, life cycle energy use of a computer is dominated by production (81%) as opposed to operation (19%). Extension of usable lifespan (e.g. by reselling or upgrading) is thus a promising approach to mitigating energy impacts as well as other environmental burdens associated with manufacturing and disposal.
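The quoted figures are mutually consistent; a quick back-of-the-envelope check using only the numbers above recovers the implied product mass and service life:

\frac{260\ \text{kg fossil fuel}}{11} \approx 24\ \text{kg of product}, \qquad \frac{6400\ \text{MJ (production)}}{0.81 \times 2600\ \text{MJ/yr}} \approx 3\ \text{yr of service life}.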
CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research
Sherif, Tarek; Rioux, Pierre; Rousseau, Marc-Etienne; Kassis, Nicolas; Beck, Natacha; Adalat, Reza; Das, Samir; Glatard, Tristan; Evans, Alan C.
2014-01-01
The Canadian Brain Imaging Research Platform (CBRAIN) is a web-based collaborative research platform developed in response to the challenges raised by data-heavy, compute-intensive neuroimaging research. CBRAIN offers transparent access to remote data sources, distributed computing sites, and an array of processing and visualization tools within a controlled, secure environment. Its web interface is accessible through any modern browser and uses graphical interface idioms to reduce the technical expertise required to perform large-scale computational analyses. CBRAIN's flexible meta-scheduling has allowed the incorporation of a wide range of heterogeneous computing sites, currently including nine national research High Performance Computing (HPC) centers in Canada, one in Korea, one in Germany, and several local research servers. CBRAIN leverages remote computing cycles and facilitates resource-interoperability in a transparent manner for the end-user. Compared with typical grid solutions available, our architecture was designed to be easily extendable and deployed on existing remote computing sites with no tool modification, administrative intervention, or special software/hardware configuration. As of October 2013, CBRAIN serves over 200 users spread across 53 cities in 17 countries. The platform is built as a generic framework that can accept data and analysis tools from any discipline. However, its current focus is primarily on neuroimaging research and studies of neurological diseases such as Autism, Parkinson's and Alzheimer's diseases, Multiple Sclerosis as well as on normal brain structure and development. This technical report presents the CBRAIN Platform, its current deployment and usage, and future directions. PMID:24904400
CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research.
Sherif, Tarek; Rioux, Pierre; Rousseau, Marc-Etienne; Kassis, Nicolas; Beck, Natacha; Adalat, Reza; Das, Samir; Glatard, Tristan; Evans, Alan C
2014-01-01
The Canadian Brain Imaging Research Platform (CBRAIN) is a web-based collaborative research platform developed in response to the challenges raised by data-heavy, compute-intensive neuroimaging research. CBRAIN offers transparent access to remote data sources, distributed computing sites, and an array of processing and visualization tools within a controlled, secure environment. Its web interface is accessible through any modern browser and uses graphical interface idioms to reduce the technical expertise required to perform large-scale computational analyses. CBRAIN's flexible meta-scheduling has allowed the incorporation of a wide range of heterogeneous computing sites, currently including nine national research High Performance Computing (HPC) centers in Canada, one in Korea, one in Germany, and several local research servers. CBRAIN leverages remote computing cycles and facilitates resource-interoperability in a transparent manner for the end-user. Compared with typical grid solutions available, our architecture was designed to be easily extendable and deployed on existing remote computing sites with no tool modification, administrative intervention, or special software/hardware configuration. As of October 2013, CBRAIN serves over 200 users spread across 53 cities in 17 countries. The platform is built as a generic framework that can accept data and analysis tools from any discipline. However, its current focus is primarily on neuroimaging research and studies of neurological diseases such as Autism, Parkinson's and Alzheimer's diseases, Multiple Sclerosis as well as on normal brain structure and development. This technical report presents the CBRAIN Platform, its current deployment and usage, and future directions.
NASA Astrophysics Data System (ADS)
Cilia, M. G.; Baker, L. M.
2015-12-01
We determine empirical relationships between instrumental peak ground motions and observed intensities for two great Chilean subduction earthquakes: the 2010 Mw8.8 Maule earthquake and the 2014 Mw8.2 Iquique earthquake. Both occurred immediately offshore on the primary plate boundary interface between the Nazca and South America plates. They are among the largest earthquakes to be instrumentally recorded; the 2010 Maule event is the second largest earthquake to produce strong motion recordings. Ground motion to intensity conversion equations (GMICEs) are used to reconstruct the distribution of shaking for historical earthquakes by using intensities estimated from contemporary accounts. Most great (M>8) earthquakes, like these, occur within subduction zones, yet few GMICEs exist for subduction earthquakes. It is unclear whether GMICEs developed for active crustal regions, such as California, can be scaled up to the large M of subduction zone events, or if new data sets must be analyzed to develop separate subduction GMICEs. To address this question, we pair instrumental peak ground motions, both acceleration (PGA) and velocity (PGV), with intensities derived from onsite surveys of earthquake damage made in the weeks after the events and internet-derived felt reports. We fit a linear predictive equation between the geometric mean of the maximum PGA or PGV of the two horizontal components and intensity, using linear least squares. We use a weighting scheme to express the uncertainty of the pairings based on a station's proximity to the nearest intensity observation. The intensity data derived from the onsite surveys constitute a complete, high-quality record of the earthquake damage. We perform the computations using both the survey data and community decimal intensities (CDI) calculated from felt reports volunteered by citizens (USGS "Did You Feel It", DYFI) and compare the results. We compare the GMICEs we developed to the most widely used GMICEs from California and central US earthquakes, and global earthquakes. Existing GMICEs consistently over-predict intensity for these two subduction events. This may be a regional difference, or a magnitude-dependent effect. Currently, however, there is not enough data from these great subduction earthquakes to prefer one interpretation over the other.
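A minimal sketch of the regression step described above: intensity is fit as a straight line in the log of the geometric-mean horizontal peak ground motion by weighted linear least squares. The arrays and weights below are placeholders, not the study's data.

import numpy as np

def fit_gmice(pgm_h1, pgm_h2, intensity, weights):
    # Geometric mean of the two horizontal components, then log10.
    x = np.log10(np.sqrt(np.asarray(pgm_h1) * np.asarray(pgm_h2)))
    y = np.asarray(intensity, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))   # weights ~ confidence in each pairing
    A = np.column_stack([np.ones_like(x), x])
    coeffs, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    return coeffs   # (intercept, slope) of  intensity = a + b * log10(PGM)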
A Parallel Point Matching Algorithm for Landmark Based Image Registration Using Multicore Platform
Yang, Lin; Gong, Leiguang; Zhang, Hong; Nosher, John L.; Foran, David J.
2013-01-01
Point matching is crucial for many computer vision applications. Establishing the correspondence between a large number of data points is a computationally intensive process. Some point matching related applications, such as medical image registration, require real time or near real time performance if applied to critical clinical applications like image assisted surgery. In this paper, we report a new multicore platform based parallel algorithm for fast point matching in the context of landmark based medical image registration. We introduced a non-regular data partition algorithm which utilizes the K-means clustering algorithm to group the landmarks based on the number of available processing cores, which optimizes memory usage and data transfer. We have tested our method using the IBM Cell Broadband Engine (Cell/B.E.) platform. The results demonstrated a significant speed-up over its sequential implementation. The proposed data partition and parallelization algorithm, though tested only on one multicore platform, is generic by its design. Therefore the parallel algorithm can be extended to other computing platforms, as well as other point matching related applications. PMID:24308014
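A minimal sketch of the partition-then-parallelize idea, with hypothetical data and a plain nearest-neighbour matcher standing in for the actual point-matching kernel: landmarks are grouped by K-means into as many clusters as worker processes, and each cluster is matched independently.

import numpy as np
from multiprocessing import Pool
from scipy.cluster.vq import kmeans2

def match_cluster(args):
    # Placeholder matching kernel: index of the nearest target point for each source point.
    source_pts, target_pts = args
    d = np.linalg.norm(source_pts[:, None, :] - target_pts[None, :, :], axis=2)
    return d.argmin(axis=1)

def parallel_point_match(source_pts, target_pts, n_cores=4):
    # Partition the landmarks into one cluster per core, then match clusters in parallel.
    _, labels = kmeans2(source_pts, n_cores, minit="points")
    chunks = [(source_pts[labels == k], target_pts) for k in range(n_cores)]
    with Pool(n_cores) as pool:
        return pool.map(match_cluster, chunks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(parallel_point_match(rng.random((200, 3)), rng.random((300, 3))))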
Butail, Sachit; Salerno, Philip; Bollt, Erik M; Porfiri, Maurizio
2015-12-01
Traditional approaches for the analysis of collective behavior entail digitizing the position of each individual, followed by evaluation of pertinent group observables, such as cohesion and polarization. Machine learning may enable considerable advancements in this area by affording the classification of these observables directly from images. While such methods have been successfully implemented in the classification of individual behavior, their potential in the study of collective behavior is largely untested. In this paper, we compare three methods for the analysis of collective behavior: simple tracking (ST) without resolving occlusions, machine learning with real data (MLR), and machine learning with synthetic data (MLS). These methods are evaluated on videos recorded from an experiment studying the effect of ambient light on the shoaling tendency of Giant danios. In particular, we compute average nearest-neighbor distance (ANND) and polarization using the three methods and compare the values with manually-verified ground-truth data. To further assess possible dependence on sampling rate for computing ANND, the comparison is also performed at a low frame rate. Results show that while ST is the most accurate at the higher frame rate for both ANND and polarization, at the low frame rate there is no significant difference in ANND accuracy between the three methods. In terms of computational speed, MLR and MLS take significantly less time to process an image, with MLS better addressing constraints related to generation of training data. Finally, all methods are able to successfully detect a significant difference in ANND as the ambient light intensity is varied, irrespective of the direction of intensity change.
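For reference, the two group observables compared above can be computed directly from digitized positions and heading angles; a minimal sketch (array names are illustrative):

import numpy as np

def average_nearest_neighbour_distance(positions):
    # positions: (N, 2) array of individual coordinates in one frame.
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # ignore self-distances
    return float(d.min(axis=1).mean())

def polarization(headings):
    # headings: (N,) array of orientation angles in radians; 1 = perfectly aligned group.
    return float(np.hypot(np.cos(headings).mean(), np.sin(headings).mean()))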
The 'last mile' of data handling: Fermilab's IFDH tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyon, Adam L.; Mengel, Marc W.
2014-01-01
IFDH (Intensity Frontier Data Handling) is a suite of tools for data movement tasks for Fermilab experiments and is an important part of the FIFE[2] (Fabric for Intensity Frontier [1] Experiments) initiative described at this conference. IFDH encompasses moving input data from caches or storage elements to compute nodes (the 'last mile' of data movement) and moving output data potentially to those caches as part of the journey back to the user. IFDH also involves throttling and locking to ensure that large numbers of jobs do not cause data movement bottlenecks. IFDH is realized as an easy-to-use layer that users call in their job scripts (e.g. 'ifdh cp'), hiding the low-level data movement tools. One advantage of this layer is that the underlying low-level tools can be selected or changed without the need for the user to alter their scripts. Logging and performance monitoring can also be added easily. This system will be presented in detail as well as its impact on the ease of data handling at Fermilab experiments.
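A sketch, not Fermilab's implementation, of the kind of thin wrapper such a layer suggests: the transfer itself is delegated to the 'ifdh cp' command mentioned above, while the retry and back-off behaviour shown here is purely illustrative.

# Illustrative only: a wrapper a job script might use around 'ifdh cp'.
# The retry/back-off policy is an assumption, not IFDH's real behaviour.
import subprocess, time

def copy(src, dst, retries=3, backoff_s=30):
    for attempt in range(1, retries + 1):
        result = subprocess.run(["ifdh", "cp", src, dst])
        if result.returncode == 0:
            return
        time.sleep(backoff_s * attempt)   # crude throttling between attempts
    raise RuntimeError(f"ifdh cp failed after {retries} attempts: {src} -> {dst}")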
Influence of bulk turbulence and entrance boundary layer thickness on the curved duct flow field
NASA Technical Reports Server (NTRS)
Crawford, R. A.
1988-01-01
The influence of bulk turbulence and boundary layer thickness on the secondary flow development in a square, 90 degree turning duct was investigated. A three-dimensional laser velocimetry system was utilized to measure the mean and fluctuating components of velocity at six cross-planes in the duct. The results from this investigation, with entrance boundary layer thickness of 20 percent, were compared with the thin boundary layer results documented in NASA CR-174811. The axial velocity profiles, cross-flow velocities, and turbulence intensities were compared and evaluated with regard to the influence of bulk turbulence intensity and boundary layer thickness, and the influence was significant. The results of this investigation expand the 90 degree curved duct experimental data base to higher turbulence levels and thicker entrance boundary layers. The experimental results provide a challenging benchmark data base for computational fluid dynamics code development and validation. The variation of inlet bulk turbulence intensity provides additional information to aid in turbulence model evaluation.
Robust parameter extraction for decision support using multimodal intensive care data
Clifford, G.D.; Long, W.J.; Moody, G.B.; Szolovits, P.
2008-01-01
Digital information flow within the intensive care unit (ICU) continues to grow, with advances in technology and computational biology. Recent developments in the integration and archiving of these data have resulted in new opportunities for data analysis and clinical feedback. New problems associated with ICU databases have also arisen. ICU data are high-dimensional, often sparse, asynchronous and irregularly sampled, as well as being non-stationary, noisy and subject to frequent exogenous perturbations by clinical staff. Relationships between different physiological parameters are usually nonlinear (except within restricted ranges), and the equipment used to measure the observables is often inherently error-prone and biased. The prior probabilities associated with an individual's genetics, pre-existing conditions, lifestyle and ongoing medical treatment all affect prediction and classification accuracy. In this paper, we describe some of the key problems and associated methods that hold promise for robust parameter extraction and data fusion for use in clinical decision support in the ICU. PMID:18936019
Applications of Deep Learning and Reinforcement Learning to Biological Data.
Mahmud, Mufti; Kaiser, Mohammed Shamim; Hussain, Amir; Vassanelli, Stefano
2018-06-01
Rapid advances in hardware-based technologies during the past decades have opened up new possibilities for life scientists to gather multimodal data in various application domains, such as omics, bioimaging, medical imaging, and (brain/body)-machine interfaces. These have generated novel opportunities for development of dedicated data-intensive machine learning techniques. In particular, recent research in deep learning (DL), reinforcement learning (RL), and their combination (deep RL) promise to revolutionize the future of artificial intelligence. The growth in computational power accompanied by faster and increased data storage, and declining computing costs have already allowed scientists in various fields to apply these techniques on data sets that were previously intractable owing to their size and complexity. This paper provides a comprehensive survey on the application of DL, RL, and deep RL techniques in mining biological data. In addition, we compare the performances of DL techniques when applied to different data sets across various application domains. Finally, we outline open issues in this challenging research area and discuss future development perspectives.
Adapting bioinformatics curricula for big data.
Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H
2016-01-01
Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. © The Author 2015. Published by Oxford University Press.
Adapting bioinformatics curricula for big data
Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.
2016-01-01
Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469
NASA Technical Reports Server (NTRS)
Vinci, Samuel J.
2012-01-01
This report is the third part of a three-part final report of research performed under an NRA Cooperative Agreement contract. The first part was published as NASA/CR-2012-217415. The second part was published as NASA/CR-2012-217416. The study of the very high lift low-pressure turbine airfoil L1A in the presence of unsteady wakes was performed computationally and compared against experimental results. The experiments were conducted in a low speed wind tunnel under high (4.9%) and then low (0.6%) freestream turbulence intensity for Reynolds numbers of 25,000 and 50,000. The experimental and computational data have shown that in cases without wakes, the boundary layer separated without reattachment. The CFD was done with LES and URANS utilizing the finite-volume code ANSYS Fluent (ANSYS, Inc.) under the same freestream turbulence and Reynolds number conditions as the experiment but only at a rod to blade spacing of 1. With wakes, separation was largely suppressed, particularly if the wake passing frequency was sufficiently high. This was confirmed in the 3D CFD efforts by comparison against the experimental pressure coefficients and velocity profiles, which showed reasonable agreement for all cases examined. The 2D CFD efforts failed to capture the three-dimensional effects of the wake and thus were less consistent with the experimental data. The predicted effect of freestream turbulence intensity was also somewhat more consistent with the experimental data at the higher intensity than in the low-intensity cases. Additional cases with higher wake passing frequencies, which were not run experimentally, were simulated. The results showed that an initial 25% increase over the experimental wake passing frequency greatly reduced the size of the separation bubble, nearly completely suppressing it.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Owen, D; Anderson, C; Mayo, C
Purpose: To extend the functionality of a commercial treatment planning system (TPS) to support (i) direct use of quantitative image-based metrics within treatment plan optimization and (ii) evaluation of dose-functional volume relationships to assist in functional image adaptive radiotherapy. Methods: A script was written that interfaces with a commercial TPS via an Application Programming Interface (API). The script executes a program that performs dose-functional volume analyses. Written in C#, the script reads the dose grid and correlates it with image data on a voxel-by-voxel basis through API extensions that can access registration transforms. A user interface was designed through WinForms to input parameters and display results. To test the performance of this program, image- and dose-based metrics computed from perfusion SPECT images aligned to the treatment planning CT were generated, validated, and compared. Results: The integration of image analysis information was successfully implemented as a plug-in to a commercial TPS. Perfusion SPECT images were used to validate the calculation and display of image-based metrics as well as dose-intensity metrics and histograms for defined structures on the treatment planning CT. Various biological dose correction models, custom image-based metrics, dose-intensity computations, and dose-intensity histograms were applied to analyze the image-dose profile. Conclusion: It is possible to add image analysis features to commercial TPSs through custom scripting applications. A tool was developed to enable the evaluation of image-intensity-based metrics in the context of functional targeting and avoidance. In addition to providing dose-intensity metrics and histograms that can be easily extracted from a plan database and correlated with outcomes, the system can also be extended to a plug-in optimization system, which can directly use the computed metrics for optimization of post-treatment tumor or normal tissue response models. Supported by NIH - P01 - CA059827.
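A minimal sketch of the kind of voxel-wise dose-intensity analysis such a plug-in performs, assuming the dose grid and the functional image (e.g. perfusion SPECT) have already been resampled onto a common voxel grid; function names, array names, and bin edges are illustrative.

import numpy as np

def dose_intensity_histogram(dose, intensity, structure_mask, dose_bins):
    # Mean functional-image intensity within each dose bin, restricted to one structure.
    d = dose[structure_mask]
    i = intensity[structure_mask]
    which = np.digitize(d, dose_bins)
    return [float(i[which == b].mean()) if np.any(which == b) else np.nan
            for b in range(1, len(dose_bins))]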
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
NASA Astrophysics Data System (ADS)
Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.
2015-05-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, O(10^3) users, and ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
Fast Poisson noise removal by biorthogonal Haar domain hypothesis testing
NASA Astrophysics Data System (ADS)
Zhang, B.; Fadili, M. J.; Starck, J.-L.; Digel, S. W.
2008-07-01
Methods based on hypothesis tests (HTs) in the Haar domain are widely used to denoise Poisson count data. Facing large datasets or real-time applications, Haar-based denoisers have to use the decimated transform to meet limited-memory or computation-time constraints. Unfortunately, for regular underlying intensities, decimation yields discontinuous estimates and strong “staircase” artifacts. In this paper, we propose to combine the HT framework with the decimated biorthogonal Haar (Bi-Haar) transform instead of the classical Haar. The Bi-Haar filter bank is normalized such that the p-values of Bi-Haar coefficients (p_BH) provide a good approximation to those of Haar (p_H) for high-intensity settings or large scales; for low-intensity settings and small scales, we show that p_BH are essentially upper-bounded by p_H. Thus, we may apply the Haar-based HTs to Bi-Haar coefficients to control a prefixed false positive rate. By doing so, we benefit from the regular Bi-Haar filter bank to gain a smooth estimate while always maintaining a low computational complexity. A Fisher-approximation-based threshold implementing the HTs is also established. The efficiency of this method is illustrated on an example of hyperspectral-source-flux estimation.
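A sketch of the general Haar-domain hypothesis-testing idea for Poisson counts (the classical Haar case, not the paper's Bi-Haar construction): each unnormalized Haar detail compares the counts in two adjacent blocks, and under the null hypothesis of equal intensity the left-block count given the total is Binomial(total, 1/2), so details whose two-sided p-value exceeds the prescribed false-positive level are zeroed before inverting the transform. The signal length (a power of two) and the alpha level are illustrative.

import numpy as np
from scipy.stats import binom

def haar_ht_denoise(counts, alpha=1e-3):
    # counts: 1-D array of Poisson counts, length assumed to be a power of two.
    approx = counts.astype(float)
    kept = []
    while approx.size > 1:
        a, b = approx[0::2], approx[1::2]
        total = a + b
        # Two-sided p-value of observing 'a' out of 'total' under p = 1/2.
        p = 2 * np.minimum(binom.cdf(a, total, 0.5), binom.sf(a - 1, total, 0.5))
        kept.append(np.where(p <= alpha, a - b, 0.0))   # keep only significant details
        approx = total
    # Inverse (unnormalized) Haar transform from the surviving coefficients.
    rec = approx
    for detail in reversed(kept):
        a = (rec + detail) / 2.0
        b = (rec - detail) / 2.0
        rec = np.empty(a.size * 2)
        rec[0::2], rec[1::2] = a, b
    return rec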
Tivnan, Matthew; Gurjar, Rajan; Wolf, David E; Vishwanath, Karthik
2015-08-12
Diffuse Correlation Spectroscopy (DCS) is a well-established optical technique that has been used for non-invasive measurement of blood flow in tissues. Instrumentation for DCS includes a correlation device that computes the temporal intensity autocorrelation of a coherent laser source after it has undergone diffuse scattering through a turbid medium. Typically, the signal acquisition and its autocorrelation are performed by a correlation board. These boards have dedicated hardware to acquire and compute intensity autocorrelations of rapidly varying input signal and are usually quite expensive. Here we show that a Raspberry Pi minicomputer can acquire and store a rapidly varying time-signal with high fidelity. We show that this signal collected by a Raspberry Pi device can be processed numerically to yield intensity autocorrelations well suited for DCS applications. DCS measurements made using the Raspberry Pi device were compared to those acquired using a commercial hardware autocorrelation board to investigate the stability, performance, and accuracy of the data acquired in controlled experiments. This paper represents a first step toward lowering the instrumentation cost of a DCS system and may help make DCS more widely used in biomedical applications.
Tivnan, Matthew; Gurjar, Rajan; Wolf, David E.; Vishwanath, Karthik
2015-01-01
Diffuse Correlation Spectroscopy (DCS) is a well-established optical technique that has been used for non-invasive measurement of blood flow in tissues. Instrumentation for DCS includes a correlation device that computes the temporal intensity autocorrelation of a coherent laser source after it has undergone diffuse scattering through a turbid medium. Typically, the signal acquisition and its autocorrelation are performed by a correlation board. These boards have dedicated hardware to acquire and compute intensity autocorrelations of rapidly varying input signal and are usually quite expensive. Here we show that a Raspberry Pi minicomputer can acquire and store a rapidly varying time-signal with high fidelity. We show that this signal collected by a Raspberry Pi device can be processed numerically to yield intensity autocorrelations well suited for DCS applications. DCS measurements made using the Raspberry Pi device were compared to those acquired using a commercial hardware autocorrelation board to investigate the stability, performance, and accuracy of the data acquired in controlled experiments. This paper represents a first step toward lowering the instrumentation cost of a DCS system and may help make DCS more widely used in biomedical applications. PMID:26274961
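A minimal sketch of the numerical autocorrelation step: given an intensity time series sampled at fixed intervals, the normalized intensity autocorrelation g2(tau) can be computed directly in numpy. This is a simple linear-lag estimator; hardware correlators typically use multi-tau binning, which is not shown.

import numpy as np

def g2(intensity, max_lag):
    # Normalized intensity autocorrelation g2(tau) = <I(t) I(t+tau)> / <I>^2.
    x = np.asarray(intensity, dtype=float)
    mean_sq = x.mean() ** 2
    return np.array([np.mean(x * x) / mean_sq if lag == 0
                     else np.mean(x[:-lag] * x[lag:]) / mean_sq
                     for lag in range(max_lag)])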
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planning using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
Secure data exchange between intelligent devices and computing centers
NASA Astrophysics Data System (ADS)
Naqvi, Syed; Riguidel, Michel
2005-03-01
The advent of reliable spontaneous networking technologies (commonly known as wireless ad-hoc networks) has ostensibly raised stakes for the conception of computing intensive environments using intelligent devices as their interface with the external world. These smart devices are used as data gateways for the computing units. These devices are employed in highly volatile environments where the secure exchange of data between these devices and their computing centers is of paramount importance. Moreover, their mission critical applications require dependable measures against the attacks like denial of service (DoS), eavesdropping, masquerading, etc. In this paper, we propose a mechanism to assure reliable data exchange between an intelligent environment composed of smart devices and distributed computing units collectively called 'computational grid'. The notion of infosphere is used to define a digital space made up of a persistent and a volatile asset in an often indefinite geographical space. We study different infospheres and present general evolutions and issues in the security of such technology-rich and intelligent environments. It is beyond any doubt that these environments will likely face a proliferation of users, applications, networked devices, and their interactions on a scale never experienced before. It would be better to build in the ability to uniformly deal with these systems. As a solution, we propose a concept of virtualization of security services. We try to solve the difficult problems of implementation and maintenance of trust on the one hand, and those of security management in heterogeneous infrastructure on the other hand.
An X-ray diffraction method for semiquantitative mineralogical analysis of Chilean nitrate ore
Jackson, J.C.; Ericksen, G.E.
1997-01-01
Computer analysis of X-ray diffraction (XRD) data provides a simple method for determining the semiquantitative mineralogical composition of naturally occurring mixtures of saline minerals. The method herein described was adapted from a computer program for the study of mixtures of naturally occurring clay minerals. The program evaluates the relative intensities of selected diagnostic peaks for the minerals in a given mixture, and then calculates the relative concentrations of these minerals. The method requires precise calibration of XRD data for the minerals to be studied and selection of diffraction peaks that minimize inter-compound interferences. The calculated relative abundances are sufficiently accurate for direct comparison with bulk chemical analyses of naturally occurring saline mineral assemblages.
O'Hara, Susan
2014-01-01
Nurses have increasingly been regarded as critical members of the planning team as architects recognize their knowledge and value. But the nurses' role as knowledge experts can be expanded to leading efforts to integrate the clinical, operational, and architectural expertise through simulation modeling. Simulation modeling allows for the optimal merge of multifactorial data to understand the current state of the intensive care unit and predict future states. Nurses can champion the simulation modeling process and reap the benefits of a cost-effective way to test new designs, processes, staffing models, and future programming trends prior to implementation. Simulation modeling is an evidence-based planning approach, a standard, for integrating the sciences with real client data, to offer solutions for improving patient care.
Fatigue crack growth in 2024-T3 aluminum under tensile and transverse shear stresses
NASA Technical Reports Server (NTRS)
Viz, Mark J.; Zehnder, Alan T.
1994-01-01
The influence of transverse shear stresses on the fatigue crack growth rate in thin 2024-T3 aluminum alloy sheets is investigated experimentally. The tests are performed on double-edge cracked sheets in cyclic tensile and torsional loading. This loading generates crack tip stress intensity factors in the same ratio as the values computed for a crack lying along a lap joint in a pressurized aircraft fuselage. The relevant fracture mechanics of cracks in thin plates along with the details of the geometrically nonlinear finite element analyses used for the test specimen calibration are developed and discussed. Preliminary fatigue crack growth data correlated using the fully coupled stress intensity factor calibration are presented and compared with fatigue crack growth data from pure ΔK_I fatigue tests.
Segmentation and intensity estimation of microarray images using a gamma-t mixture model.
Baek, Jangsun; Son, Young Sook; McLachlan, Geoffrey J
2007-02-15
We present a new approach to the analysis of images for complementary DNA microarray experiments. The image segmentation and intensity estimation are performed simultaneously by adopting a two-component mixture model. One component of this mixture corresponds to the distribution of the background intensity, while the other corresponds to the distribution of the foreground intensity. The intensity measurement is a bivariate vector consisting of red and green intensities. The background intensity component is modeled by the bivariate gamma distribution, whose marginal densities for the red and green intensities are independent three-parameter gamma distributions with different parameters. The foreground intensity component is taken to be the bivariate t distribution, with the constraint that the mean of the foreground is greater than that of the background for each of the two colors. The degrees of freedom of this t distribution are inferred from the data but they could be specified in advance to reduce the computation time. Also, the covariance matrix is not restricted to being diagonal and so it allows for nonzero correlation between R and G foreground intensities. This gamma-t mixture model is fitted by maximum likelihood via the EM algorithm. A final step is executed whereby nonparametric (kernel) smoothing is undertaken of the posterior probabilities of component membership. The main advantages of this approach are: (1) it enjoys the well-known strengths of a mixture model, namely flexibility and adaptability to the data; (2) it considers the segmentation and intensity estimation simultaneously and not separately as in commonly used existing software, and it also works with the red and green intensities in a bivariate framework as opposed to their separate estimation via univariate methods; (3) the use of the three-parameter gamma distribution for the background red and green intensities provides a much better fit than the normal (log normal) or t distributions; (4) the use of the bivariate t distribution for the foreground intensity provides a model that is less sensitive to extreme observations; (5) as a consequence of the aforementioned properties, it allows segmentation to be undertaken for a wide range of spot shapes, including doughnut, sickle shape and artifacts. We apply our method for gridding, segmentation and estimation to cDNA microarray real images and artificial data. Our method provides better segmentation results for spot shapes, as well as better intensity estimation, than the Spot and spotSegmentation R packages. It detected blank spots as well as bright artifacts in the real data, and estimated spot intensities with high accuracy for the synthetic data. The algorithms were implemented in Matlab. The Matlab codes implementing both the gridding and segmentation/estimation are available upon request. Supplementary material is available at Bioinformatics online.
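To make the modeling idea above concrete, the following is a minimal, simplified sketch (not the authors' Matlab code) of an EM fit for a two-component intensity mixture: a gamma-distributed background and, for brevity, a normal foreground in one dimension instead of the paper's bivariate gamma-t pair, with moment-matching M-steps. All function names, initializations and parameters are illustrative assumptions.

import numpy as np
from scipy.stats import gamma, norm

def fit_two_component(x, n_iter=100):
    x = np.asarray(x, dtype=float)
    # crude initialisation: split at the median
    pi = 0.5
    bg, fg = x[x <= np.median(x)], x[x > np.median(x)]
    k, theta = bg.mean() ** 2 / bg.var(), bg.var() / bg.mean()   # gamma shape/scale
    mu, sd = fg.mean(), fg.std()
    for _ in range(n_iter):
        # E-step: posterior probability that each pixel belongs to the background
        p_bg = pi * gamma.pdf(x, a=k, scale=theta)
        p_fg = (1 - pi) * norm.pdf(x, loc=mu, scale=sd)
        r = p_bg / (p_bg + p_fg + 1e-300)
        # M-step: weighted moment matching for the gamma, weighted mean/std for the normal
        pi = r.mean()
        m1 = np.average(x, weights=r)
        v1 = np.average((x - m1) ** 2, weights=r)
        k, theta = m1 ** 2 / v1, v1 / m1
        m2 = np.average(x, weights=1 - r)
        v2 = np.average((x - m2) ** 2, weights=1 - r)
        mu, sd = m2, np.sqrt(v2)
    return r  # posterior background probabilities; threshold these for segmentation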
A Statistician's View of Upcoming Grand Challenges
NASA Astrophysics Data System (ADS)
Meng, Xiao Li
2010-01-01
In this session we have seen some snapshots of the broad spectrum of challenges, in this age of huge, complex, computer-intensive models, data, instruments, and questions. These challenges bridge astronomy at many wavelengths, basic physics, machine learning, and statistics. At one end of our spectrum, we think of 'compressing' the data with non-parametric methods. This raises the question of creating 'pseudo-replicas' of the data for uncertainty estimates. What would be involved in, e.g., bootstrap and related methods? Somewhere in the middle are these non-parametric methods for encapsulating the uncertainty information. At the far end, we find more model-based approaches, with the physics model embedded in the likelihood and analysis. The other distinctive problem is really the 'black-box' problem, where one has a complicated, e.g. fundamental physics-based, computer code, or 'black box', and one needs to know how changing the parameters at input -- due to uncertainties of any kind -- will map to changes in the output. All of these connect to challenges in complexity of data and computation speed. Dr. Meng will highlight ways to 'cut corners' with advanced computational techniques, such as Parallel Tempering and Equal Energy methods. As well, there are cautionary tales of running automated analysis with real data -- where "30 sigma" outliers due to data artifacts can be more common than the astrophysical event of interest.
Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.
Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence
2012-12-01
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
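As a small illustration of the data model treated above (and only that; the logistic stick-breaking machinery is not reproduced), the sketch below simulates an inhomogeneous spatial Poisson process on the unit square by Lewis-Shedler thinning. The piecewise-constant intensity used in the example is an arbitrary assumption.

import numpy as np

def simulate_inhomogeneous_poisson(intensity, lam_max, rng=None):
    """intensity(x, y) must satisfy intensity <= lam_max everywhere on [0,1]^2."""
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(lam_max)                      # candidate count of a homogeneous process
    xy = rng.uniform(size=(n, 2))                 # candidate locations
    keep = rng.uniform(size=n) < intensity(xy[:, 0], xy[:, 1]) / lam_max   # thinning
    return xy[keep]

# example: piecewise-constant intensity, higher in the left half of the domain
pts = simulate_inhomogeneous_poisson(lambda x, y: np.where(x < 0.5, 200.0, 50.0), 200.0)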
Heterogeneous real-time computing in radio astronomy
NASA Astrophysics Data System (ADS)
Ford, John M.; Demorest, Paul; Ransom, Scott
2010-07-01
Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.
Ge, Hong-You; Vangsgaard, Steffen; Omland, Øyvind; Madeleine, Pascal; Arendt-Nielsen, Lars
2014-12-06
Musculoskeletal pain from the upper extremity and shoulder region is commonly reported by computer users. However, the functional status of central pain mechanisms, i.e., central sensitization and conditioned pain modulation (CPM), has not been investigated in this population. The aim was to evaluate sensitization and CPM in computer users with and without chronic musculoskeletal pain. Pressure pain threshold (PPT) mapping in the neck-shoulder (15 points) and the elbow (12 points) was assessed together with PPT measurement at mid-point in the tibialis anterior (TA) muscle among 47 computer users with chronic pain in the upper extremity and/or neck-shoulder pain (pain group) and 17 pain-free computer users (control group). Induced pain intensities and profiles over time were recorded using a 0-10 cm electronic visual analogue scale (VAS) in response to different levels of pressure stimuli on the forearm with a new technique of dynamic pressure algometry. The efficiency of CPM was assessed using cuff-induced pain as conditioning pain stimulus and PPT at TA as test stimulus. The demographics, job seniority and number of working hours/week using a computer were similar between groups. The PPTs measured at all 15 points in the neck-shoulder region were not significantly different between groups. There were no significant differences between groups in either the PPTs or the pain intensity induced by dynamic pressure algometry. No significant difference in PPT was observed in TA between groups. During CPM, a significant increase in PPT at TA was observed in both groups (P < 0.05) without significant differences between groups. For the chronic pain group, higher clinical pain intensity, lower PPT values from the neck-shoulder and higher pain intensity evoked by the roller were all correlated with less efficient descending pain modulation (P < 0.05). This suggests that the excitability of the central pain system is normal in a large group of computer users with low-intensity chronic upper extremity and/or neck-shoulder pain and that increased excitability of the pain system cannot explain the reported pain. However, computer users with higher pain intensity and lower PPTs were found to have decreased efficiency in descending pain modulation.
Atmospheric simulation using a liquid crystal wavefront-controlling device
NASA Astrophysics Data System (ADS)
Brooks, Matthew R.; Goda, Matthew E.
2004-10-01
Test and evaluation of laser warning devices is important due to the increased use of laser devices in aerial applications. This research presents an atmospheric aberrating system that enables in-lab testing of various detectors and sensors. The system employs laser light at 632.8 nm from a Helium-Neon source and a spatial light modulator (SLM) to cause phase changes using a birefringent liquid crystal material. Measuring the outgoing radiation from the SLM using a CCD targetboard and a Shack-Hartmann wavefront sensor reveals an acceptable resemblance of the system output to expected atmospheric theory. Over three turbulence scenarios, an error analysis shows that the turbulence data match theory. A wave optics computer simulation is created analogous to the lab-bench design. Phase data, intensity data, and the computer simulation affirm the lab-bench results, so that the aberrating SLM system can be operated confidently.
Restoration of MRI Data for Field Nonuniformities using High Order Neighborhood Statistics
Hadjidemetriou, Stathis; Studholme, Colin; Mueller, Susanne; Weiner, Michael; Schuff, Norbert
2007-01-01
MRI at high magnetic fields (> 3.0 T) is complicated by strong inhomogeneous radio-frequency fields, sometimes termed the “bias field”. These lead to nonuniformity of image intensity, greatly complicating further analysis such as registration and segmentation. Existing methods for bias field correction are effective for 1.5 T or 3.0 T MRI, but are not completely satisfactory for higher field data. This paper develops an effective bias field correction for high field MRI based on the assumption that the nonuniformity is smoothly varying in space. The nonuniformity is quantified and unmixed using high order neighborhood statistics of intensity cooccurrences, computed within spherical windows of limited size over the entire image. The restoration is iterative and makes use of a novel stable stopping criterion that depends on the scaled entropy of the cooccurrence statistics, that is, the Shannon entropy of the cooccurrence statistics normalized to the effective dynamic range of the image, which is a non-monotonic function of the iterations. The algorithm restores whole head data, is robust to intense nonuniformities present in high field acquisitions, and is robust to variations in anatomy. This algorithm significantly improves bias field correction in comparison to N3 on phantom 1.5 T head data and high field 4 T human head data. PMID:18193095
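As a rough illustration of the quantity the stopping criterion is built on, the sketch below computes the Shannon entropy of an intensity cooccurrence histogram at a one-pixel offset over a whole 2-D slice; the paper's method works on 3-D data within spherical windows and scales the entropy to the image's dynamic range, which is omitted here. The bin count and offset are assumptions.

import numpy as np

def cooccurrence_entropy(img, bins=64):
    a = img[:, :-1].ravel()          # intensity at a voxel
    b = img[:, 1:].ravel()           # intensity at its right neighbour
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()                  # joint cooccurrence probabilities
    p = p[p > 0]
    return -(p * np.log2(p)).sum()   # Shannon entropy of the cooccurrence statistics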
A scientific workflow framework for (13)C metabolic flux analysis.
Dalman, Tolga; Wiechert, Wolfgang; Nöh, Katharina
2016-08-20
Metabolic flux analysis (MFA) with (13)C labeling data is a high-precision technique to quantify intracellular reaction rates (fluxes). One of the major challenges of (13)C MFA is the interactivity of the computational workflow according to which the fluxes are determined from the input data (metabolic network model, labeling data, and physiological rates). Here, the workflow assembly is inevitably determined by the scientist who has to consider interacting biological, experimental, and computational aspects. Decision-making is context dependent and requires expertise, rendering an automated evaluation process hardly possible. Here, we present a scientific workflow framework (SWF) for creating, executing, and controlling on demand (13)C MFA workflows. (13)C MFA-specific tools and libraries, such as the high-performance simulation toolbox 13CFLUX2, are wrapped as web services and thereby integrated into a service-oriented architecture. Besides workflow steering, the SWF features transparent provenance collection and enables full flexibility for ad hoc scripting solutions. To handle compute-intensive tasks, cloud computing is supported. We demonstrate how the challenges posed by (13)C MFA workflows can be solved with our approach on the basis of two proof-of-concept use cases. Copyright © 2015 Elsevier B.V. All rights reserved.
Visual analysis of fluid dynamics at NASA's numerical aerodynamic simulation facility
NASA Technical Reports Server (NTRS)
Watson, Velvin R.
1991-01-01
A study is presented that describes and illustrates visualization tools used in Computational Fluid Dynamics (CFD) and indicates how these tools are likely to change, based on a projected evolution of the human-computer interface. The following are outlined using a graphically based format: the revolution of human-computer environments for CFD research; comparison of current environments with the ideal; predictions for the future CFD environments; and what can be done to accelerate the improvements. The following comments are given: when acquiring visualization tools, potential rapid changes must be considered; the environmental changes over the next ten years due to the human-computer interface cannot be fathomed; data flow packages such as AVS, apE, Explorer and Data Explorer are easy to learn and use for small problems, excellent for prototyping, but not so efficient for large problems; the approximation techniques used in visualization software must be appropriate for the data; it has become more cost effective to move jobs that fit on workstations and run only memory intensive jobs on the supercomputer; use of three-dimensional skills will be maximized when the three-dimensional environment is built in from the start.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for the GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm's data intensiveness and data-parallel structure with the GPU's single-instruction, multiple-thread execution model, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for GPU stream computing. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. To better manage the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimizing data access and the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to sustain the transmission rate and work around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
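The heart of the filtering stage is a frequency-domain operation. The sketch below is a CPU stand-in, in Python/numpy, for that kind of kernel: it keeps an annular mid-frequency band of the spectrum and transforms back; on a GPU, cupy can usually replace numpy here with little change. The band limits are illustrative assumptions, not the paper's values.

import numpy as np

def midfrequency_filter(img, r_lo=0.05, r_hi=0.35):
    F = np.fft.fftshift(np.fft.fft2(img))                 # centred 2-D spectrum
    ny, nx = img.shape
    y, x = np.ogrid[-ny // 2:ny - ny // 2, -nx // 2:nx - nx // 2]
    r = np.hypot(y / (ny / 2), x / (nx / 2))              # normalised radial frequency
    band = (r >= r_lo) & (r <= r_hi)                      # annular mid-frequency mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * band)))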
Computational Omics Funding Opportunity | Office of Cancer Clinical Proteomics Research
The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the NVIDIA Foundation are pleased to announce funding opportunities in the fight against cancer. Each organization has launched a request for proposals (RFP) that will collectively fund up to $2 million to help to develop a new generation of data-intensive scientific tools to find new ways to treat cancer.
Müller, Anne D; Artemyev, Anton N; Demekhin, Philipp V
2018-06-07
Angle-resolved multiphoton ionization of fenchone and camphor by short intense laser pulses is computed by the time-dependent single center method. Thereby, the photoelectron circular dichroism (PECD) in the three-photon resonance enhanced ionization and four-photon above-threshold ionization of these molecules is investigated in detail. The computational results are in satisfactory agreement with the available experimental data, measured for randomly oriented fenchone and camphor molecules at different wavelengths of the exciting pulses. We predict a significant enhancement of the multiphoton PECD for uniaxially oriented fenchone and camphor.
Quadruple Axis Neutron Computed Tomography
NASA Astrophysics Data System (ADS)
Schillinger, Burkhard; Bausenwein, Dominik
Neutron computed tomography takes more time for a full tomography than X-rays or synchrotron radiation because the source intensity is limited. Most neutron imaging detectors have a square field of view, so when tomography of elongated, narrow samples (e.g. fuel rods or sword blades) is recorded, much of the detector area is wasted. Using multiple rotation axes, several samples can be placed inside the field of view, and multiple tomographies can be recorded at the same time by later splitting the recorded images into separate tomography data sets. We describe a new multiple-axis setup using four independent miniaturized rotation tables.
Hunter, James; Freer, Yvonne; Gatt, Albert; Reiter, Ehud; Sripada, Somayajulu; Sykes, Cindy
2012-11-01
Our objective was to determine whether and how a computer system could automatically generate helpful natural language nursing shift summaries solely from an electronic patient record system, in a neonatal intensive care unit (NICU). A system was developed which automatically generates partial NICU shift summaries (for the respiratory and cardiovascular systems), using data-to-text technology. It was evaluated for 2 months in the NICU at the Royal Infirmary of Edinburgh, under supervision. In an on-ward evaluation, a substantial majority of the summaries was found by outgoing and incoming nurses to be understandable (90%), and a majority was found to be accurate (70%) and helpful (59%). The evaluation also served to identify some outstanding issues, especially with regard to extra content the nurses wanted to see in the computer-generated summaries. It is technically possible to automatically generate limited natural language NICU shift summaries from an electronic patient record. However, it proved difficult to handle electronic data that was intended primarily for display to the medical staff, and considerable engineering effort would be required to create a deployable system from our proof-of-concept software. Copyright © 2012 Elsevier B.V. All rights reserved.
Ganalyzer: A Tool for Automatic Galaxy Image Analysis
NASA Astrophysics Data System (ADS)
Shamir, Lior
2011-08-01
We describe Ganalyzer, a model-based tool that can automatically analyze and classify galaxy images. Ganalyzer works by separating the galaxy pixels from the background pixels, finding the center and radius of the galaxy, generating the radial intensity plot, and then computing the slopes of the peaks detected in the radial intensity plot to measure the spirality of the galaxy and determine its morphological class. Unlike algorithms that are based on machine learning, Ganalyzer is based on measuring the spirality of the galaxy, a task that is difficult to perform manually, and in many cases can provide a more accurate analysis compared to manual observation. Ganalyzer is simple to use, and can be easily embedded into other image analysis applications. Another advantage is its speed, which allows it to analyze ~10,000,000 galaxy images in five days using a standard modern desktop computer. These capabilities can make Ganalyzer a useful tool in analyzing large data sets of galaxy images collected by autonomous sky surveys such as SDSS, LSST, or DES. The software is available for free download at http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer, and the data used in the experiment are available at http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer/GalaxyImages.zip.
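A minimal sketch (a hypothetical helper, not Ganalyzer's code) of the radial intensity plot step described above: pixels are binned into thin annuli around an assumed galaxy centre and averaged per annulus. The centre coordinates and bin count are placeholders.

import numpy as np

def radial_intensity_profile(img, cx, cy, n_bins=50):
    ny, nx = img.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(x - cx, y - cy)                 # distance of each pixel from the centre
    bins = np.linspace(0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=img.ravel(), minlength=n_bins)
    counts = np.maximum(np.bincount(idx, minlength=n_bins), 1)
    return bins[:-1], sums / counts              # inner radius and mean intensity per annulus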
Application of ERTS-1 data to the protection and management of New Jersey's coastal environment
NASA Technical Reports Server (NTRS)
Yunghans, R. S.; Feinberg, E. B.; Wobber, F. J.; Mairs, R. L. (Principal Investigator); Macomber, R. T.; Stanczuk, D.; Stitt, J. A.
1974-01-01
The author has identified the following significant results. Rapid access to ERTS data was provided by NASA GSFC for the February 26, 1974 overpass of the New Jersey test site. Forty-seven hours following the overpass, computer-compatible tapes were ready for processing at EarthSat. The finished product was ready just 60 hours following the overpass and delivered to the New Jersey Department of Environmental Protection. This operational demonstration has been successful in convincing NJDEP as to the worth of ERTS as an operational monitoring and enforcement tool of significant value to the State. An erosion/accretion severity index has been developed for the New Jersey shore case study area. Computerized analysis techniques have been used for monitoring offshore waste disposal dumping locations, drift vectors, and dispersion rates in the New York Bight area. A computer shade print of the area was used to identify intensity levels of acid waste. A Litton intensity slice print was made to provide graphic presentation of dispersion characteristics and the dump extent. Continued monitoring will lead to the recommendation and justification of permanent dumping sites which pose no threat to water quality in nearshore environments.
Feature and Intensity Based Medical Image Registration Using Particle Swarm Optimization.
Abdel-Basset, Mohamed; Fakhry, Ahmed E; El-Henawy, Ibrahim; Qiu, Tie; Sangaiah, Arun Kumar
2017-11-03
Image registration is an important aspect of medical image analysis and finds use in a variety of medical applications. Examples include diagnosis, pre/post surgery guidance, and comparing/merging/integrating images from multiple modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Whether registering images across modalities for a single patient or registering across patients for a single modality, registration is an effective way to combine information from different images into a normalized frame for reference. Registered datasets can be used for providing information relating to the structure, function, and pathology of the organ or individual being imaged. In this paper a hybrid approach for medical image registration has been developed. It employs a modified Mutual Information (MI) as a similarity metric and the Particle Swarm Optimization (PSO) method. Computation of mutual information is modified using a weighted linear combination of image intensity and image gradient vector flow (GVF) intensity. In this manner, statistical as well as spatial image information is included in the image registration process. Maximization of the modified mutual information is effected using the versatile Particle Swarm Optimization method, which is easy to implement and requires few parameters to be adjusted. The developed approach has been tested and verified successfully on a number of medical image data sets that include images with missing parts, noise contamination, and/or of different modalities (CT, MRI). The registration results indicate that the proposed model is accurate and effective, and show the positive contribution of including both statistical and spatial image information in the developed approach.
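To illustrate the optimization idea (and only that), here is a minimal sketch of particle swarm optimization maximizing a plain histogram-based mutual information over integer 2-D translations. The paper's modified MI additionally mixes in gradient vector flow intensity, which is omitted; the swarm size, inertia and acceleration coefficients are arbitrary assumptions.

import numpy as np

def mutual_information(a, b, bins=32):
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()

def register_pso(fixed, moving, n_particles=20, n_iter=50, bound=20, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    pos = rng.uniform(-bound, bound, size=(n_particles, 2))   # candidate (row, col) shifts
    vel = np.zeros_like(pos)

    def score(t):
        # integer circular shift as a cheap stand-in for interpolation
        shifted = np.roll(moving, (int(round(t[0])), int(round(t[1]))), axis=(0, 1))
        return mutual_information(fixed, shifted)

    pbest, pbest_val = pos.copy(), np.array([score(p) for p in pos])
    g = pbest[pbest_val.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.uniform(size=(2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, -bound, bound)
        val = np.array([score(p) for p in pos])
        better = val > pbest_val
        pbest[better], pbest_val[better] = pos[better], val[better]
        g = pbest[pbest_val.argmax()].copy()
    return g  # estimated translation of `moving` onto `fixed`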
Estimating Function Approaches for Spatial Point Processes
NASA Astrophysics Data System (ADS)
Deng, Chong
Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization from a stochastic process called a spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood and Palm likelihood usually suffer from a loss of information due to ignoring the correlation among pairs. For many types of correlated data other than spatial point processes, when likelihood-based approaches are not desirable, estimating functions have been widely used for model fitting. In this dissertation, we explore estimating function approaches for fitting spatial point process models. These approaches, which are based on asymptotically optimal estimating function theory, can be used to incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estimating function approaches are good alternatives for balancing the trade-off between computational complexity and estimation efficiency. First, we propose a new estimating procedure that improves the efficiency of the pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likelihood estimation and estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data. Second, we further explore the quasi-likelihood approach for fitting the second-order intensity function of spatial point processes. However, the original second-order quasi-likelihood is barely feasible due to the intense computation and high memory requirement needed to solve a large linear system. Motivated by the existence of geometric regular patterns in stationary point processes, we find a lower-dimensional representation of the optimal weight function and propose a reduced second-order quasi-likelihood approach. Through a simulation study, we show that the proposed method not only demonstrates superior performance in fitting the clustering parameter but also benefits from the relaxation of the constraint on the tuning parameter, H. Third, we study the quasi-likelihood-type estimating function that is optimal in a certain class of first-order estimating functions for estimating the regression parameter in spatial point process models. Then, by using a novel spectral representation, we construct an implementation that is computationally much more efficient and can be applied to a more general setup than the original quasi-likelihood method.
On the importance of mathematical methods for analysis of MALDI-imaging mass spectrometry data.
Trede, Dennis; Kobarg, Jan Hendrik; Oetjen, Janina; Thiele, Herbert; Maass, Peter; Alexandrov, Theodore
2012-03-21
In the last decade, matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS), also called MALDI-imaging, has proven its potential in proteomics and was successfully applied to various types of biomedical problems, in particular to histopathological label-free analysis of tissue sections. In histopathology, MALDI-imaging is used as a general analytic tool revealing the functional proteomic structure of tissue sections, and as a discovery tool for detecting new biomarkers discriminating a region annotated by an experienced histologist, in particular, for cancer studies. A typical MALDI-imaging data set contains 10⁸ to 10⁹ intensity values occupying more than 1 GB. Analysis and interpretation of such a huge amount of data is a mathematically, statistically and computationally challenging problem. In this paper we overview some computational methods for analysis of MALDI-imaging data sets. We discuss the importance of data preprocessing, which typically includes normalization, baseline removal and peak picking, and highlight the importance of image denoising when visualizing IMS data.
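A minimal sketch of the preprocessing steps named above for a single spectrum: total-ion-current normalization, a crude rolling-minimum baseline estimate, and local-maximum peak picking. The window size and prominence threshold are illustrative assumptions, not recommendations from the paper.

import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d
from scipy.signal import find_peaks

def preprocess_spectrum(intensities, baseline_win=101, min_prominence=3.0):
    y = np.asarray(intensities, dtype=float)
    y = y / y.sum()                                       # total-ion-current normalisation
    baseline = uniform_filter1d(minimum_filter1d(y, baseline_win), baseline_win)
    y = np.clip(y - baseline, 0, None)                    # baseline removal
    noise = np.median(np.abs(y - np.median(y))) + 1e-12   # robust noise estimate
    peaks, _ = find_peaks(y, prominence=min_prominence * noise)   # peak picking
    return y, peaks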
A Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images
NASA Astrophysics Data System (ADS)
Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.
2015-07-01
Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and others. However, it is challenging to efficiently store, query and process such big data due to data- and computing-intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process the big remote sensing data in a distributed and parallel manner. Especially, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide affluent image processing operations. With the integration of HDFS, the Orfeo toolbox and MapReduce, these remote sensing images can be directly processed in parallel in a scalable computing environment. The experimental results show that the proposed framework can efficiently manage and process such big remote sensing data.
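As a small local illustration of the map-then-reduce idea (not the HDFS/Orfeo/MapReduce stack itself), the sketch below tiles a scene and processes the tiles in parallel with Python's multiprocessing, then combines the per-tile results. The ndvi_like operation, tile size and two-band layout are hypothetical.

import numpy as np
from multiprocessing import Pool

def split_tiles(img, tile=512):
    ny, nx = img.shape[:2]
    return [img[i:i + tile, j:j + tile]
            for i in range(0, ny, tile) for j in range(0, nx, tile)]

def ndvi_like(tile):
    red, nir = tile[..., 0].astype(float), tile[..., 1].astype(float)
    return np.nanmean((nir - red) / (nir + red + 1e-9))   # one summary value per tile

if __name__ == "__main__":
    scene = np.random.randint(1, 255, size=(4096, 4096, 2), dtype=np.uint8)
    with Pool() as pool:
        tile_stats = pool.map(ndvi_like, split_tiles(scene))   # "map" stage
    print(float(np.mean(tile_stats)))                          # "reduce" stage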
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of the traditional HPC cluster.
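For readers unfamiliar with the underlying data structure, here is a minimal single-process sketch of de Bruijn graph construction from reads, the kind of graph an assembler like this operates on; the distributed Hadoop/Giraph machinery of GiGA is not reproduced, and the toy reads and k value are arbitrary.

from collections import defaultdict

def de_bruijn(reads, k=31):
    graph = defaultdict(set)                 # (k-1)-mer -> set of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# toy usage: contigs would then be read off the non-branching paths of this graph
g = de_bruijn(["ACGTACGTGACG", "CGTGACGTTACG"], k=5)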
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied GATK HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) replacing the default 'on-demand' CPU frequency scaling mode with 'performance' mode to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.
2017-10-01
PanDA, the Production and Distributed Analysis workload management system, has been developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects to use PanDA beyond HEP and the Grid has drawn attention from other compute intensive sciences such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed and made available for bioinformaticians worldwide. Analysis of genome sequencing data using the popular software pipeline PALEOMIX can take a month, even when running on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run on a distributed computing environment powered by PanDA. To run the pipeline we split the input files into chunks, which are processed separately on different nodes as separate inputs for PALEOMIX, and finally merge the output files; this is very similar to what ATLAS does to process and simulate data. We dramatically decreased the total walltime thanks to automated job (re)submission and brokering within PanDA. Using software tools initially developed for HEP and the Grid can reduce payload execution time for mammoth DNA samples from weeks to days.
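The chunking strategy described above can be illustrated with a minimal sketch: split a FASTQ file round-robin into chunks and later concatenate the per-chunk outputs. The per-chunk PALEOMIX invocation itself, as well as PanDA's brokering and automatic (re)submission, are omitted; file names and the chunking scheme are illustrative assumptions.

from pathlib import Path

def split_fastq(path, n_chunks, out_dir="chunks"):
    records = Path(path).read_text().splitlines()
    reads = [records[i:i + 4] for i in range(0, len(records), 4)]   # 4 lines per read
    Path(out_dir).mkdir(exist_ok=True)
    chunks = []
    for c in range(n_chunks):
        chunk_path = Path(out_dir) / f"chunk_{c:03d}.fastq"
        # distribute reads round-robin so chunks are roughly equal in size
        chunk_path.write_text("\n".join(line for read in reads[c::n_chunks] for line in read) + "\n")
        chunks.append(chunk_path)
    return chunks

def merge_outputs(output_paths, merged="merged.out"):
    with open(merged, "w") as out:
        for p in output_paths:
            out.write(Path(p).read_text())
    return merged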
Differences in muscle load between computer and non-computer work among office workers.
Richter, J M; Mathiassen, S E; Slijper, H P; Over, E A B; Frens, M A
2009-12-01
Introduction of more non-computer tasks has been suggested to increase exposure variation and thus reduce musculoskeletal complaints (MSC) in computer-intensive office work. This study investigated whether muscle activity did, indeed, differ between computer and non-computer activities. Whole-day logs of input device use in 30 office workers were used to identify computer and non-computer work, using a range of classification thresholds (non-computer thresholds (NCTs)). Exposure during these activities was assessed by bilateral electromyography recordings from the upper trapezius and lower arm. Contrasts in muscle activity between computer and non-computer work were distinct but small, even at the individualised, optimal NCT. Using an average group-based NCT resulted in less contrast, even in smaller subgroups defined by job function or MSC. Thus, computer activity logs should be used cautiously as proxies of biomechanical exposure. Conventional non-computer tasks may have a limited potential to increase variation in muscle activity during computer-intensive office work.
The Application Design of Solar Radio Spectrometer Based on FPGA
NASA Astrophysics Data System (ADS)
Du, Q. F.; Chen, R. J.; Zhao, Y. C.; Feng, S. W.; Chen, Y.; Song, Y.
2017-10-01
The solar radio spectrometer is the key instrument for observing solar radio emission. Under computer software control, the FPGA-based AD signal acquisition card acquires a large volume of data, which is transferred over a PCI-E port. The program implements timed data collection, retrieval of data for a specific time, and real-time control of the acquisition. It can also plot the solar radio power intensity. Experiments verify the reliability of the solar radio spectrometer, which at the same time simplifies the operation of observing the Sun.
Methods in Astronomical Image Processing
NASA Astrophysics Data System (ADS)
Jörsäter, S.
Contents: A Brief Introductory Note; History of Astronomical Imaging; Astronomical Image Data; Images in Various Formats; Digitized Image Data; Digital Image Data; Philosophy of Astronomical Image Processing; Properties of Digital Astronomical Images; Human Image Processing; Astronomical vs. Computer Science Image Processing; Basic Tools of Astronomical Image Processing; Display Applications; Calibration of Intensity Scales; Calibration of Length Scales; Image Re-shaping; Feature Enhancement; Noise Suppression; Noise and Error Analysis; Image Processing Packages: Design of AIPS and MIDAS; AIPS; MIDAS; Reduction of CCD Data; Bias Subtraction; Clipping; Preflash Subtraction; Dark Subtraction; Flat Fielding; Sky Subtraction; Extinction Correction; Deconvolution Methods; Rebinning/Combining; Summary and Prospects for the Future
Aberdeen polygons: computer displays of physiological profiles for intensive care.
Green, C A; Logie, R H; Gilhooly, K J; Ross, D G; Ronald, A
1996-03-01
The clinician in an intensive therapy unit is presented regularly with a range of information about the current physiological state of the patients under care. This information typically comes from a variety of sources and in a variety of formats. A more integrated form of display incorporating several physiological parameters may be helpful therefore. Three experiments are reported that explored the potential use of analogue, polygon diagrams to display physiological data from patients undergoing intensive therapy. Experiment 1 demonstrated that information can be extracted readily from such diagrams comprising 8- or 10-sided polygons, but with an advantage for simpler polygons and for information displayed at the top of the diagram. Experiment 2 showed that colour coding removed these biases for simpler polygons and the top of the diagram, together with speeding the processing time. Experiment 3 used polygons displaying patterns of physiological data that were consistent with typical conditions observed in the intensive care unit. It was found that physicians can readily learn to recognize these patterns and to diagnose both the nature and severity of the patient's physiological state. These polygon diagrams appear to have some considerable potential for use in providing on-line summary information of a patient's physiological state.
Ng, C M
2013-10-01
The development of a population PK/PD model, an essential component for model-based drug development, is both time- and labor-intensive. Graphical processing unit (GPU) computing technology has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU-CPU implementation of a parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU-CPU implementation of the MCPEM algorithm (MCPEMGPU) and an identical algorithm designed for a single CPU (MCPEMCPU) were developed using MATLAB on a single computer equipped with dual Xeon 6-Core E5690 CPUs and an NVIDIA Tesla C2070 GPU parallel computing card that contained 448 stream processors. Two different PK models with rich/sparse sampling design schemes were used to simulate population data in assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimation and model computation times. A speedup factor was used to assess the relative benefit of the parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. The MCPEMGPU consistently achieved shorter computation time than the MCPEMCPU and can offer more than 48-fold speedup using a single GPU card. The novel hybrid GPU-CPU implementation of the parallelized MCPEM algorithm developed in this study holds great promise as the core for the next generation of modeling software for population PK/PD analysis.
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to a few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) a high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) a front-end Web interface providing the service to the end-users and (5) a data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of the logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
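A minimal sketch of the core cross-correlation step for a single station pair (the massive, GPU-accelerated version and the later energy-ratio post-processing are not reproduced): normalize two noise traces, correlate them, and keep lags within a window. The sampling rate, trace length and lag window are illustrative assumptions.

import numpy as np
from scipy.signal import correlate

def noise_cross_correlation(trace_a, trace_b, fs=20.0, max_lag_s=200.0):
    a = (trace_a - trace_a.mean()) / trace_a.std()
    b = (trace_b - trace_b.mean()) / trace_b.std()
    cc = correlate(a, b, mode="full") / len(a)              # normalised cross-correlation
    lags = np.arange(-(len(b) - 1), len(a)) / fs            # lag axis in seconds
    keep = np.abs(lags) <= max_lag_s
    return lags[keep], cc[keep]

# toy usage with synthetic noise: the lag of the correlation peak hints at the
# dominant direction of the ambient-noise energy flux between the two stations
lags, cc = noise_cross_correlation(np.random.randn(72000), np.random.randn(72000))
peak_lag = lags[np.argmax(np.abs(cc))]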