A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last few years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
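As a rough illustration of what "modeling processing steps and their dependencies" can look like, here is a minimal sketch in Python. It is not taken from the paper or from any of the 15 surveyed systems; the task names and the `execution_order` helper are hypothetical, and the ordering relies only on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch": set(),
    "clean": {"fetch"},
    "simulate": {"clean"},
    "analyze": {"clean"},
    "report": {"simulate", "analyze"},
}


def execution_order(dag):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A real workflow management system would additionally schedule independent tasks (here `simulate` and `analyze`) in parallel and handle data movement and failures; this sketch only shows the dependency-ordering idea.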
A characterization of workflow management systems for extreme-scale applications
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...
2017-02-16
The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last few years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaskey, Alexander J.
Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-gen computing hardware architectures like quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly Kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O. (Editor); Housner, Jerrold M. (Editor)
1993-01-01
Computing speed is leaping forward by several orders of magnitude each decade. Engineers and scientists gathered at a NASA Langley symposium to discuss these exciting trends as they apply to parallel computational methods for large-scale structural analysis and design. Among the topics discussed were: large-scale static analysis; dynamic, transient, and thermal analysis; domain decomposition (substructuring); and nonlinear and numerical methods.
Human computers: the first pioneers of the information age.
Grier, D A
2001-03-01
Before computers were machines, they were people. They were men and women, young and old, well educated and common. They were the workers who convinced scientists that large-scale calculation had value. Long before Presper Eckert and John Mauchly built the ENIAC at the Moore School of Electrical Engineering, Philadelphia, or Maurice Wilkes designed the EDSAC at Cambridge University, human computers had created the discipline of computation. They developed numerical methodologies and proved them on practical problems. These human computers were not savants or calculating geniuses. Some knew little more than basic arithmetic. A few were near equals of the scientists they served and, in a different time or place, might have become practicing scientists had they not been barred from a scientific career by their class, education, gender or ethnicity.
Four Argonne National Laboratory scientists receive Early Career Research
Four Argonne National Laboratory scientists receive Early Career Research Program ... economic impact of cascading shortages. He will also seek to enable scaling on high-performance computing
Enabling Wide-Scale Computer Science Education through Improved Automated Assessment Tools
NASA Astrophysics Data System (ADS)
Boe, Bryce A.
There is a proliferating demand for newly trained computer scientists as the number of computer science related jobs continues to increase. University programs will only be able to train enough new computer scientists to meet this demand when two things happen: when there are more primary and secondary school students interested in computer science, and when university departments have the resources to handle the resulting increase in enrollment. To meet these goals, significant effort is being made to both incorporate computational thinking into existing primary school education, and to support larger university computer science class sizes. We contribute to this effort through the creation and use of improved automated assessment tools. To enable wide-scale computer science education we do two things. First, we create a framework called Hairball to support the static analysis of Scratch programs targeted for fourth, fifth, and sixth grade students. Scratch is a popular building-block language utilized to pique interest in and teach the basics of computer science. We observe that Hairball allows for rapid curriculum alterations and thus contributes to wide-scale deployment of computer science curriculum. Second, we create a real-time feedback and assessment system utilized in university computer science classes to provide better feedback to students while reducing assessment time. Insights from our analysis of student submission data show that modifications to the system configuration support the way students learn and progress through course material, making it possible for instructors to tailor assignments to optimize learning in growing computer science classes.
NASA Technical Reports Server (NTRS)
Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)
2000-01-01
The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users. The first is the scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer term R&D tasks.
PARVMEC: An Efficient, Scalable Implementation of the Variational Moments Equilibrium Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seal, Sudip K; Hirshman, Steven Paul; Wingen, Andreas
The ability to sustain magnetically confined plasma in a state of stable equilibrium is crucial for optimal and cost-effective operations of fusion devices like tokamaks and stellarators. The Variational Moments Equilibrium Code (VMEC) is the de-facto serial application used by fusion scientists to compute magnetohydrodynamics (MHD) equilibria and study the physics of three dimensional plasmas in confined configurations. Modern fusion energy experiments have larger system scales with more interactive experimental workflows, both demanding faster analysis turnaround times on computational workloads that are stressing the capabilities of sequential VMEC. In this paper, we present PARVMEC, an efficient, parallel version of its sequential counterpart, capable of scaling to thousands of processors on distributed memory machines. PARVMEC is a non-linear code, with multiple numerical physics modules, each with its own computational complexity. A detailed speedup analysis supported by scaling results on 1,024 cores of a Cray XC30 supercomputer is presented. Depending on the mode of PARVMEC execution, speedup improvements of one to two orders of magnitude are reported. PARVMEC equips fusion scientists for the first time with a state-of-the-art capability for rapid, high fidelity analyses of magnetically confined plasmas at unprecedented scales.
Triangle Computer Science Distinguished Lecture Series
2018-01-30
...the great objects of scientific inquiry - the cell, the brain, the market - as well as in the models developed by scientists over the centuries for studying them. Human...in principle, secure system operation can be achieved. Massive-Scale Streaming Analytics David Bader, Georgia Institute of Technology (telecast from
Short-Pulse Laser-Matter Computational Workshop Proceedings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Town, R; Tabak, M
For three days at the end of August 2004, 55 plasma scientists met at the Four Points by Sheraton in Pleasanton to discuss some of the critical issues associated with the computational aspects of the interaction of short-pulse high-intensity lasers with matter. The workshop was organized around the following six key areas: (1) Laser propagation/interaction through various density plasmas: micro scale; (2) Anomalous electron transport effects: From micro to meso scale; (3) Electron transport through plasmas: From meso to macro scale; (4) Ion beam generation, transport, and focusing; (5) "Atomic-scale" electron and proton stopping powers; and (6) Kα diagnostics.
Knowledge Discovery from Climate Data using Graph-Based Methods
NASA Astrophysics Data System (ADS)
Steinhaeuser, K.
2012-04-01
Climate and Earth sciences have recently experienced a rapid transformation from a historically data-poor to a data-rich environment, thus bringing them into the realm of the Fourth Paradigm of scientific discovery - a term coined by the late Jim Gray (Hey et al. 2009), the other three being theory, experimentation and computer simulation. In particular, climate-related observations from remote sensors on satellites and weather radars, in situ sensors and sensor networks, as well as outputs of climate or Earth system models from large-scale simulations, provide terabytes of spatio-temporal data. These massive and information-rich datasets offer a significant opportunity for advancing climate science and our understanding of the global climate system, yet current analysis techniques are not able to fully realize their potential benefits. We describe a class of computational approaches, specifically from the data mining and machine learning domains, which may be novel to the climate science domain and can assist in the analysis process. Computer scientists have developed spatial and spatio-temporal analysis techniques for a number of years now, and many of them may be applicable and/or adaptable to problems in climate science. We describe a large-scale, NSF-funded project aimed at addressing climate science questions using computational analysis methods; team members include computer scientists, statisticians, and climate scientists from various backgrounds. One of the major thrusts is the development of graph-based methods, and several illustrative examples of recent work in this area will be presented.
Analytical Cost Metrics : Days of Future Past
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prajapati, Nirmal; Rajopadhye, Sanjay; Djidjev, Hristo Nikolov
As we move towards the exascale era, the new architectures must be capable of running massive computational problems efficiently. Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, image/signal processing to computational science and bioinformatics. With Moore’s law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore the major challenge that we face in computing systems research is: “how to solve massive-scale computational problems in the most time/power/energy efficient manner?”
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.; Catalyurek, Umit V.; Chevalier, Cedric
2015-01-16
This final progress report summarizes the work accomplished at the Combinatorial Scientific Computing and Petascale Simulations Institute. We developed Zoltan, a parallel mesh partitioning library that made use of accurate hypergraph models to provide load balancing in mesh-based computations. We developed several graph coloring algorithms for computing Jacobian and Hessian matrices and organized them into a software package called ColPack. We developed parallel algorithms for graph coloring and graph matching problems, and also designed multi-scale graph algorithms. Three PhD students graduated, six more are continuing their PhD studies, and four postdoctoral scholars were advised. Six of these students and Fellows have joined DOE Labs (Sandia, Berkeley), as staff scientists or as postdoctoral scientists. We also organized the SIAM Workshop on Combinatorial Scientific Computing (CSC) in 2007, 2009, and 2011 to continue to foster the CSC community.
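For readers unfamiliar with graph coloring, the following is a minimal sketch of a sequential greedy (distance-1) coloring in Python. It is an illustration only: it is not ColPack's API or the institute's parallel algorithm, and the `greedy_coloring` function and the small example graph are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    colors = {}
    for vertex in adjacency:
        taken = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[vertex] = color
    return colors


# Hypothetical example graph: a 4-cycle a-b-c-d-a, stored as adjacency lists.
graph = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["a", "c"],
}
coloring = greedy_coloring(graph)
print(coloring)
```

In derivative computation, vertices typically stand for matrix columns and an edge joins two columns that share a nonzero row, so columns with the same color can be evaluated together; the variants used for Jacobians and Hessians impose stronger conditions than this plain distance-1 coloring.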
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Mid-year report FY17 Q2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Pugmire, David; Rogers, David
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Pugmire, David; Rogers, David
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem. Mid-year report FY16 Q2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY15 Q4.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.
CGAT: a model for immersive personalized training in computational genomics
Sims, David; Ponting, Chris P.
2016-01-01
How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics. PMID:25981124
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geveci, Berk; Maynard, Robert
The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. The XVis project brought together collaborators from predominant DOE projects for visualization on accelerators and combined their respective features into a new visualization toolkit called VTK-m.
ArrayBridge: Interweaving declarative array processing with high-performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros
Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation at NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.
Network-based approaches to climate knowledge discovery
NASA Astrophysics Data System (ADS)
Budich, Reinhard; Nyberg, Per; Weigel, Tobias
2011-11-01
Climate Knowledge Discovery Workshop; Hamburg, Germany, 30 March to 1 April 2011 Do complex networks combined with semantic Web technologies offer the next generation of solutions in climate science? To address this question, a first Climate Knowledge Discovery (CKD) Workshop, hosted by the German Climate Computing Center (Deutsches Klimarechenzentrum (DKRZ)), brought together climate and computer scientists from major American and European laboratories, data centers, and universities, as well as representatives from industry, the broader academic community, and the semantic Web communities. The participants, representing six countries, were concerned with large-scale Earth system modeling and computational data analysis. The motivation for the meeting was the growing problem that climate scientists generate data faster than it can be interpreted and the need to prepare for further exponential data increases. Current analysis approaches are focused primarily on traditional methods, which are best suited for large-scale phenomena and coarse-resolution data sets. The workshop focused on the open discussion of ideas and technologies to provide the next generation of solutions to cope with the increasing data volumes in climate science.
Mantle Convection on Modern Supercomputers
NASA Astrophysics Data System (ADS)
Weismüller, J.; Gmeiner, B.; Huber, M.; John, L.; Mohr, M.; Rüde, U.; Wohlmuth, B.; Bunge, H. P.
2015-12-01
Mantle convection is the cause for plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic to mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next generation large-scale architectures is handled successfully only in an interdisciplinary context. A new priority program - named SPPEXA - by the German Research Foundation (DFG) addresses this issue, and brings together computer scientists, mathematicians and application scientists around grand challenges in HPC. Here we report from the TERRA-NEO project, which is part of the high visibility SPPEXA program, and a joint effort of four research groups. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next generation mantle convection models. We present software that can resolve the Earth's mantle with up to 10^12 grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection and assess the impact of small scale processes on global mantle flow.
CGAT: a model for immersive personalized training in computational genomics.
Sims, David; Ponting, Chris P; Heger, Andreas
2016-01-01
How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics. © The Author 2015. Published by Oxford University Press.
Scaling up to address data science challenges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wendelberger, Joanne R.
Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.
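One familiar example of a sampling method for data too large to hold in memory is reservoir sampling. The sketch below is a generic textbook version (Algorithm R) in Python, offered only as an illustration; the article does not state which sampling methods it covers, and the `reservoir_sample` function and its parameters are hypothetical names.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Draw a uniform random sample of up to k items from a stream in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=10, seed=0)
print(sample)
```

The sample uses memory proportional to `k` rather than to the size of the stream, which is the property that makes this kind of data reduction attractive at scale.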
Scaling up to address data science challenges
Wendelberger, Joanne R.
2017-04-27
Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.
Tessera: Open source software for accelerated data science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sego, Landon H.; Hafen, Ryan P.; Director, Hannah M.
2014-06-30
Extracting useful, actionable information from data can be a formidable challenge for the safeguards, nonproliferation, and arms control verification communities. Data scientists are often on the “front lines” of making sense of complex and large datasets. They require flexible tools that make it easy to rapidly reformat large datasets, interactively explore and visualize data, develop statistical algorithms, and validate their approaches, and they need to perform these activities with minimal lines of code. Existing commercial software solutions often lack extensibility and the flexibility required to address the nuances of the demanding and dynamic environments where data scientists work. To address this need, Pacific Northwest National Laboratory developed Tessera, an open source software suite designed to enable data scientists to interactively perform their craft at the terabyte scale. Tessera automatically manages the complicated tasks of distributed storage and computation, empowering data scientists to do what they do best: tackling critical research and mission objectives by deriving insight from data. We illustrate the use of Tessera with an example analysis of computer network data.
Why build a virtual brain? Large-scale neural simulations as jump start for cognitive computing
NASA Astrophysics Data System (ADS)
Colombo, Matteo
2017-03-01
Despite the impressive amount of financial resources recently invested in carrying out large-scale brain simulations, it is controversial what the pay-offs are of pursuing this project. One idea is that from designing, building, and running a large-scale neural simulation, scientists acquire knowledge about the computational performance of the simulating system, rather than about the neurobiological system represented in the simulation. It has been claimed that this knowledge may usher in a new era of neuromorphic, cognitive computing systems. This study elucidates this claim and argues that the main challenge this era is facing is not the lack of biological realism. The challenge lies in identifying general neurocomputational principles for the design of artificial systems, which could display the robust flexibility characteristic of biological intelligence.
The future of scientific workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Peterka, Tom; Altintas, Ilkay
Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.
Implicit Theories of Creativity in Computer Science in the United States and China
ERIC Educational Resources Information Center
Tang, Chaoying; Baer, John; Kaufman, James C.
2015-01-01
To study implicit concepts of creativity in computer science in the United States and mainland China, we first asked 308 Chinese computer scientists for adjectives that would describe a creative computer scientist. Computer scientists and non-computer scientists from China (N = 1069) and the United States (N = 971) then rated how well those…
Scientific Services on the Cloud
NASA Astrophysics Data System (ADS)
Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong
Scientific computing was one of the first-ever applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired the creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administered by a third party. Software abstraction and virtualization provide reliability and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and are by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part in the scientific computing initiative.
Performance Analysis, Modeling and Scaling of HPC Applications and Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav
2016-01-13
Efficient use of supercomputers at DOE centers is vital for maximizing system throughput, minimizing energy costs and enabling science breakthroughs faster. This requires complementary efforts along several directions to optimize the performance of scientific simulation codes and the underlying runtimes and software stacks. This in turn requires providing scalable performance analysis tools and modeling techniques that can provide feedback to physicists and computer scientists developing the simulation codes and runtimes respectively. The PAMS project is using time allocations on supercomputers at ALCF, NERSC and OLCF to further the goals described above by performing research along the following fronts: 1. Scaling Study of HPC applications; 2. Evaluation of Programming Models; 3. Hardening of Performance Tools; 4. Performance Modeling of Irregular Codes; and 5. Statistical Analysis of Historical Performance Data. We are a team of computer and computational scientists funded by both DOE/NNSA and DOE/ASCR programs such as ECRP, XStack (Traleika Glacier, PIPER), ExaOSR (ARGO), SDMAV II (MONA) and PSAAP II (XPACC). This allocation will enable us to study big data issues when analyzing performance on leadership computing class systems and to assist the HPC community in making the most effective use of these resources.
Exascale computing and what it means for shock physics
NASA Astrophysics Data System (ADS)
Germann, Timothy
2015-06-01
The U.S. Department of Energy is preparing to launch an Exascale Computing Initiative, to address the myriad challenges required to deploy and effectively utilize an exascale-class supercomputer (i.e., one capable of performing 10^18 operations per second) in the 2023 timeframe. Since physical (power dissipation) requirements limit clock rates to at most a few GHz, this will necessitate the coordination of on the order of a billion concurrent operations, requiring sophisticated system and application software, and underlying mathematical algorithms, that may differ radically from traditional approaches. Even at the smaller workstation or cluster level of computation, the massive concurrency and heterogeneity within each processor will impact computational scientists. Through the multi-institutional, multi-disciplinary Exascale Co-design Center for Materials in Extreme Environments (ExMatEx), we have initiated an early and deep collaboration between domain (computational materials) scientists, applied mathematicians, computer scientists, and hardware architects, in order to establish the relationships between algorithms, software stacks, and architectures needed to enable exascale-ready materials science application codes within the next decade. In my talk, I will discuss these challenges, and what it will mean for exascale-era electronic structure, molecular dynamics, and engineering-scale simulations of shock-compressed condensed matter. In particular, we anticipate that the emerging hierarchical, heterogeneous architectures can be exploited to achieve higher physical fidelity simulations using adaptive physics refinement. This work is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research.
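The concurrency estimate in this abstract follows from simple division: an exascale target of 10^18 operations per second, divided by a clock rate of a few GHz, leaves on the order of a billion operations that must be in flight at once. A minimal sketch of that arithmetic, assuming one operation per clock cycle per execution unit (the function name and that default are illustrative, not from the source):

```python
def required_concurrency(target_ops_per_s, clock_hz, ops_per_cycle=1.0):
    """Number of operations that must proceed concurrently to hit the target rate.

    Assumes each execution unit retires `ops_per_cycle` operations per clock
    cycle; that default is an illustrative assumption, not a hardware fact.
    """
    return target_ops_per_s / (clock_hz * ops_per_cycle)


EXASCALE = 1e18  # operations per second

for ghz in (1.0, 2.0, 3.0):
    n = required_concurrency(EXASCALE, ghz * 1e9)
    print(f"{ghz:.0f} GHz clock: about {n:.1e} concurrent operations")
```

For any clock rate in the 1 to 3 GHz range the result stays between roughly 3 × 10^8 and 10^9, which is consistent with the "order of a billion" figure quoted above.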
Computational nuclear quantum many-body problem: The UNEDF project
NASA Astrophysics Data System (ADS)
Bogner, S.; Bulgac, A.; Carlson, J.; Engel, J.; Fann, G.; Furnstahl, R. J.; Gandolfi, S.; Hagen, G.; Horoi, M.; Johnson, C.; Kortelainen, M.; Lusk, E.; Maris, P.; Nam, H.; Navratil, P.; Nazarewicz, W.; Ng, E.; Nobre, G. P. A.; Ormand, E.; Papenbrock, T.; Pei, J.; Pieper, S. C.; Quaglioni, S.; Roche, K. J.; Sarich, J.; Schunck, N.; Sosonkina, M.; Terasaki, J.; Thompson, I.; Vary, J. P.; Wild, S. M.
2013-10-01
The UNEDF project was a large-scale collaborative effort that applied high-performance computing to the nuclear quantum many-body problem. The primary focus of the project was on constructing, validating, and applying an optimized nuclear energy density functional, which entailed a wide range of pioneering developments in microscopic nuclear structure and reactions, algorithms, high-performance computing, and uncertainty quantification. UNEDF demonstrated that close associations among nuclear physicists, mathematicians, and computer scientists can lead to novel physics outcomes built on algorithmic innovations and computational developments. This review showcases a wide range of UNEDF science results to illustrate this interplay.
The Australian Computational Earth Systems Simulator
NASA Astrophysics Data System (ADS)
Mora, P.; Muhlhaus, H.; Lister, G.; Dyskin, A.; Place, D.; Appelbe, B.; Nimmervoll, N.; Abramson, D.
2001-12-01
Numerical simulation of the physics and dynamics of the entire earth system offers an outstanding opportunity for advancing earth system science and technology but represents a major challenge due to the range of scales and physical processes involved, as well as the magnitude of the software engineering effort required. However, new simulation and computer technologies are bringing this objective within reach. Under a special competitive national funding scheme to establish new Major National Research Facilities (MNRF), the Australian government together with a consortium of Universities and research institutions have funded construction of the Australian Computational Earth Systems Simulator (ACcESS). The Simulator or computational virtual earth will provide the research infrastructure to the Australian earth systems science community required for simulations of dynamical earth processes at scales ranging from microscopic to global. It will consist of thematic supercomputer infrastructure and an earth systems simulation software system. The Simulator models and software will be constructed over a five year period by a multi-disciplinary team of computational scientists, mathematicians, earth scientists, civil engineers and software engineers. The construction team will integrate numerical simulation models (3D discrete elements/lattice solid model, particle-in-cell large deformation finite-element method, stress reconstruction models, multi-scale continuum models etc) with geophysical, geological and tectonic models, through advanced software engineering and visualization technologies. 
When fully constructed, the Simulator aims to provide the software and hardware infrastructure needed to model solid earth phenomena including global scale dynamics and mineralisation processes, crustal scale processes including plate tectonics, mountain building, interacting fault system dynamics, and micro-scale processes that control the geological, physical and dynamic behaviour of earth systems. ACcESS represents a part of Australia's contribution to the APEC Cooperation for Earthquake Simulation (ACES) international initiative. Together with other national earth systems science initiatives including the Japanese Earth Simulator and US General Earthquake Model projects, ACcESS aims to provide a driver for scientific advancement and technological breakthroughs including: quantum leaps in understanding of earth evolution at global, crustal, regional and microscopic scales; new knowledge of the physics of crustal fault systems required to underpin the grand challenge of earthquake prediction; new understanding and predictive capabilities of geological processes such as tectonics and mineralisation.
Long live the Data Scientist, but can he/she persist?
NASA Astrophysics Data System (ADS)
Wyborn, L. A.
2011-12-01
In recent years the fourth paradigm of data intensive science has slowly taken hold as the increased capacity of instruments and an increasing number of instruments (in particular sensor networks) have changed how fundamental research is undertaken. Most modern scientific research is about digital capture of data direct from instruments, processing it by computers, storing the results on computers and only publishing a small fraction of data in hard copy publications. At the same time, the rapid increase in capacity of supercomputers, particularly at petascale, means that far larger data sets can be analysed and to greater resolution than previously possible. The new cloud computing paradigm which allows distributed data, software and compute resources to be linked by seamless workflows, is creating new opportunities in processing of high volumes of data to an increasingly larger number of researchers. However, to take full advantage of these compute resources, data sets for analysis have to be aggregated from multiple sources to create high performance data sets. These new technology developments require that scientists must become more skilled in data management and/or have a higher degree of computer literacy. In almost every science discipline there is now an X-informatics branch and a computational X branch (eg, Geoinformatics and Computational Geoscience): both require a new breed of researcher that has skills in both the science fundamentals and also knowledge of some ICT aspects (computer programming, data base design and development, data curation, software engineering). People that can operate in both science and ICT are increasingly known as 'data scientists'. Data scientists are a critical element of many large scale earth and space science informatics projects, particularly those that are tackling current grand challenges at an international level on issues such as climate change, hazard prediction and sustainable development of our natural resources. 
These projects by their very nature require the integration of multiple digital data sets from multiple sources. Often the preparation of the data for computational analysis can take months and requires painstaking attention to detail to ensure that anomalies identified are real and are not just artefacts of the data preparation and/or the computational analysis. Although data scientists are increasingly vital to successful data intensive earth and space science projects, unless they are recognised for their capabilities in both the science and the computational domains they are likely to migrate to either a science role or an ICT role as their career advances. Most reward and recognition systems do not recognise those with skills in both; hence, getting trained data scientists to persist beyond one or two projects can be a challenge. Those data scientists that persist in the profession are characteristically committed and enthusiastic people who have the support of their organisations to take on this role. They also tend to be people who share developments and are critical to the success of the open source software movement. However, the fact remains that survival of the data scientist as a species is being threatened unless something is done to recognise their invaluable contributions to the new fourth paradigm of science.
The Cell Collective: Toward an open and collaborative approach to systems biology
2012-01-01
Background Despite decades of new discoveries in biomedical research, the overwhelming complexity of cells has been a significant barrier to a fundamental understanding of how cells work as a whole. As such, the holistic study of biochemical pathways requires computer modeling. Due to the complexity of cells, it is not feasible for one person or group to model the cell in its entirety. Results The Cell Collective is a platform that allows the world-wide scientific community to create these models collectively. Its interface enables users to build and use models without specifying any mathematical equations or computer code - addressing one of the major hurdles with computational research. In addition, this platform allows scientists to simulate and analyze the models in real-time on the web, including the ability to simulate loss/gain of function and test what-if scenarios in real time. Conclusions The Cell Collective is a web-based platform that enables laboratory scientists from across the globe to collaboratively build large-scale models of various biological processes, and simulate/analyze them in real time. In this manuscript, we show examples of its application to a large-scale model of signal transduction. PMID:22871178
1001 Ways to run AutoDock Vina for virtual screening
NASA Astrophysics Data System (ADS)
Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.
2016-03-01
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
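Points (1) and (2) of this abstract can be illustrated with a small launcher: several single-core Vina processes run side by side (an outer level of parallelism on top of Vina's own multithreading), and the random seed is fixed and recorded for every run. This is a hedged sketch rather than the authors' set-up. It uses the Vina command-line options `--config`, `--receptor`, `--ligand`, `--out`, `--cpu`, `--seed` and `--exhaustiveness`; the file names, the `box.conf` search-box file, and the helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def vina_command(receptor, ligand, out_dir, seed, cpu=1, exhaustiveness=8,
                 config="box.conf"):
    """Build one Vina invocation; `config` is a hypothetical search-box file."""
    out = Path(out_dir) / (Path(ligand).stem + "_out.pdbqt")
    return [
        "vina", "--config", config,
        "--receptor", receptor, "--ligand", ligand, "--out", str(out),
        "--cpu", str(cpu),            # cores used inside each Vina process
        "--seed", str(seed),          # recorded: necessary, not sufficient
        "--exhaustiveness", str(exhaustiveness),
    ]


def screen(receptor, ligands, out_dir, seed=42, workers=4, dry_run=True):
    """Run one Vina process per ligand, `workers` processes at a time."""
    commands = [vina_command(receptor, lig, out_dir, seed) for lig in ligands]
    if not dry_run:
        # Threads are enough here: each one only waits on a child process.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda c: subprocess.run(c, check=True), commands))
    return commands


if __name__ == "__main__":
    for cmd in screen("receptor.pdbqt", ["lig1.pdbqt", "lig2.pdbqt"], "results"):
        print(" ".join(cmd))
```

With `dry_run=True` the function only returns the command lines, which is also a convenient way to log exactly what was run. As the abstract notes, a logged seed alone does not guarantee identical results across heterogeneous machines.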
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Justin; Karra, Satish; Nakshatrala, Kalyana B.
2016-07-26
It is well-known that the standard Galerkin formulation, which is often the formulation of choice under the finite element method for solving self-adjoint diffusion equations, does not meet maximum principles and the non-negative constraint for anisotropic diffusion equations. Recently, optimization-based methodologies that satisfy maximum principles and the non-negative constraint for steady-state and transient diffusion-type equations have been proposed. To date, these methodologies have been tested only on small-scale academic problems. The purpose of this paper is to systematically study the performance of the non-negative methodology in the context of high performance computing (HPC). PETSc and TAO libraries are, respectively, used for the parallel environment and optimization solvers. For large-scale problems, it is important for computational scientists to understand the computational performance of current algorithms available in these scientific libraries. The numerical experiments are conducted on the state-of-the-art HPC systems, and a single-core performance model is used to better characterize the efficiency of the solvers. Furthermore, our studies indicate that the proposed non-negative computational framework for diffusion-type equations exhibits excellent strong scaling for real-world large-scale problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallarno, George; Rogers, James H; Maxwell, Don E
The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large scale. The world's second fastest supercomputer for open science, Titan, has more than 18,000 GPUs that computational scientists use to perform scientific simulations and data analysis. Understanding of GPU reliability characteristics, however, is still in its nascent stage since GPUs have only recently been deployed at large scale. This paper presents a detailed study of GPU errors and their impact on system operations and applications, describing experiences with the 18,688 GPUs on the Titan supercomputer as well as lessons learned in the process of efficient operation of GPUs at scale. These experiences are helpful to HPC sites which already have large-scale GPU clusters or plan to deploy GPUs in the future.
To the Cloud! A Grassroots Proposal to Accelerate Brain Science Discovery
Vogelstein, Joshua T.; Mensh, Brett; Hausser, Michael; Spruston, Nelson; Evans, Alan; Kording, Konrad; Amunts, Katrin; Ebell, Christoph; Muller, Jeff; Telefont, Martin; Hill, Sean; Koushika, Sandhya P.; Cali, Corrado; Valdés-Sosa, Pedro Antonio; Littlewood, Peter; Koch, Christof; Saalfeld, Stephan; Kepecs, Adam; Peng, Hanchuan; Halchenko, Yaroslav O.; Kiar, Gregory; Poo, Mu-Ming; Poline, Jean-Baptiste; Milham, Michael P.; Schaffer, Alyssa Picchini; Gidron, Rafi; Okano, Hideyuki; Calhoun, Vince D; Chun, Miyoung; Kleissas, Dean M.; Vogelstein, R. Jacob; Perlman, Eric; Burns, Randal; Huganir, Richard; Miller, Michael I.
2018-01-01
The revolution in neuroscientific data acquisition is creating an analysis challenge. We propose leveraging cloud-computing technologies to enable large-scale neurodata storing, exploring, analyzing, and modeling. This utility will empower scientists globally to generate and test theories of brain function and dysfunction. PMID:27810005
NASA Astrophysics Data System (ADS)
Rodriguez, Sarah L.; Lehman, Kathleen
2017-10-01
This theoretical paper explores the need for enhanced, intersectional computing identity theory for the purpose of developing a diverse group of computer scientists for the future. Greater theoretical understanding of the identity formation process specifically for computing is needed in order to understand how students come to understand themselves as computer scientists. To ensure that the next generation of computer scientists is diverse, this paper presents a case for examining identity development intersectionally, understanding the ways in which women and underrepresented students may have difficulty identifying as computer scientists and be systematically oppressed in their pursuit of computer science careers. Through a review of the available scholarship, this paper suggests that creating greater theoretical understanding of the computing identity development process will inform the way in which educational stakeholders consider computer science practices and policies.
A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations
NASA Astrophysics Data System (ADS)
Demir, I.; Agliamzanov, R.
2014-12-01
Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that harness the computing power of millions of computers on the Internet and use them to run large-scale environmental simulations and models that serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to those of native applications and to utilize the power of Graphics Processing Units (GPUs). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Users can easily enable their websites so that visitors can volunteer their computer resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational sizes. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large scale hydrological simulations and model runs in an open and integrated environment.
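The idea of handing out a simulation "in small spatial and computational sizes" through a work queue can be sketched as follows. This is an illustrative toy, not the authors' JavaScript framework: the grid dimensions, tile size, and function names are assumptions, and an in-memory queue stands in for the relational database that the abstract describes.

```python
from collections import deque


def make_tiles(n_rows, n_cols, tile):
    """Split an n_rows x n_cols model grid into tiles of at most tile x tile cells.

    Each tile is (row_start, row_end, col_start, col_end), end-exclusive.
    """
    tiles = []
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            tiles.append((r0, min(r0 + tile, n_rows), c0, min(c0 + tile, n_cols)))
    return tiles


def tile_area(t):
    """Number of grid cells covered by one tile."""
    return (t[1] - t[0]) * (t[3] - t[2])


# Hypothetical 100 x 250 grid cut into 64-cell tiles and queued for volunteers.
work_queue = deque(make_tiles(100, 250, 64))
completed = []
while work_queue:
    task = work_queue.popleft()   # a volunteer node requests the next tile
    completed.append(task)        # ...and later reports its result back

print(len(completed), "tiles processed,",
      sum(tile_area(t) for t in completed), "cells covered")
```

Keeping tiles small bounds the work lost when a volunteer's browser tab closes mid-task; in a real deployment, unfinished tiles would be returned to the queue.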
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hazi, A U
2007-02-06
Setting performance goals is part of the business plan for almost every company. The same is true in the world of supercomputers. Ten years ago, the Department of Energy (DOE) launched the Accelerated Strategic Computing Initiative (ASCI) to help ensure the safety and reliability of the nation's nuclear weapons stockpile without nuclear testing. ASCI, which is now called the Advanced Simulation and Computing (ASC) Program and is managed by DOE's National Nuclear Security Administration (NNSA), set an initial 10-year goal to obtain computers that could process up to 100 trillion floating-point operations per second (teraflops). Many computer experts thought the goal was overly ambitious, but the program's results have proved them wrong. Last November, a Livermore-IBM team received the 2005 Gordon Bell Prize for achieving more than 100 teraflops while modeling the pressure-induced solidification of molten metal. The prestigious prize, which is named for a founding father of supercomputing, is awarded each year at the Supercomputing Conference to innovators who advance high-performance computing. Recipients for the 2005 prize included six Livermore scientists--physicists Fred Streitz, James Glosli, and Mehul Patel and computer scientists Bor Chan, Robert Yates, and Bronis de Supinski--as well as IBM researchers James Sexton and John Gunnels. This team produced the first atomic-scale model of metal solidification from the liquid phase with results that were independent of system size. The record-setting calculation used Livermore's domain decomposition molecular-dynamics (ddcMD) code running on BlueGene/L, a supercomputer developed by IBM in partnership with the ASC Program. BlueGene/L reached 280.6 teraflops on the Linpack benchmark, the industry standard used to measure computing speed. As a result, it ranks first on the list of Top500 Supercomputer Sites released in November 2005.
To evaluate the performance of nuclear weapons systems, scientists must understand how materials behave under extreme conditions. Because experiments at high pressures and temperatures are often difficult or impossible to conduct, scientists rely on computer models that have been validated with obtainable data. Of particular interest to weapons scientists is the solidification of metals. "To predict the performance of aging nuclear weapons, we need detailed information on a material's phase transitions," says Streitz, who leads the Livermore-IBM team. For example, scientists want to know what happens to a metal as it changes from molten liquid to a solid and how that transition affects the material's characteristics, such as its strength.
A Grid Infrastructure for Supporting Space-based Science Operations
NASA Technical Reports Server (NTRS)
Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)
2002-01-01
Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Livny, Miron; Shank, James; Ernst, Michael
Under this SciDAC-2 grant the project’s goal was to stimulate new discoveries by providing scientists with effective and dependable access to an unprecedented national distributed computational facility: the Open Science Grid (OSG). We proposed to achieve this through the work of the Open Science Grid Consortium: a unique hands-on multi-disciplinary collaboration of scientists, software developers and providers of computing resources. Together the stakeholders in this consortium sustain and use a shared distributed computing environment that transforms simulation and experimental science in the US. The OSG consortium is an open collaboration that actively engages new research communities. We operate an open facility that brings together a broad spectrum of compute, storage, and networking resources and interfaces to other cyberinfrastructures, including the US XSEDE (previously TeraGrid), the European Grids for ESciencE (EGEE), as well as campus and regional grids. We leverage middleware provided by computer science groups, facility IT support organizations, and computing programs of application communities for the benefit of consortium members and the US national CI.
Crossing disciplines and scales to understand the critical zone
Brantley, S.L.; Goldhaber, M.B.; Vala, Ragnarsdottir K.
2007-01-01
The Critical Zone (CZ) is the system of coupled chemical, biological, physical, and geological processes operating together to support life at the Earth's surface. While our understanding of this zone has increased over the last hundred years, further advance requires scientists to cross disciplines and scales to integrate understanding of processes in the CZ, ranging in scale from the mineral-water interface to the globe. Despite the extreme heterogeneities manifest in the CZ, patterns are observed at all scales. Explanations require the use of new computational and analytical tools, inventive interdisciplinary approaches, and growing networks of sites and people.
Harvey, Benjamin Simeon; Ji, Soo-Yeon
2017-01-01
As microarray data available to scientists continues to increase in size and complexity, it has become overwhelmingly important to find multiple ways to bring forth oncological inference to the bioinformatics community through the analysis of large-scale cancer genomic (LSCG) DNA and mRNA microarray data that is useful to scientists. Though there have been many attempts to elucidate the issue of bringing forth biological interpretation by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale distributed parallel (CSDP) separable 1-D wavelet decomposition technique for denoising through differential expression thresholding and classification of LSCG microarray data. This research presents a novel methodology that utilizes a CSDP separable 1-D method for wavelet-based transformation in order to initialize a threshold which will retain significantly expressed genes through the denoising process for robust classification of cancer patients. Additionally, the overall study was implemented and encompassed within CSDP environment. The utilization of cloud computing and wavelet-based thresholding for denoising was used for the classification of samples within the Global Cancer Map, Cancer Cell Line Encyclopedia, and The Cancer Genome Atlas. The results proved that separable 1-D parallel distributed wavelet denoising in the cloud and differential expression thresholding increased the computational performance and enabled the generation of higher quality LSCG microarray datasets, which led to more accurate classification results.
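The thresholding idea in this abstract can be illustrated with a minimal, single-machine sketch. It is not the authors' CSDP implementation: it assumes a one-level orthonormal Haar wavelet, a hard threshold on the detail coefficients, and a hypothetical threshold value `t`; the function names are illustrative only.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal 1-D Haar transform (even-length input)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed pairs."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude is below t (assumed rule)."""
    return [c if abs(c) >= t else 0.0 for c in coeffs]

def denoise(signal, t):
    """Drop small detail coefficients, keep large ones, then reconstruct."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, hard_threshold(detail, t))
```

In this sketch, small differences between neighbouring expression values are treated as noise and averaged out, while large differences survive the threshold and are reconstructed unchanged.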
NASA Astrophysics Data System (ADS)
Moore, R. T.; Hansen, M. C.
2011-12-01
Google Earth Engine is a new technology platform that enables monitoring and measurement of changes in the earth's environment, at planetary scale, on a large catalog of earth observation data. The platform offers intrinsically-parallel computational access to thousands of computers in Google's data centers. Initial efforts have focused primarily on global forest monitoring and measurement, in support of REDD+ activities in the developing world. The intent is to put this platform into the hands of scientists and developing world nations, in order to advance the broader operational deployment of existing scientific methods, and strengthen the ability for public institutions and civil society to better understand, manage and report on the state of their natural resources. Earth Engine currently hosts online nearly the complete historical Landsat archive of L5 and L7 data collected over more than twenty-five years. Newly-collected Landsat imagery is downloaded from USGS EROS Center into Earth Engine on a daily basis. Earth Engine also includes a set of historical and current MODIS data products. The platform supports generation, on-demand, of spatial and temporal mosaics, "best-pixel" composites (for example to remove clouds and gaps in satellite imagery), as well as a variety of spectral indices. Supervised learning methods are available over the Landsat data catalog. The platform also includes a new application programming framework, or "API", that allows scientists access to these computational and data resources, to scale their current algorithms or develop new ones. Under the covers of the Google Earth Engine API is an intrinsically-parallel image-processing system. Several forest monitoring applications powered by this API are currently in development and expected to be operational in 2011. 
Combining science with massive data and technology resources in a cloud-computing framework can offer advantages of computational speed, ease-of-use and collaboration, as well as transparency in data and methods. Methods developed for global processing of MODIS data to map land cover are being adopted for use with Landsat data. Specifically, the MODIS Vegetation Continuous Field product methodology has been applied for mapping forest extent and change at national scales using Landsat time-series data sets. Scaling this method to continental and global scales is enabled by Google Earth Engine computing capabilities. By combining the supervised learning VCF approach with the Landsat archive and cloud computing, unprecedented monitoring of land cover dynamics is enabled.
Large-Scale Distributed Computational Fluid Dynamics on the Information Power Grid Using Globus
NASA Technical Reports Server (NTRS)
Barnard, Stephen; Biswas, Rupak; Saini, Subhash; VanderWijngaart, Robertus; Yarrow, Maurice; Zechtzer, Lou; Foster, Ian; Larsson, Olle
1999-01-01
This paper describes an experiment in which a large-scale scientific application development for tightly-coupled parallel machines is adapted to the distributed execution environment of the Information Power Grid (IPG). A brief overview of the IPG and a description of the computational fluid dynamics (CFD) algorithm are given. The Globus metacomputing toolkit is used as the enabling device for the geographically-distributed computation. Modifications related to latency hiding and load balancing were required for an efficient implementation of the CFD application in the IPG environment. Performance results on a pair of SGI Origin 2000 machines indicate that real scientific applications can be effectively implemented on the IPG; however, a significant amount of continued effort is required to make such an environment useful and accessible to scientists and engineers.
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special-purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large-scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH, a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes.
Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
Recruitment of Foreigners in the Market for Computer Scientists in the United States
Bound, John; Braga, Breno; Golden, Joseph M.
2016-01-01
We present and calibrate a dynamic model that characterizes the labor market for computer scientists. In our model, firms can recruit computer scientists from recently graduated college students, from STEM workers working in other occupations or from a pool of foreign talent. Counterfactual simulations suggest that wages for computer scientists would have been 2.8–3.8% higher, and the number of Americans employed as computer scientists would have been 7.0–13.6% higher in 2004 if firms could not hire more foreigners than they could in 1994. In contrast, total CS employment would have been 3.8–9.0% lower, and consequently output smaller. PMID:27170827
Template Interfaces for Agile Parallel Data-Intensive Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramakrishnan, Lavanya; Gunter, Daniel; Pastorello, Gilerto Z.
Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations are increasingly generating large enough data sets that it is no longer practical to download them to a desktop to operate on them. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analysis of this data requires an ability to run on these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres is addressing the challenge of enabling collaborative analysis of DOE Science data through a new concept of reusable "templates" that enable scientists to easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.
NASA Astrophysics Data System (ADS)
Strayer, Michael
2007-09-01
Good morning. Welcome to Boston, the home of the Red Sox, Celtics and Bruins, baked beans, tea parties, Robert Parker, and SciDAC 2007. A year ago I stood before you to share the legacy of the first SciDAC program and identify the challenges that we must address on the road to petascale computing, a road E E Cummings described as `. . . never traveled, gladly beyond any experience.' Today, I want to explore the preparations for the rapidly approaching extreme scale (X-scale) generation. These preparations are the first step propelling us along the road of burgeoning scientific discovery enabled by the application of X-scale computing. We look to petascale computing and beyond to open up a world of discovery that cuts across scientific fields and leads us to a greater understanding of not only our world, but our universe. As part of the President's American Competitiveness Initiative, the ASCR Office has been preparing a ten year vision for computing. As part of this planning the LBNL together with ORNL and ANL hosted three town hall meetings on Simulation and Modeling at the Exascale for Energy, Ecological Sustainability and Global Security (E3). The proposed E3 initiative is organized around four programmatic themes: Engaging our top scientists, engineers, computer scientists and applied mathematicians; investing in pioneering large-scale science; developing scalable analysis algorithms, and storage architectures to accelerate discovery; and accelerating the build-out and future development of the DOE open computing facilities. It is clear that we have only just started down the path to extreme scale computing. Plan to attend Thursday's session on the out-briefing and discussion of these meetings. The road to the petascale has been at best rocky. In FY07, the continuing resolution provided 12% less money for Advanced Scientific Computing than either the President, the Senate, or the House.
As a consequence, many of you had to absorb a no cost extension for your SciDAC work. I am pleased that the President's FY08 budget restores the funding for SciDAC. Quoting from Advanced Scientific Computing Research description in the House Energy and Water Development Appropriations Bill for FY08, "Perhaps no other area of research at the Department is so critical to sustaining U.S. leadership in science and technology, revolutionizing the way science is done and improving research productivity." As a society we need to revolutionize our approaches to energy, environmental and global security challenges. As we go forward along the road to the X-scale generation, the use of computation will continue to be a critical tool along with theory and experiment in understanding the behavior of the fundamental components of nature as well as for fundamental discovery and exploration of the behavior of complex systems. The foundation to overcome these societal challenges will build from the experiences and knowledge gained as you, members of our SciDAC research teams, work together to attack problems at the tera- and peta- scale. If SciDAC is viewed as an experiment for revolutionizing scientific methodology, then a strategic goal of ASCR program must be to broaden the intellectual base prepared to address the challenges of the new X-scale generation of computing. We must focus our computational science experiences gained over the past five years on the opportunities introduced with extreme scale computing. Our facilities are on a path to provide the resources needed to undertake the first part of our journey. Using the newly upgraded 119 teraflop Cray XT system at the Leadership Computing Facility, SciDAC research teams have in three days performed a 100-year study of the time evolution of the atmospheric CO2 concentration originating from the land surface. 
The simulation of the El Nino/Southern Oscillation which was part of this study has been characterized as `the most impressive new result in ten years' gained new insight into the behavior of superheated ionic gas in the ITER reactor as a result of an AORSA run on 22,500 processors that achieved over 87 trillion calculations per second (87 teraflops) which is 74% of the system's theoretical peak. Tomorrow, Argonne and IBM will announce that the first IBM Blue Gene/P, a 100 teraflop system, will be shipped to the Argonne Leadership Computing Facility later this fiscal year. By the end of FY2007 ASCR high performance and leadership computing resources will include the 114 teraflop IBM Blue Gene/P; a 102 teraflop Cray XT4 at NERSC and a 119 teraflop Cray XT system at Oak Ridge. Before ringing in the New Year, Oak Ridge will upgrade to 250 teraflops with the replacement of the dual core processors with quad core processors and Argonne will upgrade to between 250-500 teraflops, and next year, a petascale Cray Baker system is scheduled for delivery at Oak Ridge. The multidisciplinary teams in our SciDAC Centers for Enabling Technologies and our SciDAC Institutes must continue to work with our Scientific Application teams to overcome the barriers that prevent effective use of these new systems. These challenges include: the need for new algorithms as well as operating system and runtime software and tools which scale to parallel systems composed of hundreds of thousands processors; program development environments and tools which scale effectively and provide ease of use for developers and scientific end users; and visualization and data management systems that support moving, storing, analyzing, manipulating and visualizing multi-petabytes of scientific data and objects. 
The SciDAC Centers, located primarily at our DOE national laboratories will take the lead in ensuring that critical computer science and applied mathematics issues are addressed in a timely and comprehensive fashion and to address issues associated with research software lifecycle. In contrast, the SciDAC Institutes, which are university-led centers of excellence, will have more flexibility to pursue new research topics through a range of research collaborations. The Institutes will also work to broaden the intellectual and researcher base—conducting short courses and summer schools to take advantage of new high performance computing capabilities. The SciDAC Outreach Center at Lawrence Berkeley National Laboratory complements the outreach efforts of the SciDAC Institutes. The Outreach Center is our clearinghouse for SciDAC activities and resources and will communicate with the high performance computing community in part to understand their needs for workshops, summer schools and institutes. SciDAC is not ASCR's only effort to broaden the computational science community needed to meet the challenges of the new X-scale generation. I hope that you were able to attend the Computational Science Graduate Fellowship poster session last night. ASCR developed the fellowship in 1991 to meet the nation's growing need for scientists and technology professionals with advanced computer skills. CSGF, now jointly funded between ASCR and NNSA, is more than a traditional academic fellowship. It has provided more than 200 of the best and brightest graduate students with guidance, support and community in preparing them as computational scientists. Today CSGF alumni are bringing their diverse top-level skills and knowledge to research teams at DOE laboratories and in industries such as Proctor and Gamble, Lockheed Martin and Intel. At universities they are working to train the next generation of computational scientists. 
To build on this success, we intend to develop a wholly new Early Career Principal Investigator's (ECPI) program. Our objective is to stimulate academic research in scientific areas within ASCR's purview especially among faculty in early stages of their academic careers. Last February, we lost Ken Kennedy, one of the leading lights of our community. As we move forward into the extreme computing generation, his vision and insight will be greatly missed. In memorial to Ken Kennedy, we shall designate the ECPI grants to beginning faculty in Computer Science as the Ken Kennedy Fellowship. Watch the ASCR website for more information about ECPI and other early career programs in the computational sciences. We look to you, our scientists, researchers, and visionaries to take X-scale computing and use it to explode scientific discovery in your fields. We at SciDAC will work to ensure that this tool is the sharpest and most precise and efficient instrument to carve away the unknown and reveal the most exciting secrets and stimulating scientific discoveries of our time. The partnership between research and computing is the marriage that will spur greater discovery, and as Spencer said to Susan in Robert Parker's novel, `Sudden Mischief', `We stick together long enough, and we may get as smart as hell'. Michael Strayer
Quantum Computing Architectural Design
NASA Astrophysics Data System (ADS)
West, Jacob; Simms, Geoffrey; Gyure, Mark
2006-03-01
Large scale quantum computers will invariably require scalable architectures in addition to high fidelity gate operations. Quantum computing architectural design (QCAD) addresses the problems of actually implementing fault-tolerant algorithms given physical and architectural constraints beyond those of basic gate-level fidelity. Here we introduce a unified framework for QCAD that enables the scientist to study the impact of varying error correction schemes, architectural parameters including layout and scheduling, and physical operations native to a given architecture. Our software package, aptly named QCAD, provides compilation, manipulation/transformation, multi-paradigm simulation, and visualization tools. We demonstrate various features of the QCAD software package through several examples.
Remote third shift EAST operation: a new paradigm
NASA Astrophysics Data System (ADS)
Schissel, D. P.; Coviello, E.; Eidietis, N.; Flanagan, S.; Garcia, F.; Humphreys, D.; Kostuk, M.; Lanctot, M.; Lee, X.; Margo, M.; Miller, D.; Parker, C.; Penaflor, B.; Qian, J. P.; Sun, X.; Tan, H.; Walker, M.; Xiao, B.; Yuan, Q.
2017-05-01
General Atomics’ (GA) scientists in the United States remotely conducted experimental operation of the experimental advanced superconducting tokamak (EAST) in China during its third shift. Scientists led these experiments in a dedicated remote control room that utilized a novel computer science hardware and software infrastructure to allow data movement, visualization, and communication on the time scale of EAST’s pulse cycle. This Fusion Science Collaboration Zone infrastructure allows the movement of large amounts of data between continents in a short time scale with a 300-fold increase in data transfer rate over that available using the traditional transmission protocol. Real-time data from control systems is moved almost instantaneously. An event system tied to the EAST pulse cycle allows automatic initiation of data transfers, so that bulk EAST data is transferred to GA within minutes. The EAST data at GA is served via MDSplus to approved US collaborators, which keeps multiple US clients from requesting data from EAST and competing for the long-haul network’s bandwidth. At present there are 37 approved scientists from 8 US research institutions.
Computer analysis of digital sky surveys using citizen science and manual classification
NASA Astrophysics Data System (ADS)
Kuminski, Evan; Shamir, Lior
2015-01-01
As current and future digital sky surveys such as SDSS, LSST, DES, Pan-STARRS and Gaia create increasingly massive databases containing millions of galaxies, there is a growing need to be able to efficiently analyze these data. An effective way to do this is through manual analysis; however, this may be insufficient considering the extremely vast pipelines of astronomical images generated by the present and future surveys. Some efforts have been made to use citizen science to classify galaxies by their morphology on a larger scale than individual or small groups of scientists can. While these citizen science efforts such as Zooniverse have helped obtain reasonably accurate morphological information about large numbers of galaxies, they cannot scale to provide complete analysis of billions of galaxy images that will be collected by future ventures such as LSST. Since current forms of manual classification cannot scale to the masses of data collected by digital sky surveys, it is clear that in order to keep up with the growing databases some form of automation of the data analysis will be required, and will work either independently or in combination with human analysis such as citizen science. Here we describe a computer vision method that can automatically analyze galaxy images and deduce galaxy morphology. Experiments using Galaxy Zoo 2 data show that the performance of the method increases as the degree of agreement between the citizen scientists gets higher, providing a cleaner dataset. For several morphological features, such as the spirality of the galaxy, the algorithm agreed with the citizen scientists on around 95% of the samples. However, the method failed to analyze some of the morphological features, such as the number of spiral arms, for which it provided accuracy of just ~36%.
The International Symposium on Grids and Clouds
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the decennium anniversary of the ISGC which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region to a coherent community. With the continuous support and dedication from the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that has produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.
Use of Emerging Grid Computing Technologies for the Analysis of LIGO Data
NASA Astrophysics Data System (ADS)
Koranda, Scott
2004-03-01
The LIGO Scientific Collaboration (LSC) today faces the challenge of enabling analysis of terabytes of LIGO data by hundreds of scientists from institutions all around the world. To meet this challenge the LSC is developing tools, infrastructure, applications, and expertise leveraging Grid Computing technologies available today, and making available to LSC scientists compute resources at sites across the United States and Europe. We use digital credentials for strong and secure authentication and authorization to compute resources and data. Building on top of products from the Globus project for high-speed data transfer and information discovery we have created the Lightweight Data Replicator (LDR) to securely and robustly replicate data to resource sites. We have deployed at our computing sites the Virtual Data Toolkit (VDT) Server and Client packages, developed in collaboration with our partners in the GriPhyN and iVDGL projects, providing uniform access to distributed resources for users and their applications. Taken together these Grid Computing technologies and infrastructure have formed the LSC DataGrid--a coherent and uniform environment across two continents for the analysis of gravitational-wave detector data. Much work, however, remains in order to scale current analyses and recent lessons learned need to be integrated into the next generation of Grid middleware.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...
2017-01-28
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
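The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract (12.5 hours baseline, 158x speedup); the function name is illustrative.

```python
def minutes_after_speedup(baseline_hours, speedup):
    """Per-iteration time in minutes after applying a given speedup."""
    return baseline_hours * 60.0 / speedup

# Figures taken from the abstract: 12.5 hours single-core, 158x speedup.
minutes = minutes_after_speedup(12.5, 158)

# Smallest speedup that brings 12.5 hours down to exactly 5 minutes.
break_even_speedup = 12.5 * 60.0 / 5.0
```

Here `minutes` comes out at roughly 4.75, and any speedup above 150x gives a per-iteration time under 5 minutes, which agrees with the abstract's "less than 5 minutes per iteration".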
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
The Convergence of High Performance Computing and Large Scale Data Analytics
NASA Astrophysics Data System (ADS)
Duffy, D.; Bowen, M. K.; Thompson, J. H.; Yang, C. P.; Hu, F.; Wills, B.
2015-12-01
As the combinations of remote sensing observations and model outputs have grown, scientists are increasingly burdened with both the necessity and complexity of large-scale data analysis. Scientists are increasingly applying traditional high performance computing (HPC) solutions to solve their "Big Data" problems. While this approach has the benefit of limiting data movement, the HPC system is not optimized to run analytics, which can create problems that permeate throughout the HPC environment. To solve these issues and to alleviate some of the strain on the HPC environment, the NASA Center for Climate Simulation (NCCS) has created the Advanced Data Analytics Platform (ADAPT), which combines both HPC and cloud technologies to create an agile system designed for analytics. Large, commonly used data sets, such as Landsat, MODIS, MERRA, and NGA, are stored in this system in a write once/read many file system. High performance virtual machines are deployed and scaled according to the individual scientist's requirements specifically for data analysis. On the software side, the NCCS and GMU are working with emerging commercial technologies and applying them to structured, binary scientific data in order to expose the data in new ways. Native NetCDF data is being stored within a Hadoop Distributed File System (HDFS) enabling storage-proximal processing through MapReduce while continuing to provide accessibility of the data to traditional applications. Once the data is stored within HDFS, an additional indexing scheme is built on top of the data and placed into a relational database. This spatiotemporal index enables extremely fast mappings of queries to data locations to dramatically speed up analytics. These are some of the first steps toward a single unified platform that optimizes for both HPC and large-scale data analysis, and this presentation will elucidate the resulting and necessary exascale architectures required for future systems.
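The idea of a spatiotemporal index held in a relational database can be sketched as follows. This is an illustration under stated assumptions, not the NCCS schema: the table layout, column names, file paths, and byte offsets are all hypothetical, and an in-memory SQLite database stands in for whatever relational database the platform uses.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory index of data chunks (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE chunk_index (
               variable TEXT, t_start TEXT, t_end TEXT,
               lat_min REAL, lat_max REAL, lon_min REAL, lon_max REAL,
               path TEXT, byte_offset INTEGER)"""
    )
    conn.execute(
        "CREATE INDEX idx_var_time ON chunk_index (variable, t_start, t_end)"
    )
    conn.executemany(
        "INSERT INTO chunk_index VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn

def locate(conn, variable, t0, t1, lat0, lat1, lon0, lon1):
    """Return (path, byte_offset) for chunks overlapping the query window."""
    cur = conn.execute(
        """SELECT path, byte_offset FROM chunk_index
           WHERE variable = ?
             AND t_start <= ? AND t_end >= ?
             AND lat_min <= ? AND lat_max >= ?
             AND lon_min <= ? AND lon_max >= ?
           ORDER BY path, byte_offset""",
        (variable, t1, t0, lat1, lat0, lon1, lon0),
    )
    return cur.fetchall()

# Hypothetical chunks: ISO dates compare correctly as strings.
rows = [
    ("T2M", "1980-01-01", "1980-01-31", -90, 0, -180, 180,
     "/hypothetical/merra/198001.nc", 0),
    ("T2M", "1980-01-01", "1980-01-31", 0, 90, -180, 180,
     "/hypothetical/merra/198001.nc", 4096),
    ("T2M", "1980-02-01", "1980-02-29", 0, 90, -180, 180,
     "/hypothetical/merra/198002.nc", 4096),
]
conn = build_index(rows)
hits = locate(conn, "T2M", "1980-01-10", "1980-01-20", 30, 40, -100, -90)
```

The query touches only the small index table and returns the file and offset to read, so an analytics job can go straight to the relevant blocks instead of scanning whole files.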
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban; ...
2015-07-14
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
Improving Data Mobility & Management for International Cosmology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borrill, Julian; Dart, Eli; Gore, Brooklin
In February 2015 the third workshop in the CrossConnects series, with a focus on Improving Data Mobility & Management for International Cosmology, was held at Lawrence Berkeley National Laboratory. Scientists from fields including astrophysics, cosmology, and astronomy collaborated with experts in computing and networking to outline strategic opportunities for enhancing scientific productivity and effectively managing the ever-increasing scale of scientific data.
Scaling predictive modeling in drug development with cloud computing.
Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola
2015-01-26
Growing data sets, and the increased time needed to analyze them, are hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compared them with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
NASA Astrophysics Data System (ADS)
Candela, S. G.; Howat, I.; Noh, M. J.; Porter, C. C.; Morin, P. J.
2016-12-01
In the last decade, high resolution satellite imagery has become an increasingly accessible tool for geoscientists to quantify changes in the Arctic land surface due to geophysical, ecological and anthropogenic processes. However, the trade off between spatial coverage and spatial-temporal resolution has limited detailed, process-level change detection over large (i.e. continental) scales. The ArcticDEM project utilized over 300,000 Worldview image pairs to produce a nearly 100% coverage elevation model (above 60°N) offering the first polar, high spatial - high resolution (2-8m by region) dataset, often with multiple repeats in areas of particular interest to geoscientists. A dataset of this size (nearly 250 TB) offers endless new avenues of scientific inquiry, but quickly becomes unmanageable computationally and logistically for the computing resources available to the average scientist. Here we present TopoDiff, a framework for a generalized, automated workflow that requires minimal input from the end user about a study site, and utilizes cloud computing resources to provide a temporally sorted and differenced dataset, ready for geostatistical analysis. This hands-off approach allows the end user to focus on the science, without having to manage thousands of files, or petabytes of data. At the same time, TopoDiff provides a consistent and accurate workflow for image sorting, selection, and co-registration, enabling cross-comparisons between research projects.
2017 ISCB Accomplishment by a Senior Scientist Award: Pavel Pevzner
Fogg, Christiana N.; Kovats, Diane E.; Berger, Bonnie
2017-01-01
The International Society for Computational Biology (ISCB) recognizes an established scientist each year with the Accomplishment by a Senior Scientist Award for significant contributions he or she has made to the field. This award honors scientists who have contributed to the advancement of computational biology and bioinformatics through their research, service, and education work. Pavel Pevzner, PhD, Ronald R. Taylor Professor of Computer Science and Director of the NIH Center for Computational Mass Spectrometry at University of California, San Diego, has been selected as the winner of the 2017 Accomplishment by a Senior Scientist Award. The ISCB awards committee, chaired by Dr. Bonnie Berger of the Massachusetts Institute of Technology, selected Pevzner as the 2017 winner. Pevzner will receive his award and deliver a keynote address at the 2017 Intelligent Systems for Molecular Biology-European Conference on Computational Biology joint meeting (ISMB/ECCB 2017) held in Prague, Czech Republic from July 21-July 25, 2017. ISMB/ECCB is a biennial joint meeting that brings together leading scientists in computational biology and bioinformatics from around the globe. PMID:28713548
Midekisa, Alemayehu; Holl, Felix; Savory, David J; Andrade-Pacheco, Ricardo; Gething, Peter W; Bennett, Adam; Sturrock, Hugh J W
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth's land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources. PMID:28953943
NASA Advanced Supercomputing Facility Expansion
NASA Technical Reports Server (NTRS)
Thigpen, William W.
2017-01-01
The NASA Advanced Supercomputing (NAS) Division enables advances in high-end computing technologies and in modeling and simulation methods to tackle some of the toughest science and engineering challenges facing NASA today. The name "NAS" has long been associated with leadership and innovation throughout the high-end computing (HEC) community. We play a significant role in shaping HEC standards and paradigms, and provide leadership in the areas of large-scale InfiniBand fabrics, Lustre open-source filesystems, and hyperwall technologies. We provide an integrated high-end computing environment to accelerate NASA missions and make revolutionary advances in science. Pleiades, a petaflop-scale supercomputer, is used by scientists throughout the U.S. to support NASA missions, and is ranked among the most powerful systems in the world. One of our key focus areas is in modeling and simulation to support NASA's real-world engineering applications and make fundamental advances in modeling and simulation methods.
Jade: using on-demand cloud analysis to give scientists back their flow
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.
2017-12-01
The UK's Met Office generates 400 TB of weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap. Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: (1) an under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts; (2) hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores; (3) clusters which autoscale up/down in response to calculation load using Kubernetes, and balance the cluster across providers based on the current price of compute; and (4) lazy data access from cloud storage via containerised OpenDAP. This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.
NASA Astrophysics Data System (ADS)
Liben-Nowell, David
With the recent explosion of popularity of commercial social-networking sites like Facebook and MySpace, the size of social networks that can be studied scientifically has passed from the scale traditionally studied by sociologists and anthropologists to the scale of networks more typically studied by computer scientists. In this chapter, I will highlight a recent line of computational research into the modeling and analysis of the small-world phenomenon - the observation that typical pairs of people in a social network are connected by very short chains of intermediate friends - and the ability of members of a large social network to collectively find efficient routes to reach individuals in the network. I will survey several recent mathematical models of social networks that account for these phenomena, with an emphasis on both the provable properties of these social-network models and the empirical validation of the models against real large-scale social-network data.
Software Carpentry and the Hydrological Sciences
NASA Astrophysics Data System (ADS)
Ahmadia, A. J.; Kees, C. E.; Farthing, M. W.
2013-12-01
Scientists are spending an increasing amount of time building and using hydrology software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. As hydrology models increase in capability and enter use by a growing number of scientists and their communities, it is important that the scientific software development practices scale up to meet the challenges posed by increasing software complexity, lengthening software lifecycles, a growing number of stakeholders and contributors, and a broadened developer base that extends from application domains to high performance computing centers. Many of these challenges in complexity, lifecycles, and developer base have been successfully met by the open source community, and there are many lessons to be learned from their experiences and practices. Additionally, there is much wisdom to be found in the results of research studies conducted on software engineering itself. Software Carpentry aims to bridge the gap between the current state of software development and these known best practices for scientific software development, with a focus on hands-on exercises and practical advice based on the following principles: 1. Write programs for people, not computers. 2. Automate repetitive tasks. 3. Use the computer to record history. 4. Make incremental changes. 5. Use version control. 6. Don't repeat yourself (or others). 7. Plan for mistakes. 8. Optimize software only after it works. 9. Document design and purpose, not mechanics. 10. Collaborate. We discuss how these best practices, arising from solid foundations in research and experience, have been shown to help improve scientists' productivity and the reliability of their software.
Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges.
Stein, Lincoln D
2008-09-01
Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.
A comment on the use of flushing time, residence time, and age as transport time scales
Monsen, N.E.; Cloern, J.E.; Lucas, L.V.; Monismith, Stephen G.
2002-01-01
Applications of transport time scales are pervasive in biological, hydrologic, and geochemical studies, yet these time scales are not consistently defined and applied with rigor in the literature. We compare three transport time scales (flushing time, age, and residence time) commonly used to measure the retention of water or scalar quantities transported with water. We identify the underlying assumptions associated with each time scale, describe procedures for computing these time scales in idealized cases, and identify pitfalls when real-world systems deviate from these idealizations. We then apply the time scale definitions to a shallow 378 ha tidal lake to illustrate how deviations between real water bodies and the idealized examples can result from: (1) non-steady flow; (2) spatial variability in bathymetry, circulation, and transport time scales; and (3) tides that introduce complexities not accounted for in the idealized cases. These examples illustrate that no single transport time scale is valid for all time periods, locations, and constituents, and no one time scale describes all transport processes. We encourage aquatic scientists to rigorously define the transport time scale when it is applied, identify the underlying assumptions in the application of that concept, and ask if those assumptions are valid in the application of that approach for computing transport time scales in real systems.
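As a worked illustration of the simplest of the three time scales, the sketch below computes flushing time in its common bulk form, T_f = V / Q (water-body volume divided by the volumetric flow rate through it), which presumes steady flow and a well-mixed basin. The 378 ha area is taken from the abstract; the mean depth and flow rate are hypothetical values chosen only to make the arithmetic concrete, not figures from the paper.

```python
SECONDS_PER_DAY = 86_400.0

def flushing_time_days(volume_m3, flow_m3_per_s):
    """Bulk flushing time T_f = V / Q, converted from seconds to days."""
    if flow_m3_per_s <= 0:
        raise ValueError("flow rate must be positive")
    return volume_m3 / flow_m3_per_s / SECONDS_PER_DAY

AREA_M2 = 378 * 10_000    # 378 ha from the abstract; 1 ha = 10,000 m^2
MEAN_DEPTH_M = 2.0        # hypothetical mean depth
FLOW_M3_PER_S = 10.0      # hypothetical steady through-flow

volume_m3 = AREA_M2 * MEAN_DEPTH_M            # 7,560,000 m^3
t_f = flushing_time_days(volume_m3, FLOW_M3_PER_S)
print(f"{t_f:.2f} days")                      # prints "8.75 days"
```

Because this estimate collapses the whole basin into one number, it cannot show the non-steady flow, spatial variability, or tidal effects that the abstract identifies as pitfalls.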
Information technology challenges of biodiversity and ecosystems informatics
Schnase, J.L.; Cushing, J.; Frame, M.; Frondorf, A.; Landis, E.; Maier, D.; Silberschatz, A.
2003-01-01
Computer scientists, biologists, and natural resource managers recently met to examine the prospects for advancing computer science and information technology research by focusing on the complex and often-unique challenges found in the biodiversity and ecosystem domain. The workshop and its final report reveal that the biodiversity and ecosystem sciences are fundamentally information sciences and often address problems having distinctive attributes of scale and socio-technical complexity. The paper provides an overview of the emerging field of biodiversity and ecosystem informatics and demonstrates how the demands of biodiversity and ecosystem research can advance our understanding and use of information technologies.
OnSight: Multi-platform Visualization of the Surface of Mars
NASA Astrophysics Data System (ADS)
Abercrombie, S. P.; Menzies, A.; Winter, A.; Clausen, M.; Duran, B.; Jorritsma, M.; Goddard, C.; Lidawer, A.
2017-12-01
A key challenge of planetary geology is to develop an understanding of an environment that humans cannot (yet) visit. Instead, scientists rely on visualizations created from images sent back by robotic explorers, such as the Curiosity Mars rover. OnSight is a multi-platform visualization tool that helps scientists and engineers to visualize the surface of Mars. Terrain visualization allows scientists to understand the scale and geometric relationships of the environment around the Curiosity rover, both for scientific understanding and for tactical consideration in safely operating the rover. OnSight includes a web-based 2D/3D visualization tool, as well as an immersive mixed reality visualization. In addition, OnSight offers a novel feature for communication among the science team. Using the multiuser feature of OnSight, scientists can meet virtually on Mars, to discuss geology in a shared spatial context. Combining web-based visualization with immersive visualization allows OnSight to leverage strengths of both platforms. This project demonstrates how 3D visualization can be adapted to either an immersive environment or a computer screen, and will discuss advantages and disadvantages of both platforms.
Interactive visualization of Earth and Space Science computations
NASA Technical Reports Server (NTRS)
Hibbard, William L.; Paul, Brian E.; Santek, David A.; Dyer, Charles R.; Battaiola, Andre L.; Voidrot-Martinez, Marie-Francoise
1994-01-01
Computers have become essential tools for scientists simulating and observing nature. Simulations are formulated as mathematical models but are implemented as computer algorithms to simulate complex events. Observations are also analyzed and understood in terms of mathematical models, but the number of these observations usually dictates that we automate analyses with computer algorithms. In spite of their essential role, computers are also barriers to scientific understanding. Unlike hand calculations, automated computations are invisible and, because of the enormous numbers of individual operations in automated computations, the relation between an algorithm's input and output is often not intuitive. This problem is illustrated by the behavior of meteorologists responsible for forecasting weather. Even in this age of computers, many meteorologists manually plot weather observations on maps, then draw isolines of temperature, pressure, and other fields by hand (special pads of maps are printed for just this purpose). Similarly, radiologists use computers to collect medical data but are notoriously reluctant to apply image-processing algorithms to that data. To these scientists with life-and-death responsibilities, computer algorithms are black boxes that increase rather than reduce risk. The barrier between scientists and their computations can be bridged by techniques that make the internal workings of algorithms visible and that allow scientists to experiment with their computations. Here we describe two interactive systems developed at the University of Wisconsin-Madison Space Science and Engineering Center (SSEC) that provide these capabilities to Earth and space scientists.
Understanding the Performance and Potential of Cloud Computing for Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin; ...
2015-02-19
Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficient high-end computing systems, many of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in context to price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evaluation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.
RNA nanotechnology for computer design and in vivo computation
Qiu, Meikang; Khisamutdinov, Emil; Zhao, Zhengyi; Pan, Cheryl; Choi, Jeong-Woo; Leontis, Neocles B.; Guo, Peixuan
2013-10-13
Molecular-scale computing has been explored since 1989 owing to the foreseeable limitation of Moore's law for silicon-based computation devices. With the potential of massive parallelism, low energy consumption and capability of working in vivo, molecular-scale computing promises a new computational paradigm. Inspired by the concepts from the electronic computer, DNA computing has realized basic Boolean functions and has progressed into multi-layered circuits. Recently, RNA nanotechnology has emerged as an alternative approach. Owing to the newly discovered thermodynamic stability of a special RNA motif (Shu et al. 2011 Nat. Nanotechnol. 6, 658–667 (doi:10.1038/nnano.2011.105)), RNA nanoparticles are emerging as another promising medium for nanodevice and nanomedicine as well as molecular-scale computing. Like DNA, RNA sequences can be designed to form desired secondary structures in a straightforward manner, but RNA is structurally more versatile and more thermodynamically stable owing to its non-canonical base-pairing, tertiary interactions and base-stacking property. A 90-nucleotide RNA can exhibit 4⁹⁰ nanostructures, and its loops and tertiary architecture can serve as a mounting dovetail that eliminates the need for external linking dowels. Its enzymatic and fluorogenic activity creates diversity in computational design. Varieties of small RNA can work cooperatively, synergistically or antagonistically to carry out computational logic circuits. The riboswitch and enzymatic ribozyme activities and its special in vivo attributes offer a great potential for in vivo computation. Unique features in transcription, termination, self-assembly, self-processing and acid resistance enable in vivo production of RNA nanoparticles that harbour various regulators for intracellular manipulation. With all these advantages, RNA computation is promising, but it is still in its infancy. Many challenges still exist. Collaborations between RNA nanotechnologists and computer scientists are necessary to advance this nascent technology. PMID:24000362
Changing the face of science: Lessons from the 2017 Science-A-Thon
NASA Astrophysics Data System (ADS)
Barnes, R. T.; Licker, R.; Burt, M. A.; Holloway, T.
2017-12-01
Studies have shown that over two-thirds of Americans cannot name a living scientist. This disconnect is a concern for science and scientists, considering the large role of public funding for science, and the importance of science in many policy issues. As a large-scale public outreach initiative and fundraiser, the Earth Science Women's Network (ESWN) launched "Science-A-Thon" on July 13, 2017. This "day of science" invited participants to share 12 photos over 12 hours of a day, including both personal routines and professional endeavors. Over 200 scientists participated, with the #DayofScience hashtag trending on Twitter for the day. Earth scientists represented the largest portion of participants, but the event engaged cancer biologists, computer scientists, and more, including scientists from more than 10 countries. Science-A-Thon builds on the success and visibility of other social media campaigns, such as #actuallivingscientist and #DresslikeaWoman. Importantly, these efforts share a common goal: by providing diverse images of scientists, we can shift the public perception of who a scientist is and what science looks like in the real world. This type of public engagement offers a wide range of potential role models for students, and individual stories to increase public engagement with science. Social media campaigns such as this shift the public perception of who scientists are, why they do what they do, and what they do each day. The actions and conversations emerging from Science-A-Thon included scientists talking about (1) their science and motivation, (2) the purpose and need for ESWN, and (3) why they chose to participate in this event, all of which increased the reach of the social media campaign and fundraiser.
Climate@Home: Crowdsourcing Climate Change Research
NASA Astrophysics Data System (ADS)
Xu, C.; Yang, C.; Li, J.; Sun, M.; Bambacus, M.
2011-12-01
Climate change deeply impacts human wellbeing. Significant amounts of resources have been invested in building super-computers that are capable of running advanced climate models, which help scientists understand climate change mechanisms, and predict its trend. Although climate change influences all human beings, the general public is largely excluded from the research. On the other hand, scientists are eagerly seeking communication mediums for effectively enlightening the public on climate change and its consequences. The Climate@Home project is devoted to connecting the two ends with an innovative solution: crowdsourcing climate computing to the general public by harvesting volunteered computing resources from the participants. A distributed web-based computing platform will be built to support climate computing, and the general public can 'plug-in' their personal computers to participate in the research. People contribute the spare computing power of their computers to run a computer model, which is used by scientists to predict climate change. Traditionally, only super-computers could handle such a large computing processing load. By orchestrating massive amounts of personal computers to perform atomized data processing tasks, investments in new super-computers, energy consumed by super-computers, and carbon release from super-computers are reduced. Meanwhile, the platform forms a social network of climate researchers and the general public, which may be leveraged to raise climate awareness among the participants. A portal is to be built as the gateway to the Climate@Home project. Three types of roles and the corresponding functionalities are designed and supported. The end users include the citizen participants, climate scientists, and project managers. Citizen participants connect their computing resources to the platform by downloading and installing a computing engine on their personal computers. Computer climate models are defined at the server side.
Climate scientists configure computer model parameters through the portal user interface. After model configuration, scientists then launch the computing task. Next, data is atomized and distributed to computing engines that are running on citizen participants' computers. Scientists will receive notifications on the completion of computing tasks, and examine modeling results via visualization modules of the portal. Computing tasks, computing resources, and participants are managed by project managers via portal tools. A portal prototype has been built for proof of concept. Three forums have been set up for different groups of users to share information on the science, technology, and educational outreach aspects. A Facebook account has been set up to distribute messages via the most popular social networking platform. New threads are synchronized from the forums to Facebook. A mapping tool displays geographic locations of the participants and the status of tasks on each client node. A group of users has been invited to test functions such as forums, blogs, and computing resource monitoring.
Skel: Generative Software for Producing Skeletal I/O Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Logan, J.; Klasky, S.; Lofstead, J.
2011-01-01
Massively parallel computations consist of a mixture of computation, communication, and I/O. As part of the co-design for the inevitable progress towards exascale computing, we must apply lessons learned from past work to succeed in this new age of computing. Of the three components listed above, implementing an effective parallel I/O solution has often been overlooked by application scientists and was usually added to large scale simulations only when existing serial techniques had failed. As scientist teams scaled their codes to run on hundreds of processors, it was common to call on an I/O expert to implement a set of more scalable I/O routines. These routines were easily separated from the calculations and communication, and in many cases, an I/O kernel was derived from the application which could be used for testing I/O performance independent of the application. These I/O kernels developed a life of their own, used as a broad measure for comparing different I/O techniques. Unfortunately, as years passed and computation and communication changes required changes to the I/O, the separate I/O kernel used for benchmarking remained static, no longer providing an accurate indicator of the I/O performance of the simulation and making I/O research less relevant for the application scientists. In this paper we describe a new approach to this problem where I/O kernels are replaced with skeletal I/O applications automatically generated from an abstract set of simulation I/O parameters. We realize this abstraction by leveraging the ADIOS middleware's XML I/O specification with additional runtime parameters. Skeletal applications offer all of the benefits of I/O kernels including allowing I/O optimizations to focus on useful I/O patterns. Moreover, since they are automatically generated, it is easy to produce an updated I/O skeleton whenever the simulation's I/O changes.
In this paper we analyze the performance of automatically generated I/O skeletal applications for the S3D and GTS codes. We show that these skeletal applications achieve performance comparable to that of the production applications. We wrap up the paper with a discussion of future changes to make the skeletal application better approximate the actual I/O performed in the simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lingerfelt, Eric J; Endeve, Eirik; Hui, Yawei
Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales, many spectroscopic modes, and now--with the rise of multimodal acquisition systems and the associated processing capability--the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides material scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalable data analysis and simulation and manage uploaded data files via an intuitive, cross-platform client user interface. This framework delivers authenticated, "push-button" execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing compute-and-data cloud infrastructures and HPC environments like Titan at the Oak Ridge Leadership Computing Facility (OLCF).
NASA Astrophysics Data System (ADS)
Hutson, Matthew
2018-05-01
In their adaptability, young children demonstrate common sense, a kind of intelligence that, so far, computer scientists have struggled to reproduce. Gary Marcus, a developmental cognitive scientist at New York University in New York City, believes the field of artificial intelligence (AI) would do well to learn lessons from young thinkers. Researchers in machine learning argue that computers trained on mountains of data can learn just about anything—including common sense—with few, if any, programmed rules. But Marcus says computer scientists are ignoring decades of work in the cognitive sciences and developmental psychology showing that humans have innate abilities—programmed instincts that appear at birth or in early childhood—that help us think abstractly and flexibly. He believes AI researchers ought to include such instincts in their programs. Yet many computer scientists, riding high on the successes of machine learning, are eagerly exploring the limits of what a naïve AI can do. Computer scientists appreciate simplicity and have an aversion to debugging complex code. Furthermore, big companies such as Facebook and Google are pushing AI in this direction. These companies are most interested in narrowly defined, near-term problems, such as web search and facial recognition, in which blank-slate AI systems can be trained on vast data sets and work remarkably well. But in the longer term, computer scientists expect AIs to take on much tougher tasks that require flexibility and common sense. They want to create chatbots that explain the news, autonomous taxis that can handle chaotic city traffic, and robots that nurse the elderly. Some computer scientists are already trying. Such efforts, researchers hope, will result in AIs that sit somewhere between pure machine learning and pure instinct. They will boot up following some embedded rules, but will also learn as they go.
Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhary, Alok
Computational scientists must understand results from experimental, observational and computational simulation generated data to gain insights and perform knowledge discovery. As systems approach the petascale range, problems that were unimaginable a few years ago are within reach. With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive I/O, storage, acceleration of data manipulation, analysis, and mining tools. Scientists require techniques, tools and infrastructure to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis, statistical analysis and knowledge discovery. The goal of this work is to enable more effective analysis of scientific datasets through the integration of enhancements in the I/O stack, from active storage support at the file system layer to MPI-IO and high-level I/O library layers. We propose to provide software components to accelerate data analytics, mining, I/O, and knowledge discovery for large-scale scientific applications, thereby increasing productivity of both scientists and the systems. Our approaches include 1) designing the interfaces in high-level I/O libraries, such as parallel netCDF, for applications to activate data mining operations at the lower I/O layers; 2) enhancing MPI-IO runtime systems to incorporate the functionality developed as a part of the runtime system design; 3) developing parallel data mining programs as part of the runtime library for the server-side file system in the PVFS file system; and 4) prototyping an active storage cluster, which will utilize multicore CPUs, GPUs, and FPGAs to carry out the data mining workload.
NASA's Pleiades Supercomputer Crunches Data For Groundbreaking Analysis and Visualizations
2016-11-23
The Pleiades supercomputer at NASA's Ames Research Center, recently named the 13th fastest computer in the world, provides scientists and researchers high-fidelity numerical modeling of complex systems and processes. By using detailed analyses and visualizations of large-scale data, Pleiades is helping to advance human knowledge and technology, from designing the next generation of aircraft and spacecraft to understanding the Earth's climate and the mysteries of our galaxy.
1985-12-01
Office of Scientific Research, and Air Force Space Division are sponsoring research for the development of a high speed DFT processor. This DFT...to the arithmetic circuitry through a master/slave...Since the TSP is an NP-complete problem, many mathematicians, operations researchers, computer scientists and the like have proposed heuristic
Facilities | Computational Science | NREL
technology innovation by providing scientists and engineers the ability to tackle energy challenges that scientists and engineers to take full advantage of advanced computing hardware and software resources
ERIC Educational Resources Information Center
Murfin, Brian
1994-01-01
Reports on a study of the effectiveness of computer-mediated communication (CMC) in providing African American and female middle school students with scientist role models. Quantitative and qualitative data gathered by analyzing messages students and scientists posted on a shared electronic bulletin board showed that CMC could be an effective…
SPAGHETTILENS: A software stack for modeling gravitational lenses by citizen scientists
NASA Astrophysics Data System (ADS)
Küng, R.
2018-04-01
The 2020s are expected to see tens of thousands of lens discoveries. Mass reconstruction or modeling of these lenses will be needed, but current modeling methods are time intensive for specialists and expert human resources do not scale. SpaghettiLens approaches this challenge with the help of experienced citizen scientist volunteers who have already been involved in finding lenses. A top level description is as follows. Citizen scientists look at data and provide a graphical input based on Fermat's principle which we call a Spaghetti Diagram. This input works as a model configuration. It is followed by the generation of the model, which is a compute intensive task done server side through a task distribution system. Model results are returned in graphical form to the citizen scientist, who examines and then either forwards them for forum discussion or rejects the model and retries. As well as configuring models, citizen scientists can also modify existing model configurations, which results in a version tree of models and makes the modeling process collaborative. SpaghettiLens is designed to be scalable and could be adapted to problems with similar characteristics. It is licensed under the MIT license, released at http://labs.spacewarps.org and the source code is available at https://github.com/RafiKueng/SpaghettiLens.
Scout: high-performance heterogeneous computing made simple
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice
2011-01-26
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly-shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.
ERIC Educational Resources Information Center
Loesch, Martha Fallahay
2011-01-01
Two members of the library faculty at Seton Hall University teamed up with a respected professor of mathematics and computer science, in order to create an online course that introduces information literacy both from the perspectives of the computer scientist and from the instruction librarian. This collaboration is unique in that it addresses the…
Remote control of nanoscale devices
NASA Astrophysics Data System (ADS)
Högberg, Björn
2018-01-01
Processes that occur at the nanometer scale have a tremendous impact on our daily lives. Sophisticated evolved nanomachines operate in each of our cells; we also, as a society, increasingly rely on synthetic nanodevices for communication and computation. Scientists are still only beginning to master this scale, but, recently, DNA nanotechnology (1)—in particular, DNA origami (2)—has emerged as a powerful tool to build structures precise enough to help us do so. On page 296 of this issue, Kopperger et al. (3) show that they are now also able to control the motion of a DNA origami device from the outside by applying electric fields.
Enabling drug discovery project decisions with integrated computational chemistry and informatics
NASA Astrophysics Data System (ADS)
Tsui, Vickie; Ortwine, Daniel F.; Blaney, Jeffrey M.
2017-03-01
Computational chemistry/informatics scientists and software engineers in Genentech Small Molecule Drug Discovery collaborate with experimental scientists in a therapeutic project-centric environment. Our mission is to enable and improve pre-clinical drug discovery design and decisions. Our goal is to deliver timely data, analysis, and modeling to our therapeutic project teams using best-in-class software tools. We describe our strategy, the organization of our group, and our approaches to reach this goal. We conclude with a summary of the interdisciplinary skills required for computational scientists and recommendations for their training.
Rich client data exploration and research prototyping for NOAA
NASA Astrophysics Data System (ADS)
Grossberg, Michael; Gladkova, Irina; Guch, Ingrid; Alabi, Paul; Shahriar, Fazlul; Bonev, George; Aizenman, Hannah
2009-08-01
Data from satellites and model simulations is increasing exponentially as observations and model computing power improve rapidly. Not only is technology producing more data, but it often comes from sources all over the world. Researchers and scientists who must collaborate are also located globally. This work presents a software design and technologies which will make it possible for groups of researchers to explore large data sets visually together without the need to download these data sets locally. The design will also make it possible to exploit high performance computing remotely and transparently to analyze and explore large data sets. Computer power, high quality sensing, and data storage capacity have improved at a rate that outstrips our ability to develop software applications that exploit these resources. It is impractical for NOAA scientists to download all of the satellite and model data that may be relevant to a given problem and the computing environments available to a given researcher range from supercomputers to only a web browser. The size and volume of satellite and model data are increasing exponentially. There are at least 50 multisensor satellite platforms collecting Earth science data. On the ground and in the sea there are sensor networks, as well as networks of ground based radar stations, producing a rich real-time stream of data. This new wealth of data would have limited use were it not for the arrival of large-scale high-performance computation provided by parallel computers, clusters, grids, and clouds. With these computational resources and vast archives available, it is now possible to analyze subtle relationships which are global, multi-modal and cut across many data sources. Researchers, educators, and even the general public, need tools to access, discover, and use vast data center archives and high performance computing through a simple yet flexible interface.
Runtime visualization of the human arterial tree.
Insley, Joseph A; Papka, Michael E; Dong, Suchuan; Karniadakis, George; Karonis, Nicholas T
2007-01-01
Large-scale simulation codes typically execute for extended periods of time and often on distributed computational resources. Because these simulations can run for hours, or even days, scientists like to get feedback about the state of the computation and the validity of its results as it runs. It is also important that these capabilities be made available with little impact on the performance and stability of the simulation. Visualizing and exploring data in the early stages of the simulation can help scientists identify problems early, potentially avoiding a situation where a simulation runs for several days, only to discover that an error with an input parameter caused both time and resources to be wasted. We describe an application that aids in the monitoring and analysis of a simulation of the human arterial tree. The application provides researchers with high-level feedback about the state of the ongoing simulation and enables them to investigate particular areas of interest in greater detail. The application also offers monitoring information about the amount of data produced and data transfer performance among the various components of the application.
NASA Astrophysics Data System (ADS)
Mezzacappa, Anthony
2005-01-01
On 26-30 June 2005 at the Grand Hyatt on Union Square in San Francisco several hundred computational scientists from around the world came together for what can certainly be described as a celebration of computational science. Scientists from the SciDAC Program and scientists from other agencies and nations were joined by applied mathematicians and computer scientists to highlight the many successes in the past year where computation has led to scientific discovery in a variety of fields: lattice quantum chromodynamics, accelerator modeling, chemistry, biology, materials science, Earth and climate science, astrophysics, and combustion and fusion energy science. Also highlighted were the advances in numerical methods and computer science, and the multidisciplinary collaboration cutting across science, mathematics, and computer science that enabled these discoveries. The SciDAC Program was conceived and funded by the US Department of Energy Office of Science. It is the Office of Science's premier computational science program founded on what is arguably the perfect formula: the priority and focus is science and scientific discovery, with the understanding that the full arsenal of `enabling technologies' in applied mathematics and computer science must be brought to bear if we are to have any hope of attacking and ultimately solving today's computational Grand Challenge problems. The SciDAC Program has been in existence for four years, and many of the computational scientists funded by this program will tell you that the program has given them the hope of addressing their scientific problems in full realism for the very first time. Many of these scientists will also tell you that SciDAC has also fundamentally changed the way they do computational science. We begin this volume with one of DOE's great traditions, and core missions: energy research. As we will see, computation has been seminal to the critical advances that have been made in this arena. 
Of course, to understand our world, whether it is to understand its very nature or to understand it so as to control it for practical application, will require explorations on all of its scales. Computational science has been no less an important tool in this arena than it has been in the arena of energy research. From explorations of quantum chromodynamics, the fundamental theory that describes how quarks make up the protons and neutrons of which we are composed, to explorations of the complex biomolecules that are the building blocks of life, to explorations of some of the most violent phenomena in our universe and of the Universe itself, computation has provided not only significant insight, but often the only means by which we have been able to explore these complex, multicomponent systems and by which we have been able to achieve scientific discovery and understanding. While our ultimate target remains scientific discovery, it certainly can be said that at a fundamental level the world is mathematical. Equations ultimately govern the evolution of the systems of interest to us, be they physical, chemical, or biological systems. The development and choice of discretizations of these underlying equations is often a critical deciding factor in whether or not one is able to model such systems stably, faithfully, and practically, and in turn, the algorithms to solve the resultant discrete equations are the complementary, critical ingredient in the recipe to model the natural world. The use of parallel computing platforms, especially at the TeraScale, and the trend toward even larger numbers of processors, continue to present significant challenges in the development and implementation of these algorithms. Computational scientists often speak of their `workflows'. 
A workflow, as the name suggests, is the sum total of all complex and interlocking tasks, from simulation set up, execution, and I/O, to visualization and scientific discovery, through which the advancement in our understanding of the natural world is realized. For the computational scientist, enabling such workflows presents myriad, significant challenges, and it is computer scientists that are called upon at such times to address these challenges. Simulations are currently generating data at the staggering rate of tens of TeraBytes per simulation, over the course of days. In the next few years, these data generation rates are expected to climb exponentially to hundreds of TeraBytes per simulation, performed over the course of months. The output, management, movement, analysis, and visualization of these data will be our key to unlocking the scientific discoveries buried within the data. And there is no hope of generating such data to begin with, or of scientific discovery, without stable computing platforms and a sufficiently high and sustained performance of scientific applications codes on them. Thus, scientific discovery in the realm of computational science at the TeraScale and beyond will occur at the intersection of science, applied mathematics, and computer science. The SciDAC Program was constructed to mirror this reality, and the pages that follow are a testament to the efficacy of such an approach. We would like to acknowledge the individuals on whose talents and efforts the success of SciDAC 2005 was based.
Special thanks go to Betsy Riley for her work on the SciDAC 2005 Web site and meeting agenda, for lining up our corporate sponsors, for coordinating all media communications, and for her efforts in processing the proceedings contributions, to Sherry Hempfling for coordinating the overall SciDAC 2005 meeting planning, for handling a significant share of its associated communications, and for coordinating with the ORNL Conference Center and Grand Hyatt, to Angela Harris for producing many of the documents and records on which our meeting planning was based and for her efforts in coordinating with ORNL Graphics Services, to Angie Beach of the ORNL Conference Center for her efforts in procurement and setting up and executing the contracts with the hotel, and to John Bui and John Smith for their superb wireless networking and A/V set up and support. We are grateful for the relentless efforts of all of these individuals, their remarkable talents, and for the joy of working with them during this past year. They were the cornerstones of SciDAC 2005. Thanks also go to Kymba A'Hearn and Patty Boyd for on-site registration, Brittany Hagen for administrative support, Bruce Johnston for netcast support, Tim Jones for help with the proceedings and Web site, Sherry Lamb for housing and registration, Cindy Lathum for Web site design, Carolyn Peters for on-site registration, and Dami Rich for graphic design. And we would like to express our appreciation to the Oak Ridge National Laboratory, especially Jeff Nichols, the Argonne National Laboratory, the Lawrence Berkeley National Laboratory, and to our corporate sponsors, Cray, IBM, Intel, and SGI, for their support. We would like to extend special thanks also to our plenary speakers, technical speakers, poster presenters, and panelists for all of their efforts on behalf of SciDAC 2005 and for their remarkable achievements and contributions. 
We would like to express our deep appreciation to Lali Chatterjee, Graham Douglas and Margaret Smith of Institute of Physics Publishing, who worked tirelessly in order to provide us with this finished volume within two months, which is nothing short of miraculous. Finally, we wish to express our heartfelt thanks to Michael Strayer, SciDAC Director, whose vision it was to focus SciDAC 2005 on scientific discovery, around which all of the excitement we experienced revolved, and to our DOE SciDAC program managers, especially Fred Johnson, for their support, input, and help throughout.
Harb, Omar S; Roos, David S
2015-01-01
Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.
Provenance-Powered Automatic Workflow Generation and Composition
NASA Astrophysics Data System (ADS)
Zhang, J.; Lee, S.; Pan, L.; Lee, T. J.
2015-12-01
In recent years, scientists have learned how to codify tools into reusable software modules that can be chained into multi-step executable workflows. Existing scientific workflow tools, created by computer scientists, require domain scientists to meticulously design their multi-step experiments before analyzing data. However, this is oftentimes contradictory to a domain scientist's daily routine of conducting research and exploration. We hope to resolve this dispute. Imagine this: An Earth scientist starts her day applying NASA Jet Propulsion Laboratory (JPL) published climate data processing algorithms over ARGO deep ocean temperature and AMSRE sea surface temperature datasets. Throughout the day, she tunes the algorithm parameters to study various aspects of the data. Suddenly, she notices some interesting results. She then turns to a computer scientist and asks, "can you reproduce my results?" By tracking and reverse engineering her activities, the computer scientist creates a workflow. The Earth scientist can now rerun the workflow to validate her findings, modify the workflow to discover further variations, or publish the workflow to share the knowledge. In this way, we aim to revolutionize computer-supported Earth science. We have developed a prototyping system to realize the aforementioned vision, in the context of service-oriented science. We have studied how Earth scientists conduct service-oriented data analytics research in their daily work, developed a provenance model to record their activities, and developed a technology to automatically generate workflows from user behavior and to support the adaptation and reuse of these workflows for replicating/improving scientific studies. A data-centric repository infrastructure is established to capture richer provenance to further facilitate collaboration in the science community. We have also established a Petri nets-based verification instrument for provenance-based automatic workflow generation and recommendation.
Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists
Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco
2013-01-01
Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphics cards (graphics processing units) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphics card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphics card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior. PMID:23653617
Distributed computing testbed for a remote experimental environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Butner, D.N.; Casper, T.A.; Howard, B.C.
1995-09-18
Collaboration is increasing as physics research becomes concentrated on a few large, expensive facilities, particularly in magnetic fusion energy research, with national and international participation. These facilities are designed for steady state operation and interactive, real-time experimentation. We are developing tools to provide for the establishment of geographically distant centers for interactive operations; such centers would allow scientists to participate in experiments from their home institutions. A testbed is being developed for a Remote Experimental Environment (REE), a "Collaboratory." The testbed will be used to evaluate the ability of a remotely located group of scientists to conduct research on the DIII-D Tokamak at General Atomics. The REE will serve as a testing environment for advanced control and collaboration concepts applicable to future experiments. Process-to-process communications over high speed wide area networks provide real-time synchronization and exchange of data among multiple computer networks, while the ability to conduct research is enhanced by adding audio/video communication capabilities. The Open Software Foundation's Distributed Computing Environment is being used to test concepts in distributed control, security, naming, remote procedure calls and distributed file access using the Distributed File Services. We are exploring the technology and sociology of remotely participating in the operation of a large scale experimental facility.
Center for computation and visualization of geometric structures. Final report, 1992 - 1995
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1995-11-01
This report describes the overall goals and the accomplishments of the Geometry Center of the University of Minnesota, whose mission is to develop, support, and promote computational tools for visualizing geometric structures, for facilitating communication among mathematical and computer scientists and between these scientists and the public at large, and for stimulating research in geometry.
A Look Inside Argonne's Center for Nanoscale Materials
Divan, Ralu; Rosenthal, Dan; Rose, Volker; Wai Hla
2018-05-23
At a very small, or "nano" scale, materials behave differently. The study of nanomaterials is much more than miniaturization - scientists are discovering how changes in size change a material's properties. From sunscreen to computer memory, the applications of nanoscale materials research are all around us. Researchers at Argonne's Center for Nanoscale Materials are creating new materials, methods and technologies to address some of the world's greatest challenges in energy security, lightweight but durable materials, high-efficiency lighting, information storage, environmental stewardship and advanced medical devices.
Opportunities and challenges of big data for the social sciences: The case of genomic data.
Liu, Hexuan; Guo, Guang
2016-09-01
In this paper, we draw attention to one unique and valuable source of big data, genomic data, by demonstrating the opportunities they provide to social scientists. We discuss different types of large-scale genomic data and recent advances in statistical methods and computational infrastructure used to address challenges in managing and analyzing such data. We highlight how these data and methods can be used to benefit social science research.
What do computer scientists tweet? Analyzing the link-sharing practice on Twitter.
Schmitt, Marco; Jäschke, Robert
2017-01-01
Twitter communication has permeated every sphere of society. To highlight and share small pieces of information with possibly vast audiences or small circles of the interested has some value in almost any aspect of social life. But what is the value exactly for a scientific field? We perform a comprehensive study of computer scientists using Twitter and their tweeting behavior concerning the sharing of web links. Discerning the domains, hosts and individual web pages being tweeted and the differences between computer scientists and a Twitter sample enables us to look in depth at the Twitter-based information sharing practices of a scientific community. Additionally, we aim at providing a deeper understanding of the role and impact of altmetrics in computer science and give a glance at the publications mentioned on Twitter that are most relevant for the computer science community. Our results show a link sharing culture that concentrates more heavily on public and professional quality information than the Twitter sample does. The results also show a broad variety in linked sources and especially in linked publications, with some publications clearly related to community-specific interests of computer scientists, while others have a strong relation to attention mechanisms in social media. This refers to the observation that Twitter is a hybrid form of social media between an information service and a social network service. Overall the computer scientists' style of usage seems to be more on the information-oriented side and to some degree also on professional usage. Therefore, altmetrics are of considerable use in analyzing computer science.
New Frontiers in Analyzing Dynamic Group Interactions: Bridging Social and Computer Science
Lehmann-Willenbrock, Nale; Hung, Hayley; Keyton, Joann
2017-01-01
This special issue on advancing interdisciplinary collaboration between computer scientists and social scientists documents the joint results of the international Lorentz workshop, “Interdisciplinary Insights into Group and Team Dynamics,” which took place in Leiden, The Netherlands, July 2016. An equal number of scholars from social and computer science participated in the workshop and contributed to the papers included in this special issue. In this introduction, we first identify interaction dynamics as the core of group and team models and review how scholars in social and computer science have typically approached behavioral interactions in groups and teams. Next, we identify key challenges for interdisciplinary collaboration between social and computer scientists, and we provide an overview of the different articles in this special issue aimed at addressing these challenges. PMID:29249891
NASA Astrophysics Data System (ADS)
Podrasky, A.; Covitt, B. A.; Woessner, W.
2017-12-01
The availability of clean water to support human uses and ecological integrity has become an urgent interest for many scientists, decision makers and citizens. Likewise, as computational capabilities increasingly revolutionize and become integral to the practice of science, technology, engineering and math (STEM) disciplines, the STEM+ Computing (STEM+C) Partnerships program seeks to integrate the use of computational approaches in K-12 STEM teaching and learning. The Comp Hydro project, funded by a STEM+C grant from the National Science Foundation, brings together a diverse team of scientists, educators, professionals and citizens at sites in Arizona, Colorado, Maryland and Montana to foster water literacy, as well as computational science literacy, by integrating authentic, place- and data- based learning using physical, mathematical, computational and conceptual models. This multi-state project is currently engaging four teams of six teachers who work during two academic years with educators and scientists at each site. Teams work to develop instructional units specific to their region that integrate hydrologic science and computational modeling. The units, currently being piloted in high school earth and environmental science classes, provide a classroom context to investigate student understanding of how computation is used in Earth systems science. To develop effective science instruction that is rich in place- and data- based learning, effective collaborations between researchers, educators, scientists, professionals and citizens are crucial. In this poster, we focus on project implementation in Montana, where an instructional unit has been developed and is being tested through collaboration among University scientists, researchers and educators, high school teachers and agency and industry scientists and engineers. 
In particular, we discuss three characteristics of effective collaborative science education design for developing and implementing place- and data- based science education to support students in developing socio-scientific and computational literacy sufficient for making decisions about real world issues such as groundwater contamination. These characteristics include that science education experiences are real, responsive/accessible and rigorous.
Accelerating scientific discovery : 2007 annual report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beckman, P.; Dave, P.; Drugan, C.
2008-11-14
As a gateway for scientific discovery, the Argonne Leadership Computing Facility (ALCF) works hand in hand with the world's best computational scientists to advance research in a diverse span of scientific domains, ranging from chemistry, applied mathematics, and materials science to engineering physics and life sciences. Sponsored by the U.S. Department of Energy's (DOE) Office of Science, researchers are using the IBM Blue Gene/L supercomputer at the ALCF to study and explore key scientific problems that underlie important challenges facing our society. For instance, a research team at the University of California-San Diego/SDSC is studying the molecular basis of Parkinson's disease. The researchers plan to use the knowledge they gain to discover new drugs to treat the disease and to identify risk factors for other diseases that are equally prevalent. Likewise, scientists from Pratt & Whitney are using the Blue Gene to understand the complex processes within aircraft engines. Expanding our understanding of jet engine combustors is the secret to improved fuel efficiency and reduced emissions. Lessons learned from the scientific simulations of jet engine combustors have already led Pratt & Whitney to newer designs with unprecedented reductions in emissions, noise, and cost of ownership. ALCF staff members provide in-depth expertise and assistance to those using the Blue Gene/L and optimizing user applications. Both the Catalyst and Applications Performance Engineering and Data Analytics (APEDA) teams support the users' projects. In addition to working with scientists running experiments on the Blue Gene/L, we have become a nexus for the broader global community. In partnership with the Mathematics and Computer Science Division at Argonne National Laboratory, we have created an environment where the world's most challenging computational science problems can be addressed. 
Our expertise in high-end scientific computing enables us to provide guidance for applications that are transitioning to petascale as well as to produce software that facilitates their development, such as the MPICH library, which provides a portable and efficient implementation of the MPI standard--the prevalent programming model for large-scale scientific applications--and the PETSc toolkit that provides a programming paradigm that eases the development of many scientific applications on high-end computers.
National Climate Change and Wildlife Science Center project accomplishments: highlights
Holl, Sally
2011-01-01
The National Climate Change and Wildlife Science Center (NCCWSC) has invested more than $20M since 2008 to put cutting-edge climate science research in the hands of resource managers across the Nation. With NCCWSC support, more than 25 cooperative research initiatives led by U.S. Geological Survey (USGS) researchers and technical staff are advancing our understanding of habitats and species to provide guidance to managers in the face of a changing climate. Projects focus on quantifying and predicting interactions between climate, habitats, species, and other natural resources such as water. Spatial scales of the projects range from the continent of North America, to a regional scale such as the Pacific Northwest United States, to a landscape scale such as the Florida Everglades. Time scales range from the outset of the 20th century to the end of the 21st century. Projects often lead to workshops, presentations, publications and the creation of new websites, computer models, and data visualization tools. Partnership-building is also a key focus of the NCCWSC-supported projects. New and on-going cooperative partnerships have been forged and strengthened with resource managers and scientists at Federal, tribal, state, local, academic, and non-governmental organizations. USGS scientists work closely with resource managers to produce timely and relevant results that can assist managers and policy makers in current resource management decisions. This fact sheet highlights accomplishments of five NCCWSC projects.
An efficient approach to imaging underground hydraulic networks
NASA Astrophysics Data System (ADS)
Kumar, Mohi
2012-07-01
To better locate natural resources, treat pollution, and monitor underground networks associated with geothermal plants, nuclear waste repositories, and carbon dioxide sequestration sites, scientists need to be able to accurately characterize and image fluid seepage pathways below ground. With these images, scientists can gain knowledge of soil moisture content, the porosity of geologic formations, concentrations and locations of dissolved pollutants, and the locations of oil fields or buried liquid contaminants. Creating images of the unknown hydraulic environments underfoot is a difficult task that has typically relied on broad extrapolations from characteristics and tests of rock units penetrated by sparsely positioned boreholes. Such methods, however, cannot identify small-scale features and are very expensive to reproduce over a broad area. Further, the techniques through which information is extrapolated rely on clunky and mathematically complex statistical approaches requiring large amounts of computational power.
NASA Technical Reports Server (NTRS)
VanZandt, John
1994-01-01
The usage model of supercomputers for scientific applications, such as computational fluid dynamics (CFD), has changed over the years. Scientific visualization has moved scientists away from looking at numbers to looking at three-dimensional images, which capture the meaning of the data. This change has impacted the system models for computing. This report details the model which is used by scientists at NASA's research centers.
NASA Astrophysics Data System (ADS)
Harfst, S.; Portegies Zwart, S.; McMillan, S.
2008-12-01
We present MUSE, a software framework for combining existing computational tools from different astrophysical domains into a single multi-physics, multi-scale application. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly-coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for studying generalized stellar systems. We have now reached a "Noah's Ark" milestone, with (at least) two available numerical solvers for each domain. MUSE can treat multi-scale and multi-physics systems in which the time- and size-scales are well separated, like simulating the evolution of planetary systems, small stellar associations, dense stellar clusters, galaxies and galactic nuclei. In this paper we describe two examples calculated using MUSE: the merger of two galaxies and an N-body simulation with live stellar evolution. In addition, we demonstrate an implementation of MUSE on a distributed computer which may also include special-purpose hardware, such as GRAPEs or GPUs, to accelerate computations. The current MUSE code base is publicly available as open source at http://muse.li.
2013 R&D 100 Award: "Miniapps" Bolster High Performance Computing
Belak, Jim; Richards, David
2018-06-12
Two Livermore computer scientists served on a Sandia National Laboratories-led team that developed Mantevo Suite 1.0, the first integrated suite of small software programs, also called "miniapps," to be made available to the high performance computing (HPC) community. These miniapps facilitate the development of new HPC systems and the applications that run on them. Miniapps (miniature applications) serve as stripped down surrogates for complex, full-scale applications that can require a great deal of time and effort to port to a new HPC system because they often consist of hundreds of thousands of lines of code. The miniapps are a prototype that contains some or all of the essentials of the real application but with many fewer lines of code, making the miniapp more versatile for experimentation. This allows researchers to more rapidly explore options and optimize system design, greatly improving the chances the full-scale application will perform successfully. These miniapps have become essential tools for exploring complex design spaces because they can reliably predict the performance of full applications.
NASA Astrophysics Data System (ADS)
Hampton, S. E.
2015-12-01
The science necessary to unravel complex environmental problems confronts severe computational challenges - coping with huge volumes of heterogeneous data, spanning vast spatial scales at high resolution, and requiring integration of disparate measurements from multiple disciplines. But as cyberinfrastructure advances to support such work, scientists in many fields lack sufficient computational skills to participate in interdisciplinary, data-intensive research. In response, we developed innovative training workshops for early-career scientists, in order to explore both the needs and solutions for training next-generation scientists in skills for data-intensive environmental research. In 2013 and 2014 we ran intensive 3-week training workshops for early-career researchers. One of the workshops was run concurrently in California and North Carolina, connected by virtual technologies and coordinated schedules. We attracted applicants to the workshop with the opportunity to pursue data-intensive small-group research projects that they proposed. This approach presented a realistic possibility that publishable products could result from 3 weeks of focused hands-on classroom instruction combined with self-directed group research in which instructors were present to assist trainees. Instruction addressed 1) collaboration modes and technologies, 2) data management, preservation, and sharing, 3) preparing data for analysis using scripting, 4) reproducible research, 5) sustainable software practices, 6) data analysis and modeling, and 7) communicating results to broad communities. The most dramatic improvements in technical skills were in data management, version control, and working with spatial data outside of proprietary software. In addition, participants built strong networks and collaborative skills that later resulted in a successful student-led grant proposal and published manuscripts, and they reported that the training was a highly influential experience.
Message from the ISCB: 2015 ISCB Accomplishment by a Senior Scientist Award: Cyrus Chothia.
Fogg, Christiana N; Kovats, Diane E
2015-07-01
The International Society for Computational Biology (ISCB; http://www.iscb.org) honors a senior scientist annually for his or her outstanding achievements with the ISCB Accomplishment by a Senior Scientist Award. This award recognizes a leader in the field of computational biology for his or her significant contributions to the community through research, service and education. Cyrus Chothia, an emeritus scientist at the Medical Research Council Laboratory of Molecular Biology and emeritus fellow of Wolfson College at Cambridge University, England, is the 2015 ISCB Accomplishment by a Senior Scientist Award winner. Chothia was selected by the Awards Committee, which is chaired by Dr Bonnie Berger of the Massachusetts Institute of Technology. He will receive his award and deliver a keynote presentation at 2015 Intelligent Systems for Molecular Biology/European Conference on Computational Biology in Dublin, Ireland, in July 2015.
Improving Data Mobility & Management for International Cosmology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borrill, Julian; Dart, Eli; Gore, Brooklin
In February 2015 the third workshop in the CrossConnects series, with a focus on Improving Data Mobility & Management for International Cosmology, was held at Lawrence Berkeley National Laboratory. Scientists from fields including astrophysics, cosmology, and astronomy collaborated with experts in computing and networking to outline strategic opportunities for enhancing scientific productivity and effectively managing the ever-increasing scale of scientific data. While each field has unique details which depend on the instruments employed, the type and scale of the data, and the structure of scientific collaborations, several important themes emerged from the workshop discussions. Findings, as well as a set of recommendations, are contained in their respective sections in this report.
Enabling Earth Science: The Facilities and People of the NCCS
NASA Technical Reports Server (NTRS)
2002-01-01
The NCCS's mass data storage system allows scientists to store and manage the vast amounts of data generated by these computations, and its high-speed network connections allow the data to be accessed quickly from the NCCS archives. Some NCCS users perform studies that are directly related to their ability to run computationally expensive and data-intensive simulations. Because the number and type of questions scientists research often are limited by computing power, the NCCS continually pursues the latest technologies in computing, mass storage, and networking. Just as important as the processors, tapes, and routers of the NCCS are the personnel who administer this hardware, create and manage accounts, maintain security, and assist the scientists, often working one on one with them.
Blazing Signature Filter: a library for fast pairwise similarity comparisons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan
Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data. A significant practical drawback of large-scale data mining is that the vast majority of pairwise comparisons are unlikely to be relevant, meaning that they do not share a signature of interest. It is therefore essential to efficiently identify these unproductive comparisons as rapidly as possible and exclude them from more time-intensive similarity calculations. The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. As a result, the BSF can scale to high dimensionality and rapidly filter unproductive pairwise comparisons. Two bioinformatics applications of the tool are presented to demonstrate the ability to scale to billions of pairwise comparisons and the usefulness of this approach.
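The filtering idea in this abstract (binarize each profile, then use bit operators for a coarse similarity score) can be illustrated with a minimal Python sketch. This is not the BSF library's API or its actual implementation: the function names, the fixed `threshold` used for binarization, and the `min_shared` cutoff are all assumptions made for illustration only.

```python
from itertools import combinations


def to_signature(values, threshold):
    """Pack a numeric profile into an int bitmask (hypothetical helper).

    Bit i is set when values[i] exceeds the assumed threshold.
    """
    signature = 0
    for i, value in enumerate(values):
        if value > threshold:
            signature |= 1 << i
    return signature


def shared_bits(sig_a, sig_b):
    """Coarse similarity: number of positions set in both signatures."""
    return bin(sig_a & sig_b).count("1")


def filter_pairs(profiles, threshold, min_shared):
    """Return only the pairs that share at least min_shared set bits.

    Pairs below the cutoff are treated as unproductive and dropped
    before any more expensive similarity calculation would run.
    """
    signatures = {name: to_signature(vals, threshold)
                  for name, vals in profiles.items()}
    kept = []
    for a, b in combinations(sorted(signatures), 2):
        if shared_bits(signatures[a], signatures[b]) >= min_shared:
            kept.append((a, b))
    return kept


# Illustrative, made-up expression profiles.
profiles = {
    "g1": [2.0, 0.1, 3.0, 0.0],
    "g2": [1.5, 0.0, 2.5, 0.2],
    "g3": [0.0, 4.0, 0.1, 0.0],
}
candidate_pairs = filter_pairs(profiles, threshold=1.0, min_shared=2)
```

Only the pairs in `candidate_pairs` would then be passed to a slower, more precise similarity measure; the bitwise AND and population count keep the coarse pass cheap per comparison.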
Large-scale gene function analysis with the PANTHER classification system.
Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D
2013-08-01
The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
Big Data: An Opportunity for Collaboration with Computer Scientists on Data-Driven Science
NASA Astrophysics Data System (ADS)
Baru, C.
2014-12-01
Big data technologies are evolving rapidly, driven by the need to manage ever increasing amounts of historical data; process relentless streams of human and machine-generated data; and integrate data of heterogeneous structure from extremely heterogeneous sources of information. Big data is inherently an application-driven problem. Developing the right technologies requires an understanding of the applications domain. However, an intriguing aspect of this phenomenon is that the availability of the data itself enables new applications not previously conceived of! In this talk, we will discuss how the big data phenomenon creates an imperative for collaboration among domain scientists (in this case, geoscientists) and computer scientists. Domain scientists provide the application requirements as well as insights about the data involved, while computer scientists help assess whether problems can be solved with currently available technologies or require adaptation of existing technologies and/or development of new technologies. The synergy can create vibrant collaborations potentially leading to new science insights as well as development of new data technologies and systems. The area of interface between geosciences and computer science, also referred to as geoinformatics, is, we believe, a fertile area for interdisciplinary research.
NASA Astrophysics Data System (ADS)
Gramelsberger, Gabriele
The scientific understanding of atmospheric processes has been rooted in the mechanical and physical view of nature ever since dynamic meteorology gained ground in the late 19th century. Conceiving the atmosphere as a giant 'air mass circulation engine' entails applying hydro- and thermodynamical theory to the subject in order to describe the atmosphere's behaviour on small scales. But when it comes to forecasting, it turns out that this view is far too complex to be computed. The limitation of analytical methods precludes an exact solution, forcing scientists to make use of numerical simulation. However, simulation introduces two prerequisites to meteorology: first, the partitioning of the theoretical view into two parts (the large-scale behaviour of the atmosphere, and the effects of smaller-scale processes on this large-scale behaviour, the so-called parametrizations); and second, the dependency on computational power in order to achieve a higher resolution. The history of today's atmospheric circulation modelling can be reconstructed as the attempt to improve the handling of these basic constraints. It can be further seen as the old schism between theory and application under new circumstances, which triggers a new discussion about the question of how processes may be conceived in atmospheric modelling.
CREASE 6.0 Catalog of Resources for Education in Ada and Software Engineering
1992-02-01
Programming Software Engineering Strong Typing Tasking Audene . Computer Scientists Terbook(s): Barnes, J. Programming in Ada, 3rd ed. Addison-Wesley...Ada. Concept: Abstract Data Types Management Overview Package Real-Time Programming Tasking Audene Computer Scientists Textbook(s): Barnes, J
Parallel computing in genomic research: advances and applications
Ocaña, Kary; de Oliveira, Daniel
2015-01-01
Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801
An economic and financial exploratory
NASA Astrophysics Data System (ADS)
Cincotti, S.; Sornette, D.; Treleaven, P.; Battiston, S.; Caldarelli, G.; Hommes, C.; Kirman, A.
2012-11-01
This paper describes the vision of a European Exploratory for economics and finance using an interdisciplinary consortium of economists, natural scientists, computer scientists and engineers, who will combine their expertise to address the enormous challenges of the 21st century. This Academic Public facility is intended for economic modelling, investigating all aspects of risk and stability, improving financial technology, and evaluating proposed regulatory and taxation changes. The European Exploratory for economics and finance will be constituted as a network of infrastructure, observatories, data repositories, services and facilities and will foster the creation of a new cross-disciplinary research community of social scientists, complexity scientists and computing (ICT) scientists to collaborate in investigating major issues in economics and finance. It is also considered a cradle for training and collaboration with the private sector to spur spin-offs and job creations in Europe in the finance and economic sectors. The Exploratory will allow Social Scientists and Regulators as well as Policy Makers and the private sector to conduct realistic investigations with real economic, financial and social data. The Exploratory will (i) continuously monitor and evaluate the status of the economies of countries in their various components, (ii) use, extend and develop a large variety of methods including data mining, process mining, computational and artificial intelligence and every other computer and complex science techniques coupled with economic theory and econometric, and (iii) provide the framework and infrastructure to perform what-if analysis, scenario evaluations and computational, laboratory, field and web experiments to inform decision makers and help develop innovative policy, market and regulation designs.
Award-Winning Animation Helps Scientists See Nature at Work | News | NREL
Scientists See Nature at Work August 8, 2008 A computer-aided image combines a photo of a man with a three-dimensional, computer-generated image. The man has long brown hair and a long beard. He is wearing a blue - simultaneously. "It is very difficult to parallelize the process to run even on a huge computer,"
Atomic Detail Visualization of Photosynthetic Membranes with GPU-Accelerated Ray Tracing
Vandivort, Kirby L.; Barragan, Angela; Singharoy, Abhishek; Teo, Ivan; Ribeiro, João V.; Isralewitz, Barry; Liu, Bo; Goh, Boon Chong; Phillips, James C.; MacGregor-Chatwin, Craig; Johnson, Matthew P.; Kourkoutis, Lena F.; Hunter, C. Neil
2016-01-01
The cellular process responsible for providing energy for most life on Earth, namely photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. We present, in two accompanying movies, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movies are the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. We describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movies, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers. PMID:27274603
Computational Astrophysics Consortium, University of Minnesota, Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heger, Alexander
During its six-year duration, the Computational Astrophysics Consortium helped to train the next generation of scientists in computational and nuclear astrophysics. A total of five graduate students were supported by the grant at UMN. The major advances at UMN were in the use, testing, and contribution to development of the CASTRO code, which scales efficiently on over 100,000 CPUs. At UMN it was used for modeling of thermonuclear supernovae (pair instability and supermassive stars) and core-collapse supernovae as well as the final phases of their progenitors, and for x-ray bursts from accreting neutron stars. Important secondary advances in the field of nuclear astrophysics included a better understanding of the evolution of massive stars and the origin of the elements. The research resulted in more than 50 publications.
ERIC Educational Resources Information Center
Wheeler, David L.
1988-01-01
Scientists feel that progress in artificial intelligence and the availability of thousands of experimental results make this the right time to build and test theories on how people think and learn, using the computer to model minds. (MSE)
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Challenges of Big Data Analysis
Fan, Jianqing; Han, Fang; Liu, Han
2014-01-01
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469
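The spurious-correlation effect named in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis: the function names, the sample size of 50, and the dimensions 10, 100 and 2000 are all chosen here for illustration. Every feature is drawn independently of the response, so any correlation found is spurious, yet the largest absolute sample correlation keeps growing as more features are examined.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_samples, dimensions, seed=0):
    """Largest |r| between a response and the first d independent noise features.

    Hypothetical helper: returns {d: max |r| over the first d features}
    for every d listed in `dimensions`.
    """
    rng = random.Random(seed)
    response = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    checkpoints = set(dimensions)
    running_max = 0.0
    result = {}
    for d in range(1, max(dimensions) + 1):
        # Each feature is pure noise, generated independently of the response.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        running_max = max(running_max, abs(pearson(feature, response)))
        if d in checkpoints:
            result[d] = running_max
    return result


result = max_spurious_correlation(n_samples=50, dimensions=[10, 100, 2000])
for d in sorted(result):
    print(f"{d:>5} independent features: max |r| = {result[d]:.2f}")
```

Because the maximum is taken over a growing set of features, it can only increase with dimension; with a fixed sample size, a variable-selection step that picks the best-correlated feature will therefore tend to pick noise when the dimension is high.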
PRACE - The European HPC Infrastructure
NASA Astrophysics Data System (ADS)
Stadelmeyer, Peter
2014-05-01
The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world class computing and data management resources and services through a peer review process. This talk gives a general overview of PRACE and the PRACE research infrastructure (RI). PRACE is established as an international not-for-profit association and the PRACE RI is a pan-European supercomputing infrastructure which offers access to computing and data management resources at partner sites distributed throughout Europe. Besides a short summary of the organization, history, and activities of PRACE, the talk explains how scientists and researchers from academia and industry from around the world can access PRACE systems and which education and training activities are offered by PRACE. The overview also contains a selection of PRACE contributions to societal challenges and ongoing activities. Examples of the latter include, among others, petascaling, an application benchmark suite, best practice guides for efficient use of key architectures, application enabling and scaling, new programming models, and industrial applications. The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by 4 PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France).
The Implementation Phase of PRACE receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-261557, RI-283493 and RI-312763. For more information, see www.prace-ri.eu
"Ask Argonne" - Charlie Catlett, Computer Scientist, Part 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Catlett, Charlie
2014-06-17
A few weeks back, computer scientist Charlie Catlett talked a bit about the work he does and invited questions from the public during Part 1 of his "Ask Argonne" video set (http://bit.ly/1joBtzk). In Part 2, he answers some of the questions that were submitted. Enjoy!
Statistical regularities in the rank-citation profile of scientists
Petersen, Alexander M.; Stanley, H. Eugene; Succi, Sauro
2011-01-01
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile ci(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each ci(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different ci(r) profiles, our results demonstrate the utility of the βi scaling parameter in conjunction with hi for quantifying individual publication impact. We show that the total number of citations Ci tallied from a scientist's Ni papers scales as . Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress. PMID:22355696
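The abstract's remark that two scientists with the same h-index can have very different rank-citation profiles can be made concrete with a short sketch. The citation counts below are invented for illustration, and the function names are hypothetical; only the standard definition of the Hirsch h-index (the largest rank r such that the r-th most cited paper has at least r citations) is assumed.

```python
def rank_citation_profile(citations):
    """Citation counts ordered by paper rank r = 1, 2, ... (most cited first)."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest rank r such that the paper of rank r has at least r citations."""
    h = 0
    for rank, count in enumerate(rank_citation_profile(citations), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Two made-up publication records (citations per paper).
author_a = [50, 30, 5, 5, 5, 1]
author_b = [6, 6, 5, 5, 5, 5]

for name, record in (("A", author_a), ("B", author_b)):
    print(name, "h =", h_index(record), "total citations =", sum(record),
          "profile =", rank_citation_profile(record))
```

Both records give h = 5, but the totals differ by a factor of three (96 versus 32) because author A's profile has a heavy head that the h-index does not see. This is the kind of difference that a second, profile-shape parameter is meant to capture.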
ERIC Educational Resources Information Center
Her Many Horses, Ian
2016-01-01
The world, and especially our own country, is in dire need of a larger and more diverse population of computer scientists. While many organizations have approached this problem of too few computer scientists in various ways, a promising, and I believe necessary, path is to expose elementary students to authentic practices of the discipline.…
Development of a Web Based Simulating System for Earthquake Modeling on the Grid
NASA Astrophysics Data System (ADS)
Seber, D.; Youn, C.; Kaiser, T.
2007-12-01
Existing cyberinfrastructure-based information, data and computational networks now allow development of state-of-the-art, user-friendly simulation environments that democratize access to high-end computational environments and provide new research opportunities for many research and educational communities. Within the Geosciences cyberinfrastructure network, GEON, we have developed the SYNSEIS (SYNthetic SEISmogram) toolkit to enable efficient computations of 2D and 3D seismic waveforms for a variety of research purposes, especially for helping to analyze EarthScope's USArray seismic data in a speedy and efficient environment. The underlying simulation software in SYNSEIS is a finite difference code, E3D, developed by LLNL (S. Larsen). The code is embedded within the SYNSEIS portlet environment and is used by our toolkit to simulate seismic waveforms of earthquakes at regional distances (<1000 km). Architecturally, SYNSEIS uses both Web Service and Grid computing resources in a portal-based work environment and has a built-in access mechanism to connect to national supercomputer centers as well as to a dedicated, small-scale compute cluster for its runs. Even though Grid computing is well-established in many computing communities, its use among domain scientists is still not trivial because of the multiple levels of complexity encountered. We grid-enabled E3D using our own XML dialect for inputs, which include geological models that are accessible through standard Web services within the GEON network. The XML inputs for this application contain structural geometries, source parameters, seismic velocity, density, attenuation values, number of time steps to compute, and number of stations. By enabling portal-based access to such a computational environment, coupled with its dynamic user interface, we enable a large user community to take advantage of such high-end calculations in their research and educational activities.
Our system can be used to promote an efficient and effective modeling environment to help scientists as well as educators in their daily activities and speed up the scientific discovery process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Tony; Sutherland, James C.
2016-05-04
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinary team of engineers and computer scientists.
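The idea of a DAG representation that allows runtime algorithm generation can be sketched as follows. This is not the authors' framework or DSL: the quantity names and the two helper functions are hypothetical, and the sketch relies only on Python's standard `graphlib` module (Python 3.9+). Each quantity declares what it depends on; when a target is requested at run time, the needed part of the graph is extracted and ordered so that every quantity is computed after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical multiphysics quantities: each maps to the set it depends on.
dependencies = {
    "density": set(),
    "velocity": set(),
    "temperature": set(),
    "momentum": {"density", "velocity"},
    "pressure": {"density", "temperature"},
    "momentum_flux": {"momentum", "velocity", "pressure"},
}


def required_subgraph(graph, target):
    """Collect the target and everything it transitively depends on."""
    needed = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed[node] = graph[node]
            stack.extend(graph[node])
    return needed


def execution_order(graph, target):
    """Order in which quantities must be evaluated to obtain the target."""
    subgraph = required_subgraph(graph, target)
    # static_order() yields each node only after all of its dependencies.
    return list(TopologicalSorter(subgraph).static_order())


print(execution_order(dependencies, "momentum"))
print(execution_order(dependencies, "momentum_flux"))
```

Requesting `momentum` schedules only `density`, `velocity` and `momentum`; `temperature` and `pressure` are never touched. Deriving the schedule from declared dependencies rather than hand-writing it is what lets the set of evaluated quantities change with the requested output.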
Skills and Knowledge for Data-Intensive Environmental Research
Hampton, Stephanie E.; Jones, Matthew B.; Wasser, Leah A.; Schildhauer, Mark P.; Supp, Sarah R.; Brun, Julien; Hernandez, Rebecca R.; Boettiger, Carl; Collins, Scott L.; Gross, Louis J.; Fernández, Denny S.; Budden, Amber; White, Ethan P.; Teal, Tracy K.; Aukema, Juliann E.
2017-01-01
The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap. PMID:28584342
Final Technical Report for DOE Award SC0006616
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robertson, Andrew
2015-08-01
This report summarizes research carried out by the project "Collaborative Research, Type 1: Decadal Prediction and Stochastic Simulation of Hydroclimate Over Monsoonal Asia". This collaborative project brought together climate dynamicists (UCLA, IRI), dendroclimatologists (LDEO Tree Ring Laboratory), computer scientists (UCI), and hydrologists (Columbia Water Center, CWC), together with applied scientists in climate risk management (IRI) to create new scientific approaches to quantify and exploit the role of climate variability and change in the growing water crisis across southern and eastern Asia. This project developed new tree-ring based streamflow reconstructions for rivers in monsoonal Asia; improved understanding of hydrologic spatio-temporal modes of variability over monsoonal Asia on interannual-to-centennial time scales; assessed decadal predictability of hydrologic spatio-temporal modes; developed stochastic simulation tools for creating downscaled future climate scenarios based on historical/proxy data and GCM climate change; and developed stochastic reservoir simulation and optimization for scheduling hydropower, irrigation and navigation releases.
The Science DMZ: A Network Design Pattern for Data-Intensive Science
Dart, Eli; Rotman, Lauren; Tierney, Brian; ...
2014-01-01
The ever-increasing scale of scientific data has become a significant challenge for researchers that rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance, and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including network architecture, system configuration, cybersecurity, and performance tools, that creates an optimized network environment for science. We describe use cases from universities, supercomputing centers and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow, and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.
Data Scientists ARE coming of age: but WHERE are they coming from?
NASA Astrophysics Data System (ADS)
Evans, N.; Bastrakova, I.; Connor, N.; Raymond, O.; Wyborn, L. A.
2013-12-01
The fourth paradigm of data intensive science is upon us: a new fundamental scientific methodology has emerged which is underpinned by the capability to analyse large volumes of data using advanced computational capacities. This combination is enabling earth and space scientists to respond to decadal challenges on issues such as the sustainable development of our natural resources, impacts of climate change and protection from national hazards. Fundamental to the data intensive paradigm is data that are readily accessible and capable of being integrated and amalgamated with other data often from multiple sources. For many years Earth and Space science practitioners have been drowning in a data deluge. In many cases, either lacking confidence in their capability and/or not having the time or capacity to manage these data assets, they have called in the data professionals. However, such people rarely had domain knowledge of the data they were dealing with and before long it emerged that although the 'containers' of data were now much better managed and documented, in reality the content was locked up and difficult to access, particularly for HPC environments where national to global scale problems were being addressed. Geoscience Australia (GA) is the custodian of over 4 PB of Geoscientific data and is a key provider of evidence-based, scientific advice to government on national issues. Since 2011, in collaboration with CSIRO Minerals Down Under Program, and the National Computational Infrastructure, GA has begun a series of data intensive scientific research pilots that focussed on applying advanced ICT tools and technologies to enhance scientific outcomes for the agency, in particular, national scale analysis of data sets that can be up to 500 TB in size.
As in any change program, a small group of innovators and early adopters took up the challenge of data intensive science and quickly showed that GA was able to use new ICT technologies to exploit an information-rich world to undertake applied research and to deliver new business outcomes in ways that current technologies do not allow. The innovators clearly had the necessary skills to rapidly adapt to data intensive techniques. However, if we were to scale out to the rest of the organisation, we needed to quantify these skills. The Strategic People Development Section of GA agreed to: (1) conduct a capability analysis of the scientific staff that participated in the pilot projects, including a review of university training and post graduate training; and (2) conduct a capability analysis of the technical groups involved in the pilot projects. The analysis identified the need for multi-disciplinary teams across the spectrum from pure scientists to pure ICT staff, along with a key hybrid role, the Data Scientist, who has a greater capacity in mathematics, numerical modelling, statistics, computational skills, software engineering and spatial skills, and the ability to integrate data across multiple domains. To fill the emerging gap, GA is asking: how do we find or develop this capability? Can we successfully transform the Scientist or the ICT Professional? Are our educational facilities modifying their training? These questions are leading GA to acknowledge, formalise, and promote a continuum of skills and roles, changing our recruitment, re-assignment and Learning and Development strategic decisions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ayer, Vidya M.; Miguez, Sheila; Toby, Brian H.
Scientists have been central to the historical development of the computer industry, but the importance of software only continues to grow for all areas of scientific research and in particular for powder diffraction. Knowing how to program a computer is a basic and useful skill for scientists. The article introduces the three types of programming languages and explains why scripting languages are now preferred for scientists. Of them, the authors assert Python is the most useful and easiest to learn. Python is introduced. Also presented is an overview of a few of the many add-on packages available to extend the capabilities of Python, for example, for numerical computations, scientific graphics and graphical user interface programming.
Das, Abhiram; Schneider, Hannah; Burridge, James; Ascanio, Ana Karine Martinez; Wojciechowski, Tobias; Topp, Christopher N; Lynch, Jonathan P; Weitz, Joshua S; Bucksch, Alexander
2015-01-01
Plant root systems are key drivers of plant function and yield. They are also under-explored targets to meet global food and energy demands. Many new technologies have been developed to characterize crop root system architecture (CRSA). These technologies have the potential to accelerate the progress in understanding the genetic control and environmental response of CRSA. Putting this potential into practice requires new methods and algorithms to analyze CRSA in digital images. Most prior approaches have solely focused on the estimation of root traits from images, yet no integrated platform exists that allows easy and intuitive access to trait extraction and analysis methods from images combined with storage solutions linked to metadata. Automated high-throughput phenotyping methods are increasingly used in laboratory-based efforts to link plant genotype with phenotype, whereas similar field-based studies remain predominantly manual low-throughput. Here, we present an open-source phenomics platform "DIRT", as a means to integrate scalable supercomputing architectures into field experiments and analysis pipelines. DIRT is an online platform that enables researchers to store images of plant roots, measure dicot and monocot root traits under field conditions, and share data and results within collaborative teams and the broader community. The DIRT platform seamlessly connects end-users with large-scale compute "commons" enabling the estimation and analysis of root phenotypes from field experiments of unprecedented size. DIRT is an automated high-throughput computing and collaboration platform for field based crop root phenomics. The platform is accessible at http://www.dirt.iplantcollaborative.org/ and hosted on the iPlant cyber-infrastructure using high-throughput grid computing resources of the Texas Advanced Computing Center (TACC). 
DIRT is a high volume central depository and high-throughput RSA trait computation platform for plant scientists working on crop roots. It enables scientists to store, manage and share crop root images with metadata and compute RSA traits from thousands of images in parallel. It makes high-throughput RSA trait computation available to the community with just a few button clicks. As such it enables plant scientists to spend more time on science rather than on technology. All stored and computed data is easily accessible to the public and broader scientific community. We hope that easy data accessibility will attract new tool developers and spur creative data usage that may even be applied to other fields of science.
How to Cloud for Earth Scientists: An Introduction
NASA Technical Reports Server (NTRS)
Lynnes, Chris
2018-01-01
This presentation is a tutorial on getting started with cloud computing for the purposes of Earth Observation datasets. We first discuss some of the main advantages that cloud computing can provide for the Earth scientist: copious processing power, immense and affordable data storage, and rapid startup time. We also talk about some of the challenges of getting the most out of cloud computing: re-organizing the way data are analyzed, handling node failures and attending.
NASA Astrophysics Data System (ADS)
Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.
2014-12-01
The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
A Web service-based architecture for real-time hydrologic sensor networks
NASA Astrophysics Data System (ADS)
Wong, B. P.; Zhao, Y.; Kerkez, B.
2014-12-01
Recent advances in web services and cloud computing provide new means by which to process and respond to real-time data. This is particularly true of platforms built for the Internet of Things (IoT). These enterprise-scale platforms have been designed to exploit the IP-connectivity of sensors and actuators, providing a robust means by which to route real-time data feeds and respond to events of interest. While powerful and scalable, these platforms have yet to be adopted by the hydrologic community, where the value of real-time data impacts both scientists and decision makers. We discuss the use of one such IoT platform for the purpose of large-scale hydrologic measurements, showing how rapid deployment and ease-of-use allows scientists to focus on their experiment rather than software development. The platform is hardware agnostic, requiring only IP-connectivity of field devices to capture, store, process, and visualize data in real-time. We demonstrate the benefits of real-time data through a real-world use case by showing how our architecture enables the remote control of sensor nodes, thereby permitting the nodes to adaptively change sampling strategies to capture major hydrologic events of interest.
Mountain hydrology, snow color, and the fourth paradigm
NASA Astrophysics Data System (ADS)
Dozier, Jeff
2011-10-01
The world's mountain ranges accumulate substantial snow, whose melt produces the bulk of runoff and often combines with rain to cause floods. Worldwide, inadequate understanding and a reliance on sparsely distributed observations limit our ability to predict seasonal and paroxysmal runoff as climate changes, ecosystems adapt, populations grow, land use evolves, and societies make choices. To improve assessments of snow accumulation, melt, and runoff, scientists and community planners can take advantage of two emerging trends: (1) an ability to remotely sense snow properties from satellites at a spatial scale appropriate for mountain regions (10- to 100-meter resolution, coverage of the order of 100,000 square kilometers) and a daily temporal scale appropriate for the dynamic nature of snow and (2) The Fourth Paradigm [Hey et al., 2009], which posits a new scientific approach in which insight is discovered through the manipulation of large data sets as the evolutionary step in scientific thinking beyond the first three paradigms: empiricism, analyses, and simulation. The inspiration for the book's title comes from pioneering computer scientist Jim Gray, based on a lecture he gave at the National Academy of Sciences 3 weeks before he disappeared at sea.
A Bayesian method for assessing multiscale species-habitat relationships
Stuber, Erica F.; Gruber, Lutz F.; Fontaine, Joseph J.
2017-01-01
Context: Scientists face several theoretical and methodological challenges in appropriately describing fundamental wildlife-habitat relationships in models. The spatial scales of habitat relationships are often unknown, and are expected to follow a multi-scale hierarchy. Typical frequentist or information theoretic approaches often suffer under collinearity in multi-scale studies, fail to converge when models are complex or represent an intractable computational burden when candidate model sets are large. Objectives: Our objective was to implement an automated, Bayesian method for inference on the spatial scales of habitat variables that best predict animal abundance. Methods: We introduce Bayesian latent indicator scale selection (BLISS), a Bayesian method to select spatial scales of predictors using latent scale indicator variables that are estimated with reversible-jump Markov chain Monte Carlo sampling. BLISS does not suffer from collinearity, and substantially reduces computation time of studies. We present a simulation study to validate our method and apply our method to a case-study of land cover predictors for ring-necked pheasant (Phasianus colchicus) abundance in Nebraska, USA. Results: Our method returns accurate descriptions of the explanatory power of multiple spatial scales, and unbiased and precise parameter estimates under commonly encountered data limitations including spatial scale autocorrelation, effect size, and sample size. BLISS outperforms commonly used model selection methods including stepwise and AIC, and reduces runtime by 90%. Conclusions: Given the pervasiveness of scale-dependency in ecology, and the implications of mismatches between the scales of analyses and ecological processes, identifying the spatial scales over which species are integrating habitat information is an important step in understanding species-habitat relationships. 
BLISS is a widely applicable method for identifying important spatial scales, propagating scale uncertainty, and testing hypotheses of scaling relationships.
Atomic detail visualization of photosynthetic membranes with GPU-accelerated ray tracing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stone, John E.; Sener, Melih; Vandivort, Kirby L.
The cellular process responsible for providing energy for most life on Earth, namely, photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. In this paper, we present, in two accompanying movies, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movies are the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. Finally, we describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movies, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers.
Atomic detail visualization of photosynthetic membranes with GPU-accelerated ray tracing
Stone, John E.; Sener, Melih; Vandivort, Kirby L.; ...
2015-12-12
The cellular process responsible for providing energy for most life on Earth, namely, photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. In this paper, we present, in two accompanying movies, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movies are the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. Finally, we describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movies, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1998-10-16
A workshop was held at the RIKEN-BNL Research Center on October 16, 1998, as part of the first anniversary celebration for the center. This meeting brought together the physicists from RIKEN-BNL, BNL and Columbia who are using the QCDSP (Quantum Chromodynamics on Digital Signal Processors) computer at the RIKEN-BNL Research Center for studies of QCD. Many of the talks in the workshop were devoted to domain wall fermions, a discretization of the continuum description of fermions which preserves the global symmetries of the continuum, even at finite lattice spacing. This formulation has been the subject of analytic investigation for some time and has reached the stage where large-scale simulations in QCD seem very promising. With the computational power available from the QCDSP computers, scientists are looking forward to an exciting time for numerical simulations of QCD.
Planetary-Scale Geospatial Data Analysis Techniques in Google's Earth Engine Platform (Invited)
NASA Astrophysics Data System (ADS)
Hancher, M.
2013-12-01
Geoscientists have more and more access to new tools for large-scale computing. With any tool, some tasks are easy and other tasks hard. It is natural to look to new computing platforms to increase the scale and efficiency of existing techniques, but there is a more exciting opportunity to discover and develop a new vocabulary of fundamental analysis idioms that are made easy and effective by these new tools. Google's Earth Engine platform is a cloud computing environment for earth data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog includes a nearly complete archive of scenes from Landsat 4, 5, 7, and 8 that have been processed by the USGS, as well as a wide variety of other remotely-sensed and ancillary data products. Earth Engine supports a just-in-time computation model that enables real-time preview during algorithm development and debugging as well as during experimental data analysis and open-ended data exploration. Data processing operations are performed in parallel across many computers in Google's datacenters. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, resampling, and associating image metadata with pixel data. Early applications of Earth Engine have included the development of Google's global cloud-free fifteen-meter base map and global multi-decadal time-lapse animations, as well as numerous large and small experimental analyses by scientists from a range of academic, government, and non-governmental institutions, working in a wide variety of application areas including forestry, agriculture, urban mapping, and species habitat modeling. Patterns in the successes and failures of these early efforts have begun to emerge, sketching the outlines of a new set of simple and effective approaches to geospatial data analysis.
The IT in Secondary Science Book. A Compendium of Ideas for Using Computers and Teaching Science.
ERIC Educational Resources Information Center
Frost, Roger
Scientists need to measure and communicate, to handle information, and model ideas. In essence, they need to process information. Young scientists have the same needs. Computers have become a tremendously important addition to the processing of information through database use, graphing and modeling and also in the collection of information…
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-21
... cruises. A laptop computer is located on the observer platform for ease of data entry. The computer is... lines, the receiving systems will receive the returning acoustic signals. The study (e.g., equipment...-board assistance by the scientists who have proposed the study. The Chief Scientist is Dr. Franco...
GREEN SUPERCOMPUTING IN A DESKTOP BOX
DOE Office of Scientific and Technical Information (OSTI.GOV)
HSU, CHUNG-HSING; FENG, WU-CHUN; CHING, AVERY
2007-01-17
The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicated computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren: a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer: a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, the authors present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than their reference SMP platform.
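The performance-power figure quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two numbers given in the abstract (14 Gflops on Linpack, 185 watts at load); the function name `mflops_per_watt` is hypothetical, and reading "over 300% better" as "more than four times the reference ratio" is an assumption, since the abstract does not give the reference platform's figures.

```python
def mflops_per_watt(gflops: float, watts: float) -> float:
    """Performance-power ratio in Mflops per watt."""
    return gflops * 1000.0 / watts


# Figures quoted in the abstract for the 12-node desktop system.
desktop_ratio = mflops_per_watt(14.0, 185.0)

# ASSUMPTION: "over 300% better" is read here as "more than 4x the reference".
# Under that reading the reference SMP platform would sit below this value.
implied_reference_ceiling = desktop_ratio / 4.0

print(f"desktop: {desktop_ratio:.1f} Mflops/W")
print(f"implied reference ceiling: {implied_reference_ceiling:.1f} Mflops/W")
```

If "300% better" were instead read as "three times the reference", the divisor would be 3 rather than 4; the abstract alone does not settle which reading is intended.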
SciDAC GSEP: Gyrokinetic Simulation of Energetic Particle Turbulence and Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Zhihong
Energetic particle (EP) confinement is a key physics issue for burning plasma experiment ITER, the crucial next step in the quest for clean and abundant energy, since ignition relies on self-heating by energetic fusion products (α-particles). Due to the strong coupling of EP with burning thermal plasmas, plasma confinement property in the ignition regime is one of the most uncertain factors when extrapolating from existing fusion devices to the ITER tokamak. EP populations in current tokamaks are mostly produced by auxiliary heating such as neutral beam injection (NBI) and radio frequency (RF) heating. Remarkable progress in developing comprehensive EP simulation codes and understanding basic EP physics has been made by two concurrent SciDAC EP projects GSEP funded by the Department of Energy (DOE) Office of Fusion Energy Science (OFES), which have successfully established gyrokinetic turbulence simulation as a necessary paradigm shift for studying the EP confinement in burning plasmas. Verification and validation have rapidly advanced through close collaborations between simulation, theory, and experiment. Furthermore, productive collaborations with computational scientists have enabled EP simulation codes to effectively utilize current petascale computers and emerging exascale computers. We review here key physics progress in the GSEP projects regarding verification and validation of gyrokinetic simulations, nonlinear EP physics, EP coupling with thermal plasmas, and reduced EP transport models. Advances in high performance computing through collaborations with computational scientists that enable these large scale electromagnetic simulations are also highlighted. These results have been widely disseminated in numerous peer-reviewed publications including many Phys. Rev. Lett. papers and many invited presentations at prominent fusion conferences such as the biennial International Atomic Energy Agency (IAEA) Fusion Energy Conference and the annual meeting of the American Physical Society, Division of Plasma Physics (APS-DPP).
Statistical regularities in the rank-citation profile of scientists.
Petersen, Alexander M; Stanley, H Eugene; Succi, Sauro
2011-01-01
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c(i)(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c(i)(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c(i)(r) profiles, our results demonstrate the utility of the β(i) scaling parameter in conjunction with h(i) for quantifying individual publication impact. We show that the total number of citations C(i) tallied from a scientist's N(i) papers scales as [Formula: see text]. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
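To make the rank-citation profile concrete, the following minimal sketch builds c_i(r) by sorting a scientist's per-paper citation counts in descending order and reads the Hirsch h-index off that profile. The two citation lists are invented illustrative data, not values from the study, and the function names are hypothetical. The example shows the point made above: two scientists can share the same h-index while having very different profiles and total citation counts.

```python
def rank_citation_profile(citations):
    """Return c(r): citation counts sorted so that rank 1 is the most cited paper."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest h such that the h most cited papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(rank_citation_profile(citations), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Invented example data: per-paper citation counts for two hypothetical scientists.
scientist_a = [10, 8, 5, 4, 3]
scientist_b = [100, 50, 4, 4, 1]

for name, papers in (("A", scientist_a), ("B", scientist_b)):
    print(name, rank_citation_profile(papers), "h =", h_index(papers), "C =", sum(papers))
```

Both hypothetical scientists have h = 4, yet their total citations differ by a factor of about five, which is the kind of difference a second parameter describing the shape of the profile is meant to capture.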
What do computer scientists tweet? Analyzing the link-sharing practice on Twitter
Schmitt, Marco
2017-01-01
Twitter communication has permeated every sphere of society. To highlight and share small pieces of information with possibly vast audiences or small circles of the interested has some value in almost any aspect of social life. But what is the value exactly for a scientific field? We perform a comprehensive study of computer scientists using Twitter and their tweeting behavior concerning the sharing of web links. Discerning the domains, hosts and individual web pages being tweeted and the differences between computer scientists and a Twitter sample enables us to look in depth at the Twitter-based information sharing practices of a scientific community. Additionally, we aim at providing a deeper understanding of the role and impact of altmetrics in computer science and give a glance at the publications mentioned on Twitter that are most relevant for the computer science community. Our results show a link sharing culture that concentrates more heavily on public and professional quality information than the Twitter sample does. The results also show a broad variety in linked sources and especially in linked publications, with some publications clearly related to community-specific interests of computer scientists, while others have a strong relation to attention mechanisms in social media. This refers to the observation that Twitter is a hybrid form of social media between an information service and a social network service. Overall the computer scientists’ style of usage seems to be more on the information-oriented side and to some degree also on professional usage. Therefore, altmetrics are of considerable use in analyzing computer science. PMID:28636619
NASA Astrophysics Data System (ADS)
Wright, D. J.
2013-12-01
In the early 1990s the author came of age as the technology driving the geographic information system or GIS was beginning to successfully 'handle' geospatial data at a range of scales and formats, and a wide array of information technology products emerged from an expanding GIS industry. However, that small community struggled to reflect the diverse research efforts at play in understanding the deeper issues surrounding geospatial data, and the impediments to the effective use of that data. It was from this need that geographic information science or GIScience arose, to ensure in part that GIS did not fall into the trap of being a technology in search of applications, a one-time, one-off, non-intellectual 'bag of tricks' with no substantive theory underpinning it, and suitable only for a static period of time (e.g., Goodchild, 1992). The community has since debated the issue of "tool versus science", which has also played a role in defining GIS as an actual profession. In turn, GIS has contributed to "methodological versus substantive" questions in science, leading to understandings of how the Earth works versus how the Earth should look. In the author's experience, the multidimensional structuring and scaling of data, with integrative and innovative approaches to analyzing, modeling, and developing extensive and spatial data from selected places on land and at sea, have revealed how theory and application are in no way mutually exclusive, and it may often be application that advances theory, rather than vice versa. Increasingly, both the system and science of geographic information have welcomed strong collaborations among computer scientists, information scientists, and domain scientists to solve complex scientific questions. As such, they have paralleled the emergence and acceptance of "data science."
Now that we are squarely in an era of regional- to global-scale observation and simulation of the Earth, producing data that are too big, move too fast, and do not fit the structures and processing capacity of conventional database systems, the author reflects on the potential of the GIS/GIScience world to contribute to the training and professional advancement of data science.
NASA Technical Reports Server (NTRS)
Hickey, J. S.
1983-01-01
The Mesoscale Analysis and Space Sensor (MASS) Data Management and Analysis System developed by Atsuko Computing International (ACI) on the MASS HP-1000 Computer System within the Systems Dynamics Laboratory of the Marshall Space Flight Center is described. The MASS Data Management and Analysis System was successfully implemented and utilized daily by atmospheric scientists to graphically display and analyze large volumes of conventional and satellite derived meteorological data. The scientists can process interactively various atmospheric data (Sounding, Single Level, Grid, and Image) by utilizing the MASS (AVE80) share common data and user inputs, thereby reducing overhead, optimizing execution time, and thus enhancing user flexibility, usability, and understandability of the total system/software capabilities. In addition, ACI installed eight APPLE III graphics/imaging computer terminals in individual scientist offices and integrated them into the MASS HP-1000 Computer System, thus providing significant enhancement to the overall research environment.
NASA Technical Reports Server (NTRS)
Pinelli, Thomas E.; Kennedy, John M.; Barclay, Rebecca O.; Bishop, Ann P.
1992-01-01
To remain a world leader in aerospace, the US must improve and maintain the professional competency of its engineers and scientists, increase the research and development (R&D) knowledge base, improve productivity, and maximize the integration of recent technological developments into the R&D process. How well these objectives are met, and at what cost, depends on a variety of factors, but largely on the ability of US aerospace engineers and scientists to acquire and process the results of federally funded R&D. The Federal Government's commitment to high speed computing and networking systems presupposes that computer and information technology will play a major role in the aerospace knowledge diffusion process. However, we know little about information technology needs, uses, and problems within the aerospace knowledge diffusion process. The use of computer and information technology by US aerospace engineers and scientists in academia, government, and industry is reported.
Introduction to the Space Physics Analysis Network (SPAN)
NASA Technical Reports Server (NTRS)
Green, J. L. (Editor); Peters, D. J. (Editor)
1985-01-01
The Space Physics Analysis Network or SPAN is emerging as a viable method for solving an immediate communication problem for the space scientist. SPAN provides low-rate communication capability with co-investigators and colleagues, and access to space science data bases and computational facilities. The SPAN utilizes up-to-date hardware and software for computer-to-computer communications allowing binary file transfer and remote log-on capability to over 25 nationwide space science computer systems. SPAN is not discipline or mission dependent with participation from scientists in such fields as magnetospheric, ionospheric, planetary, and solar physics. Basic information on the network and its use are provided. It is anticipated that SPAN will grow rapidly over the next few years, not only from the standpoint of more network nodes, but as scientists become more proficient in the use of telescience, more capability will be needed to satisfy the demands.
Cloudbursting - Solving the 3-body problem
NASA Astrophysics Data System (ADS)
Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.
2014-12-01
Many science projects in the future will be accomplished through collaboration among 2 or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By being able to share computing resources among members of a multi-center team working on a science/engineering project, limited pre-competition funds could be more efficiently applied and technical work could be conducted more effectively with less time spent moving data or waiting for computing resources to free up. Based on the work from a NASA CIO IT Labs task, this presentation will highlight our prototype work in assessing the feasibility of, and identifying the obstacles (both technical and management) to, performing "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.
Massive Cloud-Based Big Data Processing for Ocean Sensor Networks and Remote Sensing
NASA Astrophysics Data System (ADS)
Schwehr, K. D.
2017-12-01
Until recently, the work required to integrate and analyze data for global-scale environmental issues was prohibitive both in cost and availability. Traditional desktop processing systems are not able to effectively store and process all the data, and supercomputer solutions are financially out of the reach of most people. The availability of large-scale cloud computing has created tools that are usable by small groups and individuals regardless of financial resources or locally available computational resources. These systems give scientists and policymakers the ability to see how critical resources are being used across the globe with little or no barrier to entry. Google Earth Engine has the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra, MODIS Aqua, and Global Land Data Assimilation Systems (GLDAS) data catalogs available live online. Here we demonstrate the use of these data to calculate the correlation between lagged chlorophyll and rainfall to identify areas of eutrophication, matching these events to ocean currents from datasets like the HYbrid Coordinate Ocean Model (HYCOM) to check whether there are constraints from oceanographic configurations. The system can provide additional ground truth with observations from sensor networks like the International Comprehensive Ocean-Atmosphere Data Set / Voluntary Observing Ship (ICOADS/VOS) and Argo floats. This presentation is intended to introduce users to the datasets, programming idioms, and functionality of Earth Engine for large-scale, data-driven oceanography.
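The lagged correlation mentioned above can be illustrated with a small, self-contained sketch. It uses plain Python rather than the Earth Engine API, and the two series are synthetic values invented for the example, not MODIS or GLDAS observations; the function names are likewise assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(driver, response, lag):
    """Correlate driver[t] with response[t + lag] for a non-negative lag."""
    if lag < 0 or lag >= len(driver):
        raise ValueError("lag must satisfy 0 <= lag < len(driver)")
    return pearson(driver[: len(driver) - lag], response[lag:])


# Synthetic series (hypothetical values, not real observations).
rainfall = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0]
# In this toy example chlorophyll responds to rainfall two steps later.
chlorophyll = [0.0, 0.0] + [2.0 * r + 1.0 for r in rainfall[:-2]]

# Pick the lag (in time steps) with the strongest correlation.
best_lag = max(
    range(4), key=lambda k: lagged_correlation(rainfall, chlorophyll, k)
)
```

Scanning a range of lags and keeping the strongest one is a common way to estimate the delay between a driver and a response; on real gridded data the same calculation would be repeated per pixel.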
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacquelin, Mathias; De Jong, Wibe A.; Bylaska, Eric J.
2017-07-03
The Ab Initio Molecular Dynamics (AIMD) method allows scientists to treat the dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. This extremely important method has tremendous computational requirements, because the electronic Schrödinger equation, approximated using Kohn-Sham Density Functional Theory (DFT), is solved at every time step. With the advent of manycore architectures, application developers have a significant amount of processing power within each compute node that can only be exploited through massive parallelism. A compute-intensive application such as AIMD is a good candidate to leverage this processing power. In this paper, we focus on adding thread-level parallelism to the plane wave DFT methodology implemented in NWChem. Through a careful optimization of tall-skinny matrix products, which are at the heart of the Lagrange multiplier and nonlocal pseudopotential kernels, as well as 3D FFTs, our OpenMP implementation delivers excellent strong scaling on the latest Intel Knights Landing (KNL) processor. We assess the efficiency of our Lagrange multiplier kernels by building a Roofline model of the platform, and verify that our implementation is close to the roofline for various problem sizes. Finally, we present strong scaling results on the complete AIMD simulation for a 64-water-molecule test case that scales up to all 68 cores of the Knights Landing processor.
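For readers unfamiliar with the two metrics named above, the following sketch shows the basic Roofline bound and the usual definition of strong-scaling efficiency. The peak and bandwidth figures and the timings are placeholders chosen for easy arithmetic; they are not measured Knights Landing numbers and not results from the paper.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under the basic Roofline model.

    arithmetic_intensity is in FLOP per byte moved; the bound is the
    smaller of the compute peak and intensity times memory bandwidth.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def strong_scaling_efficiency(time_one_core, time_p_cores, p):
    """Parallel efficiency for a fixed problem size run on p cores."""
    return time_one_core / (p * time_p_cores)


# Placeholder machine parameters (assumptions, not vendor figures).
PEAK_GFLOPS = 2000.0
BANDWIDTH_GBS = 400.0

memory_bound = roofline_gflops(0.5, PEAK_GFLOPS, BANDWIDTH_GBS)
compute_bound = roofline_gflops(10.0, PEAK_GFLOPS, BANDWIDTH_GBS)
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity where the bounds meet

# Hypothetical timings: 680 s on one core, 12.5 s on 68 cores.
efficiency = strong_scaling_efficiency(680.0, 12.5, 68)
```

A kernel whose measured performance sits near `roofline_gflops` for its arithmetic intensity is "close to the roofline" in the sense used in the abstract.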
The SERGISAI procedure for seismic risk assessment
NASA Astrophysics Data System (ADS)
Zonno, G.; Garcia-Fernandez, M.; Jimenez, M.J.; Menoni, S.; Meroni, F.; Petrini, V.
The European project SERGISAI developed a computational tool where a methodology for seismic risk assessment at different geographical scales has been implemented. Experts of various disciplines, including seismologists, engineers, planners, geologists, and computer scientists, co-operated in an actual multidisciplinary process to develop this tool. Standard procedural codes, Geographical Information Systems (GIS), and Artificial Intelligence (AI) techniques compose the whole system, which will enable the end user to carry out a complete seismic risk assessment at three geographical scales: regional, sub-regional and local. At present, single codes or models that have been incorporated are not new in general, but the modularity of the prototype, based on a user-friendly front-end, offers potential users the possibility of updating or replacing any code or model if desired. The proposed procedure is a first attempt to integrate tools, codes and methods for assessing expected earthquake damage, and it was mainly designed to become a useful support for civil defence and land use planning agencies. Risk factors have been treated in the most suitable way for each one, in terms of level of detail, kind of parameters and units of measure. Identifying various geographical scales is not a mere question of dimension, since entities to be studied correspond to areas defined by administrative and geographical borders. The procedure was applied in the following areas: Toscana in Italy, for the regional scale; the Garfagnana area in Toscana, for the sub-regional scale; and a part of Barcelona city, Spain, for the local scale.
NASA Astrophysics Data System (ADS)
Gorelick, Noel
2013-04-01
The Google Earth Engine platform is a system designed to enable petabyte-scale, scientific analysis and visualization of geospatial datasets. Earth Engine provides a consolidated environment including a massive data catalog co-located with thousands of computers for analysis. The user-friendly front-end provides a workbench environment to allow interactive data and algorithm development and exploration and provides a convenient mechanism for scientists to share data, visualizations and analytic algorithms via URLs. The Earth Engine data catalog contains a wide variety of popular, curated datasets, including the world's largest online collection of Landsat scenes (> 2.0M), numerous MODIS collections, and many vector-based data sets. The platform provides a uniform access mechanism to a variety of data types, independent of their bands, projection, bit-depth, resolution, etc., facilitating easy multi-sensor analysis. Additionally, a user is able to add and curate their own data and collections. Using a just-in-time, distributed computation model, Earth Engine can rapidly process enormous quantities of geospatial data. All computation is performed lazily; nothing is computed until it's required either for output or as input to another step. This model allows real-time feedback and preview during algorithm development, supporting a rapid algorithm development, test, and improvement cycle that scales seamlessly to large-scale production data processing.
Through integration with a variety of other services, Earth Engine is able to bring to bear considerable analytic and technical firepower in a transparent fashion, including: AI-based classification via integration with Google's machine learning infrastructure, publishing and distribution at Google scale through integration with the Google Maps API, Maps Engine and Google Earth, and support for in-the-field activities such as validation, ground-truthing, crowd-sourcing and citizen science through the Android Open Data Kit.
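The lazy, just-in-time computation model described in this abstract can be illustrated with a minimal deferred-evaluation sketch. This is not the Earth Engine API; the `Lazy` class and the `load` and `square` helpers are hypothetical names used only to show that building a pipeline does no work until a result is requested.

```python
class Lazy:
    """A node in a deferred computation graph; nothing runs until get()."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs
        self._cached = None
        self._done = False

    def map(self, func):
        """Describe a further step without executing anything."""
        return Lazy(func, self)

    def get(self):
        """Evaluate this node, pulling inputs on demand and caching."""
        if not self._done:
            args = [node.get() for node in self.inputs]
            self._cached = self.func(*args)
            self._done = True
        return self._cached


calls = []  # records which steps have actually executed


def load():
    calls.append("load")
    return [1, 2, 3, 4]


def square(values):
    calls.append("square")
    return [v * v for v in values]


# Building the pipeline only records the steps.
pipeline = Lazy(load).map(square).map(sum)
calls_before = list(calls)

# Requesting the output triggers the whole chain, once.
total = pipeline.get()
```

Because evaluation is driven by the request for output, a preview can ask for a small result and pay only for the steps that result depends on.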
NASA Astrophysics Data System (ADS)
Gorelick, N.
2012-12-01
The Google Earth Engine platform is a system designed to enable petabyte-scale, scientific analysis and visualization of geospatial datasets. Earth Engine provides a consolidated environment including a massive data catalog co-located with thousands of computers for analysis. The user-friendly front-end provides a workbench environment to allow interactive data and algorithm development and exploration and provides a convenient mechanism for scientists to share data, visualizations and analytic algorithms via URLs. The Earth Engine data catalog contains a wide variety of popular, curated datasets, including the world's largest online collection of Landsat scenes (> 2.0M), numerous MODIS collections, and many vector-based data sets. The platform provides a uniform access mechanism to a variety of data types, independent of their bands, projection, bit-depth, resolution, etc., facilitating easy multi-sensor analysis. Additionally, a user is able to add and curate their own data and collections. Using a just-in-time, distributed computation model, Earth Engine can rapidly process enormous quantities of geospatial data. All computation is performed lazily; nothing is computed until it's required either for output or as input to another step. This model allows real-time feedback and preview during algorithm development, supporting a rapid algorithm development, test, and improvement cycle that scales seamlessly to large-scale production data processing.
Through integration with a variety of other services, Earth Engine is able to bring to bear considerable analytic and technical firepower in a transparent fashion, including: AI-based classification via integration with Google's machine learning infrastructure, publishing and distribution at Google scale through integration with the Google Maps API, Maps Engine and Google Earth, and support for in-the-field activities such as validation, ground-truthing, crowd-sourcing and citizen science through the Android Open Data Kit.
Managing data from multiple disciplines, scales, and sites to support synthesis and modeling
Olson, R. J.; Briggs, J. M.; Porter, J.H.; Mah, Grant R.; Stafford, S.G.
1999-01-01
The synthesis and modeling of ecological processes at multiple spatial and temporal scales involves bringing together and sharing data from numerous sources. This article describes a data and information system model that facilitates assembling, managing, and sharing diverse data from multiple disciplines, scales, and sites to support integrated ecological studies. Cross-site scientific-domain working groups coordinate the development of data associated with their particular scientific working group, including decisions about data requirements, data to be compiled, data formats, derived data products, and schedules across the sites. The Web-based data and information system consists of nodes for each working group plus a central node that provides data access, project information, data query, and other functionality. The approach incorporates scientists and computer experts in the working groups and provides incentives for individuals to submit documented data to the data and information system.
Most Social Scientists Shun Free Use of Supercomputers.
ERIC Educational Resources Information Center
Kiernan, Vincent
1998-01-01
Social scientists, who frequently complain that the federal government spends too little on them, are passing up what scholars in the physical and natural sciences see as the government's best give-aways: free access to supercomputers. Some social scientists say the supercomputers are difficult to use; others find desktop computers provide…
ERIC Educational Resources Information Center
Holbrook, M. Cay; MacCuspie, P. Ann
2010-01-01
Braille-reading mathematicians, scientists, and computer scientists were asked to examine the usability of the Unified English Braille Code (UEB) for technical materials. They had little knowledge of the code prior to the study. The research included two reading tasks, a short tutorial about UEB, and a focus group. The results indicated that the…
Meet EPA Scientist Valerie Zartarian, Ph.D.
Senior exposure scientist and research environmental engineer Valerie Zartarian, Ph.D. helps build computer models and other tools that advance our understanding of how people interact with chemicals.
Hot, Hot, Hot Computer Careers.
ERIC Educational Resources Information Center
Basta, Nicholas
1988-01-01
Discusses the increasing need for electrical, electronic, and computer engineers; and scientists. Provides current status of the computer industry and average salaries. Considers computer chip manufacture and the current chip shortage. (MVL)
Moving image analysis to the cloud: A case study with a genome-scale tomographic study
NASA Astrophysics Data System (ADS)
Mader, Kevin; Stampanoni, Marco
2016-01-01
Over the last decade, the time required to measure a terabyte of microscopic imaging data has gone from years to minutes. This shift has moved many of the challenges away from experimental design and measurement to scalable storage, organization, and analysis. As many scientists and scientific institutions lack training and competencies in these areas, major bottlenecks have arisen and led to substantial delays and gaps between measurement, understanding, and dissemination. We present in this paper a framework for analyzing large 3D datasets using cloud-based computational and storage resources. We demonstrate its applicability by showing the setup and costs associated with the analysis of a genome-scale study of bone microstructure. We then evaluate the relative advantages and disadvantages associated with local versus cloud infrastructures.
NASA Astrophysics Data System (ADS)
Tang, William M., Dr.
2006-01-01
The second annual Scientific Discovery through Advanced Computing (SciDAC) Conference was held from June 25-29, 2006 at the new Hyatt Regency Hotel in Denver, Colorado. This conference showcased outstanding SciDAC-sponsored computational science results achieved during the past year across many scientific domains, with an emphasis on science at scale. Exciting computational science that has been accomplished outside of the SciDAC program both nationally and internationally was also featured to help foster communication between SciDAC computational scientists and those funded by other agencies. This was illustrated by many compelling examples of how domain scientists collaborated productively with applied mathematicians and computer scientists to effectively take advantage of terascale computers (capable of performing trillions of calculations per second) not only to accelerate progress in scientific discovery in a variety of fields but also to show great promise for being able to utilize the exciting petascale capabilities in the near future. The SciDAC program was originally conceived as an interdisciplinary computational science program based on the guiding principle that strong collaborative alliances between domain scientists, applied mathematicians, and computer scientists are vital to accelerated progress and associated discovery on the world's most challenging scientific problems. Associated verification and validation are essential in this successful program, which was funded by the US Department of Energy Office of Science (DOE OS) five years ago. As is made clear in many of the papers in these proceedings, SciDAC has fundamentally changed the way that computational science is now carried out in response to the exciting challenge of making the best use of the rapid progress in the emergence of more and more powerful computational platforms. In this regard, Dr. 
Raymond Orbach, Energy Undersecretary for Science at the DOE and Director of the OS has stated: `SciDAC has strengthened the role of high-end computing in furthering science. It is defining whole new fields for discovery.' (SciDAC Review, Spring 2006, p8). Application domains within the SciDAC 2006 conference agenda encompassed a broad range of science including: (i) the DOE core mission of energy research involving combustion studies relevant to fuel efficiency and pollution issues faced today and magnetic fusion investigations impacting prospects for future energy sources; (ii) fundamental explorations into the building blocks of matter, ranging from quantum chromodynamics - the basic theory that describes how quarks make up the protons and neutrons of all matter - to the design of modern high-energy accelerators; (iii) the formidable challenges of predicting and controlling the behavior of molecules in quantum chemistry and the complex biomolecules determining the evolution of biological systems; (iv) studies of exploding stars for insights into the nature of the universe; and (v) integrated climate modeling to enable realistic analysis of earth's changing climate. Associated research has made it quite clear that advanced computation is often the only means by which timely progress is feasible when dealing with these complex, multi-component physical, chemical, and biological systems operating over huge ranges of temporal and spatial scales. Working with the domain scientists, applied mathematicians and computer scientists have continued to develop the discretizations of the underlying equations and the complementary algorithms to enable improvements in solutions on modern parallel computing platforms as they evolve from the terascale toward the petascale regime. 
Moreover, the associated tremendous growth of data generated from the terabyte to the petabyte range demands not only the advanced data analysis and visualization methods to harvest the scientific information but also the development of efficient workflow strategies which can deal with the data input/output, management, movement, and storage challenges. If scientific discovery is expected to keep pace with the continuing progression from tera- to petascale platforms, the vital alliance between domain scientists, applied mathematicians, and computer scientists will be even more crucial. During the SciDAC 2006 Conference, some of the future challenges and opportunities in interdisciplinary computational science were emphasized in the Advanced Architectures Panel and by Dr. Victor Reis, Senior Advisor to the Secretary of Energy, who gave a featured presentation on `Simulation, Computation, and the Global Nuclear Energy Partnership.' Overall, the conference provided an excellent opportunity to highlight the rising importance of computational science in the scientific enterprise and to motivate future investment in this area. As Michael Strayer, SciDAC Program Director, has noted: `While SciDAC may have started out as a specific program, Scientific Discovery through Advanced Computing has become a powerful concept for addressing some of the biggest challenges facing our nation and our world.' Looking forward to next year, the SciDAC 2007 Conference will be held from June 24-28 at the Westin Copley Plaza in Boston, Massachusetts. Chairman: David Keyes, Columbia University. The Organizing Committee for the SciDAC 2006 Conference would like to acknowledge the individuals whose talents and efforts were essential to the success of the meeting. 
Special thanks go to Betsy Riley for her leadership in building the infrastructure support for the conference, for identifying and then obtaining contributions from our corporate sponsors, for coordinating all media communications, and for her efforts in organizing and preparing the conference proceedings for publication; to Tim Jones for handling the hotel scouting, subcontracts, and exhibits and stage production; to Angela Harris for handling supplies, shipping, and tracking, poster sessions set-up, and for her efforts in coordinating and scheduling the promotional activities that took place during the conference; to John Bui and John Smith for their superb wireless networking and A/V set-up and support; to Cindy Latham for Web site design, graphic design, and quality control of proceedings submissions; and to Pamelia Nixon-Hartje of Ambassador for budget and quality control of catering. We are grateful for the highly professional dedicated efforts of all of these individuals, who were the cornerstones of the SciDAC 2006 Conference. Thanks also go to Angela Beach of the ORNL Conference Center for her efforts in executing the contracts with the hotel, Carolyn James of Colorado State for on-site registration supervision, Lora Wolfe and Brittany Hagen for administrative support at ORNL, and Dami Rich and Andrew Sproles for graphic design and production. We are also most grateful to the Oak Ridge National Laboratory, especially Jeff Nichols, and to our corporate sponsors, Data Direct Networks, Cray, IBM, SGI, and Institute of Physics Publishing for their support. We especially express our gratitude to the featured speakers, invited oral speakers, invited poster presenters, session chairs, and advanced architecture panelists and chair for their excellent contributions on behalf of SciDAC 2006. 
We would like to express our deep appreciation to Lali Chatterjee, Graham Douglas, Margaret Smith, and the production team of Institute of Physics Publishing, who worked tirelessly to publish the final conference proceedings in a timely manner. Finally, heartfelt thanks are extended to Michael Strayer, Associate Director for OASCR and SciDAC Director, and to the DOE program managers associated with SciDAC for their continuing enthusiasm and strong support for the annual SciDAC Conferences as a special venue to showcase the exciting scientific discovery achievements enabled by the interdisciplinary collaborations championed by the SciDAC program.
International Conference on Intelligent Systems for Molecular Biology (ISMB)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldberg, Debra; Hibbs, Matthew; Kall, Lukas
The Intelligent Systems for Molecular Biology (ISMB) conference has provided a general forum for disseminating the latest developments in bioinformatics on an annual basis for the past 13 years. ISMB is a multidisciplinary conference that brings together scientists from computer science, molecular biology, mathematics and statistics. The goal of the ISMB meeting is to bring together biologists and computational scientists in a focus on actual biological problems, i.e., not simply theoretical calculations. The combined focus on "intelligent systems" and actual biological data makes ISMB a unique and highly important meeting, and 13 years of experience in holding the conference has resulted in a consistently well organized, well attended, and highly respected annual conference. The ISMB 2005 meeting was held June 25-29, 2005 at the Renaissance Center in Detroit, Michigan. The meeting attracted over 1,730 attendees. The science presented was exceptional, and in the course of the five-day meeting, 56 scientific papers, 710 posters, 47 Oral Abstracts, 76 Software demonstrations, and 14 tutorials were presented. The attendees represented a broad spectrum of backgrounds with 7% from commercial companies, over 28% qualifying for student registration, and 41 countries were represented at the conference, emphasizing its important international aspect. The ISMB conference is especially important because the cultures of computer science and biology are so disparate. ISMB, as a full-scale technical conference with refereed proceedings that have been indexed by both MEDLINE and Current Contents since 1996, bridges this cultural gap.
Effects of Relativity Lead to "Warp Speed" Computations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vay, J.-L.
A scientist at Lawrence Berkeley National Laboratory has discovered that a previously unnoticed consequence of Einstein's special theory of relativity can lead to speedup of computer calculations by orders of magnitude when applied to the computer modeling of a certain class of physical systems. This new finding offers the possibility of tackling some problems in a much shorter time and with far more precision than was possible before, as well as studying some configurations in every detail for the first time. The basis of Einstein's theory is the principle of relativity, which states that the laws of physics are the same for all observers, whether the 'observer' is a turtle 'racing' with a rabbit, or a beam of particles moving at near light speed. From the invariance of the laws of physics, one may be tempted to infer that the complexity of a system is independent of the motion of the observer, and consequently, a computer simulation will require the same number of mathematical operations, independently of the reference frame that is used for the calculation. Length contraction and time dilation are well-known consequences of the special theory of relativity which lead to very counterintuitive effects. An alien observing human activity through a telescope in a spaceship traveling in the vicinity of the earth near the speed of light would see everything flattened in the direction of propagation of its spaceship (for him, the earth would have the shape of a pancake), while all motions on earth would appear extremely slow, slowed almost to a standstill. Conversely, a space scientist observing the alien through a telescope based on earth would see a flattened alien slowed almost to a standstill in a flattened spaceship. 
Meanwhile, an astronaut sitting in a spaceship moving at some lower velocity than the alien spaceship with regard to earth might see both the alien spaceship and the earth flattened in the same proportion and the motion unfolding in each of them at the same speed. Let us now assume that each protagonist (the alien, the space scientist and the astronaut) is to run a computer simulation describing the motion of all of them in a single calculation. In order to model a physical system on a computer, scientists often divide space and time into small chunks. Since the computer must calculate some things for each chunk, having a large system containing numerous small chunks translates to long calculations requiring many computational steps on supercomputers. Let us assume that each protagonist of our intergalactic story uses the space and time slicing as described and chooses to perform the calculation in its own frame of reference. For the alien and the space scientist, the slicing of space and time results in an exceedingly large number of chunks, due to the wide disparity of spatial and time scales needed to describe both their own environment and motion together with the other extremely flattened environment and slowed motion. Since the disparity of scales is reduced for the astronaut, who is traveling at an intermediate velocity, the number of computer operations needed to complete the calculation in his frame of reference will be significantly lower, possibly by many orders of magnitude. Analogously, the new discovery at Lawrence Berkeley National Laboratory shows that there exists a frame of reference minimizing the number of computational operations needed for studying the interaction of beams of particles or light (lasers) interacting at, or near, light speed with other particles or with surrounding structures. 
Speedups ranging from ten to a million times or more are predicted for the modeling of beams interacting with electron clouds, such as those in the upcoming Large Hadron Collider 'atom smasher' accelerator at CERN (Switzerland), and in free electron lasers and tabletop laser wakefield accelerators. The discovery has surprised many physicists and was received initially with much skepticism. It sounded too much like a 'free lunch'. Yet, the demonstration of a speedup of a stunning one thousand times in a test simulation of a particle beam interacting with a background of electrons (see image) has proven that the effect is real and can be applied successfully, at least to some problems. Work is being actively pursued at Berkeley Lab and elsewhere to validate the feasibility of the method for a wider range of applications, as well as to apply the already successful method to more problems, where it might help in gaining a better understanding of some processes and eventually lead to new findings.
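The length contraction and time dilation that drive the disparity of scales in this story both follow from the Lorentz factor, gamma = 1 / sqrt(1 - beta^2), where beta is the speed as a fraction of the speed of light. The sketch below only evaluates these textbook relations; it does not implement the optimal-frame simulation method itself, and the function names are illustrative.

```python
from math import sqrt


def lorentz_factor(beta):
    """Lorentz factor for a speed given as a fraction of light speed."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return 1.0 / sqrt(1.0 - beta * beta)


def contracted_length(proper_length, beta):
    """Length along the direction of motion as seen by a moving observer."""
    return proper_length / lorentz_factor(beta)


def dilated_time(proper_time, beta):
    """Duration of a moving clock's interval as seen by the observer."""
    return proper_time * lorentz_factor(beta)


gamma_moderate = lorentz_factor(0.6)       # a modest relative speed
gamma_extreme = lorentz_factor(0.999999)   # very close to light speed

flattened = contracted_length(10.0, 0.6)   # a 10-unit object appears shorter
slowed = dilated_time(1.0, 0.6)            # a 1-unit interval appears longer
```

The closer the relative speed is to that of light, the larger gamma becomes, and with it the gap between the smallest and largest length and time scales that a space-time grid in that frame has to resolve.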
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shankar, Arjun
Computer scientist Arjun Shankar is director of the Compute and Data Environment for Science (CADES), ORNL’s multidisciplinary big data computing center. CADES offers computing, networking and data analytics to facilitate workflows for both ORNL and external research projects.
Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.
2014-12-01
The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. 
The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.
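Two points in this abstract lend themselves to a short worked example: the dependency ordering that a workflow tool derives when early stages feed later ones, and the makespan improvement implied by the reported 1467 and 342 hours. The stage names below are hypothetical placeholders rather than the actual CyberShake job names, and the sketch uses Python's standard `graphlib` module rather than Pegasus.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each key lists the stages it depends on.
stages = {
    "generate_mesh": set(),
    "wave_propagation": {"generate_mesh"},
    "synthesize_seismograms": {"wave_propagation"},
    "hazard_curve": {"synthesize_seismograms"},
}

# A valid execution order: every stage appears after its inputs.
order = list(TopologicalSorter(stages).static_order())

# Makespan figures quoted in the abstract, in hours.
makespan_before = 1467
makespan_after = 342
speedup = makespan_before / makespan_after

# Average seismograms per site, from the approximate counts quoted above.
seismograms_per_site = 235_000_000 / 286
```

The reported reduction corresponds to a speedup of roughly 4.3 times, and the quoted counts imply on the order of 800,000 seismograms per site, which is why automated job and data management matters at this scale.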
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Dean N.
2011-07-20
This report summarizes work carried out by the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Team for the period of January 1, 2011 through June 30, 2011. It discusses highlights, overall progress, period goals, and collaborations and lists papers and presentations. To learn more about our project, please visit our UV-CDAT website (URL: http://uv-cdat.org). This report will be forwarded to the program manager for the Department of Energy (DOE) Office of Biological and Environmental Research (BER), national and international collaborators and stakeholders, and to researchers working on a wide range of other climate model, reanalysis, and observation evaluation activities. The UV-CDAT executive committee consists of Dean N. Williams of Lawrence Livermore National Laboratory (LLNL); Dave Bader and Galen Shipman of Oak Ridge National Laboratory (ORNL); Phil Jones and James Ahrens of Los Alamos National Laboratory (LANL); Claudio Silva of Polytechnic Institute of New York University (NYU-Poly); and Berk Geveci of Kitware, Inc. The UV-CDAT team consists of researchers and scientists with diverse domain knowledge whose home institutions also include the National Aeronautics and Space Administration (NASA) and the University of Utah. All work is accomplished under DOE open-source guidelines and in close collaboration with the project's stakeholders, domain researchers, and scientists. Working directly with BER climate science analysis projects, this consortium will develop and deploy data and computational resources useful to a wide variety of stakeholders, including scientists, policymakers, and the general public. Members of this consortium already collaborate with other institutions and universities in researching data discovery, management, visualization, workflow analysis, and provenance. 
The UV-CDAT team will address the following high-level visualization requirements: (1) Alternative parallel streaming statistics and analysis pipelines - Data parallelism, Task parallelism, Visualization parallelism; (2) Optimized parallel input/output (I/O); (3) Remote interactive execution; (4) Advanced intercomparison visualization; (5) Data provenance processing and capture; and (6) Interfaces for scientists - Workflow data analysis and visualization construction tools, and Visualization interfaces.« less
The Terra Data Fusion Project: An Update
NASA Astrophysics Data System (ADS)
Di Girolamo, L.; Bansal, S.; Butler, M.; Fu, D.; Gao, Y.; Lee, H. J.; Liu, Y.; Lo, Y. L.; Raila, D.; Turner, K.; Towns, J.; Wang, S. W.; Yang, K.; Zhao, G.
2017-12-01
Terra is the flagship of NASA's Earth Observing System. Launched in 1999, Terra's five instruments continue to gather data that enable scientists to address fundamental Earth science questions. By design, the strength of the Terra mission has always been rooted in its five instruments and the ability to fuse the instrument data together for obtaining greater quality of information for Earth Science compared to individual instruments alone. As the data volume grows and the central Earth Science questions move towards problems requiring decadal-scale data records, the need for data fusion and the ability for scientists to perform large-scale analytics with long records have never been greater. The challenge is particularly acute for Terra, given its growing volume of data (> 1 petabyte), the storage of different instrument data at different archive centers, the different file formats and projection systems employed for different instrument data, and the inadequate cyberinfrastructure for scientists to access and process whole-mission fusion data (including Level 1 data). Sharing newly derived Terra products with the rest of the world also poses challenges. As such, the Terra Data Fusion Project aims to resolve two long-standing problems: 1) How do we efficiently generate and deliver Terra data fusion products? 2) How do we facilitate the use of Terra data fusion products by the community in generating new products and knowledge through national computing facilities, and disseminate these new products and knowledge through national data sharing services? Here, we will provide an update on significant progress made in addressing these problems by working with NASA and leveraging national facilities managed by the National Center for Supercomputing Applications (NCSA). 
The problems that we faced in deriving and delivering Terra L1B2 basic, reprojected and cloud-element fusion products, such as data transfer, data fusion, processing on different computer architectures, science, and sharing, will be presented with quantitative specifics. Results from several science-specific drivers for Terra fusion products will also be presented. We demonstrate that the Terra Data Fusion Project itself provides an excellent use-case for the community addressing Big Data and cyberinfrastructure problems.
Pawlik, Aleksandra; van Gelder, Celia W.G.; Nenadic, Aleksandra; Palagi, Patricia M.; Korpelainen, Eija; Lijnzaad, Philip; Marek, Diana; Sansone, Susanna-Assunta; Hancock, John; Goble, Carole
2017-01-01
Quality training in computational skills for life scientists is essential to allow them to deliver robust, reproducible and cutting-edge research. A pan-European bioinformatics programme, ELIXIR, has adopted a well-established and progressive programme of computational lab and data skills training from Software and Data Carpentry, aimed at increasing the number of skilled life scientists and building a sustainable training community in this field. This article describes the Pilot action, which introduced the Carpentry training model to the ELIXIR community. PMID:28781745
NASA Technical Reports Server (NTRS)
Johnston, William E.; Gannon, Dennis; Nitzberg, Bill
2000-01-01
We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services. The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems.
Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to meteorological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation system); (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., real-time interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment, instead of just with instrument control); (4) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g., the rotorcraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g.
determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provide a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.
Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, CaffeNet, AlexNet and GoogleNet topologies using the Cifar10 and ImageNet datasets. The workloads are vendor optimized for each architecture. GPUs provide the highest overall raw performance. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and KNL can be competitive when considering performance/watt. Furthermore, NVLink is critical to GPU scaling.
IDEAL: Images Across Domains, Experiments, Algorithms and Learning
NASA Astrophysics Data System (ADS)
Ushizima, Daniela M.; Bale, Hrishikesh A.; Bethel, E. Wes; Ercius, Peter; Helms, Brett A.; Krishnan, Harinarayan; Grinberg, Lea T.; Haranczyk, Maciej; Macdowell, Alastair A.; Odziomek, Katarzyna; Parkinson, Dilworth Y.; Perciano, Talita; Ritchie, Robert O.; Yang, Chao
2016-11-01
Research across science domains is increasingly reliant on image-centric data. Software tools are in high demand to uncover relevant, but hidden, information in digital images, such as those coming from faster next generation high-throughput imaging platforms. The challenge is to analyze the data torrent generated by the advanced instruments efficiently, and provide insights such as measurements for decision-making. In this paper, we overview work performed by an interdisciplinary team of computational and materials scientists, aimed at designing software applications and coordinating research efforts connecting (1) emerging algorithms for dealing with large and complex datasets; (2) data analysis methods with emphasis in pattern recognition and machine learning; and (3) advances in evolving computer architectures. Engineering tools around these efforts accelerate the analyses of image-based recordings, improve reusability and reproducibility, scale scientific procedures by reducing time between experiments, increase efficiency, and open opportunities for more users of the imaging facilities. This paper describes our algorithms and software tools, showing results across image scales, demonstrating how our framework plays a role in improving image understanding for quality control of existent materials and discovery of new compounds.
NUCLEAR ESPIONAGE: Report Details Spying on Touring Scientists.
Malakoff, D
2000-06-30
A congressional report released this week details dozens of sometimes clumsy attempts by foreign agents to obtain nuclear secrets from U.S. nuclear scientists traveling abroad, ranging from offering scientists prostitutes to prying off the backs of their laptop computers. The report highlights the need to better prepare traveling researchers to safeguard secrets and resist such temptations, say the two lawmakers who requested the report and officials at the Department of Energy, which employs the scientists.
Final Report. Center for Scalable Application Development Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
2014-10-26
The Center for Scalable Application Development Software (CScADS) was established as a partnership between Rice University, Argonne National Laboratory, University of California Berkeley, University of Tennessee – Knoxville, and University of Wisconsin – Madison. CScADS pursued an integrated set of activities with the aim of increasing the productivity of DOE computational scientists by catalyzing the development of systems software, libraries, compilers, and tools for leadership computing platforms. Principal Center activities were workshops to engage the research community in the challenges of leadership computing, research and development of open-source software, and work with computational scientists to help them develop codes for leadership computing platforms. This final report summarizes CScADS activities at Rice University in these areas.
LLNL Scientists Use NERSC to Advance Global Aerosol Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergmann, D J; Chuang, C; Rotman, D
2004-10-13
While "greenhouse gases" have been the focus of climate change research for a number of years, DOE's "Aerosol Initiative" is now examining how aerosols (small particles of approximately micron size) affect the climate on both a global and regional scale. Scientists in the Atmospheric Science Division at Lawrence Livermore National Laboratory (LLNL) are using NERSC's IBM supercomputer and LLNL's IMPACT (atmospheric chemistry) model to perform simulations showing the historic effects of sulfur aerosols at a finer spatial resolution than ever done before. Simulations were carried out for five decades, from the 1950s through the 1990s. The results clearly show the effects of the changing global pattern of sulfur emissions. Whereas in 1950 the United States emitted 41 percent of the world's sulfur aerosols, this figure had dropped to 15 percent by 1990, due to conservation and anti-pollution policies. By contrast, the fraction of total sulfur emissions of European origin has only dropped by a factor of 2 and the Asian emission fraction jumped sixfold during the same time, from 7 percent in 1950 to 44 percent in 1990. Under a special allocation of computing time provided by the Office of Science INCITE (Innovative and Novel Computational Impact on Theory and Experiment) program, Dan Bergmann, working with a team of LLNL scientists including Cathy Chuang, Philip Cameron-Smith, and Bala Govindasamy, was able to carry out a large number of calculations during the past month, making the aerosol project one of the largest users of NERSC resources. The applications ran on 128 and 256 processors. The objective was to assess the effects of anthropogenic (man-made) sulfate aerosols. The IMPACT model calculates the rate at which SO2 (a gas emitted by industrial activity) is oxidized and forms particles known as sulfate aerosols. These particles have a short lifespan in the atmosphere, often washing out in about a week.
This means that their effects on climate tend to be more regional, occurring near the area where the SO2 is emitted. To accurately study these regional effects, Bergmann needed to run the simulations at a finer horizontal resolution, as the coarser resolution (typically 300 km by 300 km) of other climate models is insufficient for studying changes on a regional scale. Livermore's use of CAM3, the Community Atmospheric Model, which is a high-resolution climate model developed at NCAR (with collaboration from DOE), allows a 100 km by 100 km grid to be applied. NERSC's terascale computing capability provided the needed computational horsepower to run the application at the finer level.
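The resolution and emission figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the Earth surface area is an approximate value supplied here as an assumption, the function names are hypothetical, and nothing in it reflects how IMPACT or CAM3 actually lay out their grids.

```python
# Rough arithmetic behind the abstract's numbers. EARTH_SURFACE_KM2 is an
# approximate figure assumed for illustration; it is not from the abstract.
EARTH_SURFACE_KM2 = 5.1e8


def approx_columns(cell_km: float) -> float:
    """Approximate number of square cells of side cell_km covering the globe."""
    return EARTH_SURFACE_KM2 / (cell_km * cell_km)


def refinement_factor(coarse_km: float, fine_km: float) -> float:
    """How many fine cells fit in one coarse cell (horizontal only)."""
    return (coarse_km / fine_km) ** 2


coarse_columns = approx_columns(300.0)   # typical coarse grid in the abstract
fine_columns = approx_columns(100.0)     # grid used with CAM3 in the abstract
factor = refinement_factor(300.0, 100.0)

# Asian emission fraction: 7 percent (1950) to 44 percent (1990).
asia_growth = 44.0 / 7.0

print(f"coarse grid: about {coarse_columns:,.0f} columns")
print(f"fine grid:   about {fine_columns:,.0f} columns")
print(f"horizontal refinement factor: {factor:.0f}x")
print(f"Asian emission fraction growth: {asia_growth:.1f}x")
```

Going from 300 km to 100 km cells multiplies the number of horizontal columns by nine before any change in time step is considered, which is consistent with the abstract's point that the finer grid needed terascale resources.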
Characterization of real-time computers
NASA Technical Reports Server (NTRS)
Shin, K. G.; Krishna, C. M.
1984-01-01
A real-time system consists of a computer controller and controlled processes. Despite the synergistic relationship between these two components, they have been traditionally designed and analyzed independently of and separately from each other; namely, computer controllers by computer scientists/engineers and controlled processes by control scientists. As a remedy for this problem, in this report real-time computers are characterized by performance measures based on computer controller response time that are: (1) congruent to the real-time applications, (2) able to offer an objective comparison of rival computer systems, and (3) experimentally measurable/determinable. These measures, unlike others, provide the real-time computer controller with a natural link to controlled processes. In order to demonstrate their utility and power, these measures are first determined for example controlled processes on the basis of control performance functionals. They are then used for two important real-time multiprocessor design applications - the number-power tradeoff and fault-masking and synchronization.
Ocean Sciences meets Big Data Analytics
NASA Astrophysics Data System (ADS)
Hurwitz, B. L.; Choi, I.; Hartman, J.
2016-02-01
Hundreds of researchers worldwide have joined forces in the Tara Oceans Expedition to create an unprecedented planetary-scale dataset comprised of state-of-the-art next generation sequencing, microscopy, and physical/chemical metadata to explore ocean biodiversity. This summer the complete collection of data from the 2009-2013 Tara voyage was released. Yet, despite herculean efforts by the Tara Oceans Consortium to make raw data and computationally derived assemblies and gene catalogs available, most researchers are stymied by the sheer volume of the data. Specifically, the most tantalizing research questions lie in understanding the unifying principles that guide the distribution of organisms across the sea and affect climate and ecosystem function. To use the data in this capacity researchers must download, integrate, and analyze more than 7.2 trillion bases of metagenomic data and associated metadata from viruses, bacteria, archaea and small eukaryotes at their own data centers (9 TB of raw data). Accessing large-scale data sets in this way impedes scientists from replicating and building on prior work. To this end, we are developing a data platform called the Ocean Cloud Commons (OCC) as part of the iMicrobe project. The OCC is built using an algorithm we developed to pre-compute massive comparative metagenomic analyses in a Hadoop big data framework. By maintaining data in a cloud commons researchers have access to scalable computation and real-time analytics to promote the integrated and broad use of planetary-scale datasets, such as Tara.
From Atmospheric Scientist to Data Scientist
NASA Astrophysics Data System (ADS)
Knuth, S. L.
2015-12-01
Most of my career has been spent analyzing data from research projects in the atmospheric sciences. I spent twelve years researching boundary layer interactions in the polar regions, which included five field seasons in the Antarctic. During this time, I got both an M.S. and a Ph.D. in atmospheric science. I learned most of my data science and programming skills throughout this time as part of my research projects. When I graduated with my Ph.D., I was looking for a new and fresh opportunity to enhance the skills I already had while learning more advanced technical skills. I found a position at the University of Colorado Boulder as a Data Research Specialist with Research Computing, a group that provides cyberinfrastructure services, including high-speed networking, large-scale data storage, and supercomputing, to university students and researchers. My position is the perfect marriage of advanced technical skills and "softer" skills, while at the same time understanding exactly what the busy scientist needs to understand about their data. I have had the opportunity to help shape our university's data education system, a development that is still evolving. This presentation will detail my career story, the lessons I have learned, my daily work in my new position, and some of the exciting opportunities that opened up in my new career.
Final Report. Institute for Ultrascale Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Kwan-Liu; Galli, Giulia; Gygi, Francois
The SciDAC Institute for Ultrascale Visualization brought together leading experts from visualization, high-performance computing, and science application areas to make advanced visualization solutions for SciDAC scientists and the broader community. Over the five-year project, the Institute introduced many new enabling visualization techniques, which have significantly enhanced scientists’ ability to validate their simulations, interpret their data, and communicate with others about their work and findings. This Institute project involved a large number of junior and student researchers, who received the opportunities to work on some of the most challenging science applications and gain access to the most powerful high-performance computing facilities in the world. They were readily trained and prepared for facing the greater challenges presented by extreme-scale computing. The Institute’s outreach efforts, through publications, workshops and tutorials, successfully disseminated the new knowledge and technologies to the SciDAC and the broader scientific communities. The scientific findings and experience of the Institute team helped plan the SciDAC3 program.
Large-scale deep learning for robotically gathered imagery for science
NASA Astrophysics Data System (ADS)
Skinner, K.; Johnson-Roberson, M.; Li, J.; Iscar, E.
2016-12-01
With the explosion of computing power, the intelligence and capability of mobile robotics has dramatically increased over the last two decades. Today, we can deploy autonomous robots to achieve observations in a variety of environments ripe for scientific exploration. These platforms are capable of gathering a volume of data previously unimaginable. Additionally, optical cameras, driven by mobile phones and consumer photography, have rapidly improved in size, power consumption, and quality making their deployment cheaper and easier. Finally, in parallel we have seen the rise of large-scale machine learning approaches, particularly deep neural networks (DNNs), increasing the quality of the semantic understanding that can be automatically extracted from optical imagery. In concert this enables new science using a combination of machine learning and robotics. This work will discuss the application of new low-cost high-performance computing approaches and the associated software frameworks to enable scientists to rapidly extract useful science data from millions of robotically gathered images. The automated analysis of imagery on this scale opens up new avenues of inquiry unavailable using more traditional manual or semi-automated approaches. We will use a large archive of millions of benthic images gathered with an autonomous underwater vehicle to demonstrate how these tools enable new scientific questions to be posed.
Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD, and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling --- sometimes encouraged by restricted GPU memory --- NVLink is less important.
SOURCE EXPLORER: Towards Web Browser Based Tools for Astronomical Source Visualization and Analysis
NASA Astrophysics Data System (ADS)
Young, M. D.; Hayashi, S.; Gopu, A.
2014-05-01
As a new generation of large-format, high-resolution imagers comes online (ODI, DECAM, LSST, etc.) we are faced with the daunting prospect of astronomical images containing upwards of hundreds of thousands of identifiable sources. Visualizing and interacting with such large datasets using traditional astronomical tools appears to be unfeasible, and a new approach is required. We present here a method for the display and analysis of arbitrarily large source datasets using dynamically scaling levels of detail, enabling scientists to rapidly move from large-scale spatial overviews down to the level of individual sources and everything in-between. Based on the recognized standards of HTML5+JavaScript, we enable observers and archival users to interact with their images and sources from any modern computer without having to install specialized software. We demonstrate the ability to produce large-scale source lists from the images themselves, as well as overlaying data from publicly available source (2MASS, GALEX, SDSS, etc.) or user-provided source lists. A high-availability cluster of computational nodes allows us to produce these source maps on demand and customized based on user input. User-generated source lists and maps are persistent across sessions and are available for further plotting, analysis, refinement, and culling.
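The abstract does not spell out how its "dynamically scaling levels of detail" are computed. One generic way to get that behavior is tile-based aggregation, sketched below under stated assumptions: sources are (x, y) pairs in normalized image coordinates, the function name `level_of_detail` and the `max_per_tile` threshold are hypothetical, and this is not presented as the Source Explorer implementation (which the abstract describes as HTML5+JavaScript).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A source is an (x, y) position in normalized image coordinates, 0 <= v < 1.
Source = Tuple[float, float]


def level_of_detail(sources: List[Source], zoom: int,
                    max_per_tile: int = 2) -> List[dict]:
    """Return what to draw at a given zoom level.

    The image is split into 2**zoom by 2**zoom tiles. A tile holding more
    than max_per_tile sources is drawn as one cluster marker at the
    centroid; otherwise its sources are drawn individually.
    """
    n = 2 ** zoom
    tiles: Dict[Tuple[int, int], List[Source]] = defaultdict(list)
    for x, y in sources:
        tiles[(int(x * n), int(y * n))].append((x, y))

    drawn: List[dict] = []
    for key in sorted(tiles):
        members = tiles[key]
        if len(members) > max_per_tile:
            cx = sum(x for x, _ in members) / len(members)
            cy = sum(y for _, y in members) / len(members)
            drawn.append({"kind": "cluster", "count": len(members),
                          "x": cx, "y": cy})
        else:
            drawn.extend({"kind": "source", "count": 1, "x": x, "y": y}
                         for x, y in members)
    return drawn


# Made-up positions, for illustration only.
demo_sources = [(0.10, 0.10), (0.12, 0.11), (0.13, 0.14), (0.90, 0.90)]
for zoom in (0, 1, 4):
    items = level_of_detail(demo_sources, zoom)
    print(zoom, [(item["kind"], item["count"]) for item in items])
```

At low zoom the viewer draws a handful of cluster markers, and as the user zooms in the same call returns progressively more individual sources, so the number of drawn items stays bounded by what is visible.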
High performance computing and communications: Advancing the frontiers of information technology
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-12-31
This report, which supplements the President's Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental in the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R&D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science.
This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fermilab
More than 4,000 scientists in 53 countries use Fermilab and its particle accelerators, detectors and computers for their research. That includes about 2,500 scientists from 223 U.S. institutions in 42 states, plus the District of Columbia and Puerto Rico.
EarthCube: A Community-Driven Cyberinfrastructure for the Geosciences
NASA Astrophysics Data System (ADS)
Koskela, Rebecca; Ramamurthy, Mohan; Pearlman, Jay; Lehnert, Kerstin; Ahern, Tim; Fredericks, Janet; Goring, Simon; Peckham, Scott; Powers, Lindsay; Kamalabdi, Farzad; Rubin, Ken; Yarmey, Lynn
2017-04-01
EarthCube is creating a dynamic, System of Systems (SoS) infrastructure and data tools to collect, access, analyze, share, and visualize all forms of geoscience data and resources, using advanced collaboration, technological, and computational capabilities. EarthCube, as a joint effort between the U.S. National Science Foundation Directorate for Geosciences and the Division of Advanced Cyberinfrastructure, is a quickly growing community of scientists across all geoscience domains, as well as geoinformatics researchers and data scientists. EarthCube has attracted an evolving, dynamic virtual community of more than 2,500 contributors, including earth, ocean, polar, planetary, atmospheric, geospace, computer and social scientists, educators, and data and information professionals. During 2017, EarthCube will transition to the implementation phase. The implementation will balance "innovation" and "production" to advance cross-disciplinary science goals as well as the development of future data scientists. This presentation will describe the current architecture design for the EarthCube cyberinfrastructure and implementation plan.
Citizen Science Data and Scaling
NASA Astrophysics Data System (ADS)
Henderson, S.; Wasser, L. A.
2013-12-01
There is rapid growth in the collection of environmental data by non-experts. So-called "citizen scientists" are collecting data on plant phenology, precipitation patterns, bird migration and winter feeding, mating calls of frogs in the spring, and numerous other topics and phenomena related to environmental science. This data is generally submitted to online programs (e.g., Project BudBurst, CoCoRaHS, Project FeederWatch, FrogWatch USA, etc.) and is freely available to scientists, educators, land managers, and decision makers. While the data is often used to address specific science questions, it also provides the opportunity to explore its utility in the context of ecosystem scaling. Citizen science data is being collected and submitted at an unprecedented rate and is of a spatial and temporal scale previously not possible. The amount of citizen science data vastly exceeds what scientists or land managers can collect on their own. As such, it provides opportunities to address scaling in the environmental sciences. This presentation will explore data from several citizen science programs in the context of scaling.
Waggle: A Framework for Intelligent Attentive Sensing and Actuation
NASA Astrophysics Data System (ADS)
Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.
2014-12-01
Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both in situ and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. The Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports in situ computation and data storage, and integration with cloud-computing infrastructure.
The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.
Volunteer Clouds and Citizen Cyberscience for LHC Physics
NASA Astrophysics Data System (ADS)
Aguado Sanchez, Carlos; Blomer, Jakob; Buncic, Predrag; Chen, Gang; Ellis, John; Garcia Quintas, David; Harutyunyan, Artem; Grey, Francois; Lombrana Gonzalez, Daniel; Marquina, Miguel; Mato, Pere; Rantala, Jarno; Schulz, Holger; Segal, Ben; Sharma, Archana; Skands, Peter; Weir, David; Wu, Jie; Wu, Wenjing; Yadav, Rohit
2011-12-01
Computing for the LHC, and for HEP more generally, is traditionally viewed as requiring specialized infrastructure and software environments, and therefore not compatible with the recent trend in "volunteer computing", where volunteers supply free processing time on ordinary PCs and laptops via standard Internet connections. In this paper, we demonstrate that with the use of virtual machine technology, at least some standard LHC computing tasks can be tackled with volunteer computing resources. Specifically, by presenting volunteer computing resources to HEP scientists as a "volunteer cloud", essentially identical to a Grid or dedicated cluster from a job submission perspective, LHC simulations can be processed effectively. This article outlines both the technical steps required for such a solution and the implications for LHC computing as well as for LHC public outreach and for participation by scientists from developing regions in LHC research.
NASA Technical Reports Server (NTRS)
Vallee, J.; Wilson, T.
1976-01-01
Results are reported of the first experiments for a computer conference management information system at the National Aeronautics and Space Administration. Between August 1975 and March 1976, two NASA projects with geographically separated participants (NASA scientists) used the PLANET computer conferencing system for portions of their work. The first project was a technology assessment of future transportation systems. The second project involved experiments with the Communication Technology Satellite. As part of this project, pre- and postlaunch operations were discussed in a computer conference. These conferences also provided the context for an analysis of the cost of computer conferencing. In particular, six cost components were identified: (1) terminal equipment, (2) communication with a network port, (3) network connection, (4) computer utilization, (5) data storage and (6) administrative overhead.
Know Your Discipline: Teaching the Philosophy of Computer Science
ERIC Educational Resources Information Center
Tedre, Matti
2007-01-01
The diversity and interdisciplinarity of computer science and the multiplicity of its uses in other sciences make it hard to define computer science and to prescribe how computer science should be carried out. The diversity of computer science also causes friction between computer scientists from different branches. Computer science curricula, as…
Communicating the Needs of Climate Change Policy Makers to Scientists
NASA Technical Reports Server (NTRS)
Brown, Molly E.; Escobar, Vanessa M.; Lovell, Heather
2012-01-01
This chapter will describe the challenges that earth scientists face in developing science data products relevant to decision maker and policy needs, and will describe strategies that can improve the two-way communication between the scientist and the policy maker. Climate change policy and decision making happens at a variety of scales - from local government implementing solar homes policies to international negotiations through the United Nations Framework Convention on Climate Change. Scientists can work to provide data at these different scales, but if they are not aware of the needs of decision makers or do not understand what challenges the policy maker is facing, they are likely to be less successful in influencing policy makers than they wish. This is because the science questions they are addressing may be compelling, but not relevant to the challenges that are at the forefront of policy concerns. In this chapter we examine case studies of science-policy partnerships, and the strategies each partnership uses to engage the scientist at a variety of scales. We examine three case studies: the global Carbon Monitoring System pilot project developed by NASA, a forest biomass mapping effort for the SilvaCarbon project, and a forest canopy cover project being conducted for forest management in Maryland. In each of these case studies, relationships between scientists and policy makers were critical for ensuring the focus of the science as well as the success of the decision-making.
Computational Earth Science: Big Data Transformed Into Insight
NASA Astrophysics Data System (ADS)
Sellars, Scott; Nguyen, Phu; Chu, Wei; Gao, Xiaogang; Hsu, Kuo-lin; Sorooshian, Soroosh
2013-08-01
More than ever in the history of science, researchers have at their fingertips an unprecedented wealth of data from continuously orbiting satellites, weather monitoring instruments, ecological observatories, seismic stations, moored buoys, floats, and even model simulations and forecasts. With just an internet connection, scientists and engineers can access atmospheric and oceanic gridded data and time series observations, seismographs from around the world, minute-by-minute conditions of the near-Earth space environment, and other data streams that provide information on events across local, regional, and global scales. These data sets have become essential for monitoring and understanding the associated impacts of geological and environmental phenomena on society.
Delivering The Benefits of Chemical-Biological Integration in ...
Abstract: Researchers at the EPA’s National Center for Computational Toxicology integrate advances in biology, chemistry, and computer science to examine the toxicity of chemicals and help prioritize chemicals for further research based on potential human health risks. The intention of this research program is to quickly evaluate thousands of chemicals for potential risk but at much reduced cost relative to historical approaches. This work involves computational and data-driven approaches including high-throughput screening, modeling, text-mining and the integration of chemistry, exposure and biological data. We have developed a number of databases and applications that are delivering on the vision of developing a deeper understanding of chemicals and their effects on exposure and biological processes, and that are supporting a large community of scientists in their research efforts. This presentation will provide an overview of our work to bring together diverse large-scale data from the chemical and biological domains, our approaches to integrate and disseminate these data, and the delivery of models supporting computational toxicology. This abstract does not reflect U.S. EPA policy. Presentation at ACS TOXI session on Computational Chemistry and Toxicology in Chemical Discovery and Assessment (QSARs).
Climate Modeling with a Million CPUs
NASA Astrophysics Data System (ADS)
Tobis, M.; Jackson, C. S.
2010-12-01
Michael Tobis, Ph.D., Research Scientist Associate, University of Texas Institute for Geophysics; Charles S. Jackson, Research Scientist, University of Texas Institute for Geophysics. Meteorological, oceanographic, and climatological applications have been at the forefront of scientific computing since its inception. The trend toward ever larger and more capable computing installations is unabated. However, much of the increase in capacity is accompanied by an increase in parallelism and a concomitant increase in complexity. An increase of at least four additional orders of magnitude in the computational power of scientific platforms is anticipated. It is unclear how individual climate simulations can continue to make effective use of the largest platforms. Conversion of existing community codes to higher resolution, or to more complex phenomenology, or both, presents daunting design and validation challenges. Our alternative approach is to use the expected resources to run very large ensembles of simulations of modest size, rather than to await the emergence of very large simulations. We are already doing this in exploring the parameter space of existing models using the Multiple Very Fast Simulated Annealing algorithm, which was developed for seismic imaging. Our experiments have the dual intentions of tuning the model and identifying ranges of parameter uncertainty. Our approach is less strongly constrained by the dimensionality of the parameter space than are competing methods. Nevertheless, scaling up remains costly. Much could be achieved by increasing the dimensionality of the search and adding complexity to the search algorithms. Such ensemble approaches scale naturally to very large platforms. Extensions of the approach are anticipated. For example, structurally different models can be tuned to comparable effectiveness. This can provide an objective test for which there is no realistic precedent with smaller computations.
We find ourselves inventing new code to manage our ensembles. Component computations involve tens to hundreds of CPUs and tens to hundreds of hours. The results of these moderately large parallel jobs influence the scheduling of subsequent jobs, and complex algorithms may be easily contemplated for this. The operating system concept of a "thread" re-emerges at a very coarse level, where each thread manages atomic computations of thousands of CPU-hours. That is, rather than multiple threads operating on a processor, at this level, multiple processors operate within a single thread. In collaboration with the Texas Advanced Computing Center, we are developing a software library at the system level, which should facilitate the development of computations involving complex strategies which invoke large numbers of moderately large multi-processor jobs. While this may have applications in other sciences, our key intent is to better characterize the coupled behavior of a very large set of climate model configurations.
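The ensemble idea in the abstract above can be illustrated with a small sketch. It is not the authors' library: the misfit function `toy_misfit`, the parameter bounds, the step count, and the helper names `vfsa_step`, `run_chain`, and `run_ensemble` are all assumptions made for illustration. The candidate-generation formula and the cooling schedule follow the form commonly given for very fast simulated annealing in the literature; the "multiple" aspect is reduced here to independent chains with different seeds. Each chain plays the role of a coarse-grained "thread": the result of one evaluation decides what is evaluated next.

```python
import math
import random


def toy_misfit(params):
    """Hypothetical stand-in for a model-versus-observation misfit.

    In a real setting each call would be a moderately large parallel job.
    """
    return sum((p - 0.3) ** 2 for p in params)


def vfsa_step(x, temperature, lo, hi, rng):
    """Draw one in-bounds candidate with a temperature-dependent step."""
    candidate = []
    for xi in x:
        while True:
            u = rng.random()
            # Step y lies in [-1, 1]; small temperatures favour small steps.
            y = math.copysign(1.0, u - 0.5) * temperature * (
                (1.0 + 1.0 / temperature) ** abs(2.0 * u - 1.0) - 1.0
            )
            xi_new = xi + y * (hi - lo)
            if lo <= xi_new <= hi:  # redraw until the candidate is in bounds
                candidate.append(xi_new)
                break
    return candidate


def run_chain(seed, n_steps=200, dim=3, t0=1.0, decay=1.0, lo=0.0, hi=1.0):
    """One annealing chain: each result steers the next evaluation."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for _ in range(dim)]
    cost = toy_misfit(x)
    best_x, best_cost = x, cost
    for k in range(1, n_steps + 1):
        # Cooling schedule: T_k = T0 * exp(-c * k^(1/D)).
        temperature = t0 * math.exp(-decay * k ** (1.0 / dim))
        cand = vfsa_step(x, temperature, lo, hi, rng)
        cand_cost = toy_misfit(cand)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x, cost = cand, cand_cost
            if cost < best_cost:
                best_x, best_cost = x, cost
    return best_x, best_cost


def run_ensemble(seeds):
    """Run independent chains; on a large platform these would run concurrently."""
    return [run_chain(seed) for seed in seeds]


if __name__ == "__main__":
    for best_x, best_cost in run_ensemble([1, 2, 3, 4]):
        print([round(p, 3) for p in best_x], round(best_cost, 6))
```

Because the chains share nothing, the spread of their best parameter sets gives a rough picture of parameter uncertainty in addition to a tuned value, which matches the dual intention described in the abstract.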
Gómez, Alberto; Nieto-Díaz, Manuel; Del Águila, Ángela; Arias, Enrique
2018-05-01
Transparency in science is increasingly a hot topic. Scientists are required to show not only results but also evidence of how they have achieved these results. In experimental studies of spinal cord injury, there are a number of standardized tests, such as the Basso-Beattie-Bresnahan locomotor rating scale for rats and Basso Mouse Scale for mice, which researchers use to study the pathophysiology of spinal cord injury and to evaluate the effects of experimental therapies. Although the standardized data from the Basso-Beattie-Bresnahan locomotor rating scale and the Basso Mouse Scale are particularly suited for storage and sharing in databases, systems of data acquisition and repositories are still lacking. To the best of our knowledge, both tests are usually conducted manually, with the data being recorded on a paper form, which may be documented with video recordings, before the data is transferred to a spreadsheet for analysis. The data thus obtained is used to compute global scores, which is the information that usually appears in publications, with a wealth of information being omitted. This information may be relevant to understand locomotion deficits or recovery, or even important aspects of the treatment effects. Therefore, this paper presents a mobile application to record and share Basso Mouse Scale tests, meeting the following criteria: i) user-friendly; ii) few hardware requirements (only a smartphone or tablet with a camera running under Android Operating System); and iii) based on open source software such as SQLite, XML, Java, Android Studio and Android SDK. The BAMOS app can be downloaded and installed from the Google Market repository and the app code is available at the GitHub repository. The BAMOS app demonstrates that mobile technology constitutes an opportunity to develop tools for aiding spinal cord injury scientists in recording and sharing experimental data. Copyright © 2018 Elsevier Ltd. All rights reserved.
Crops in silico: A community wide multi-scale computational modeling framework of plant canopies
NASA Astrophysics Data System (ADS)
Srinivasan, V.; Christensen, A.; Borkiewic, K.; Yiwen, X.; Ellis, A.; Panneerselvam, B.; Kannan, K.; Shrivastava, S.; Cox, D.; Hart, J.; Marshall-Colon, A.; Long, S.
2016-12-01
Current crop models predict a looming gap between supply and demand for primary foodstuffs over the next 100 years. While significant yield increases were achieved in major food crops during the early years of the green revolution, the current rates of yield increases are insufficient to meet future projected food demand. Furthermore, with projected reduction in arable land, decrease in water availability, and increasing impacts of climate change on future food production, innovative technologies are required to sustainably improve crop yield. To meet these challenges, we are developing Crops in silico (Cis), a biologically informed, multi-scale, computational modeling framework that can facilitate whole plant simulations of crop systems. The Cis framework is capable of linking models of gene networks, protein synthesis, metabolic pathways, physiology, growth, and development in order to investigate crop response to different climate scenarios and resource constraints. This modeling framework will provide the mechanistic details to generate testable hypotheses toward accelerating directed breeding and engineering efforts to increase future food security. A primary objective for building such a framework is to create synergy among an inter-connected community of biologists and modelers to create a realistic virtual plant. This framework advantageously casts the detailed mechanistic understanding of individual plant processes across various scales in a common scalable framework that makes use of current advances in high performance and parallel computing. We are currently designing a user friendly interface that will make this tool equally accessible to biologists and computer scientists. 
Critically, this framework will provide the community with much needed tools for guiding future crop breeding and engineering, understanding the emergent implications of discoveries at the molecular level for whole plant behavior, and improved prediction of plant and ecosystem responses to the environment.
Benefits of Exchange Between Computer Scientists and Perceptual Scientists: A Panel Discussion
NASA Technical Reports Server (NTRS)
Kaiser, Mary K.; Null, Cynthia H. (Technical Monitor)
1995-01-01
We have established several major goals for this panel: 1) Introduce the computer graphics community to some specific leaders in the use of perceptual psychology relating to computer graphics; 2) Enumerate the major results that are known, and provide a set of resources for finding others; 3) Identify research areas where knowledge of perceptual psychology can help computer system designers improve their systems; and 4) Provide advice to researchers on how they can establish collaborations in their own research programs. We believe this will be a very important panel. In addition to generating lively discussion, we hope to point out some of the fundamental issues that occur at the boundary between computer science and perception, and possibly help researchers avoid some of the common pitfalls.
ERIC Educational Resources Information Center
Kite, Vance; Park, Soonhye
2018-01-01
In 2006 Jeanette Wing, a professor of computer science at Carnegie Mellon University, proposed computational thinking (CT) as a literacy just as important as reading, writing, and mathematics. Wing defined CT as a set of skills and strategies computer scientists use to solve complex, computational problems (Wing 2006). The computer science and…
2016-09-01
Sciences Group 6% 1550s Computer Scientists Group 5% Other 1500s ORSAa, Mathematics, & Statistics Group 3% 1600s Equipment & Facilities Group 4... Employee removal based on misconduct, delinquency, suitability, unsatisfactory performance, or failure to qualify for conversion to a career appointment... average of 10.4% in many areas, but over double the average for the 1550s (Computer Scientists) and other 1500s (ORSA, Mathematics, and Statistics). Also
Research Institute for Advanced Computer Science: Annual Report October 1998 through September 1999
NASA Technical Reports Server (NTRS)
Leiner, Barry M.; Gross, Anthony R. (Technical Monitor)
1999-01-01
The Research Institute for Advanced Computer Science (RIACS) carries out basic research and technology development in computer science, in support of the National Aeronautics and Space Administration's missions. RIACS is located at the NASA Ames Research Center (ARC). It currently operates under a multiple year grant/cooperative agreement that began on October 1, 1997 and is up for renewal in the year 2002. ARC has been designated NASA's Center of Excellence in Information Technology. In this capacity, ARC is charged with the responsibility to build an Information Technology Research Program that is preeminent within NASA. RIACS serves as a bridge between NASA ARC and the academic community, and RIACS scientists and visitors work in close collaboration with NASA scientists. RIACS has the additional goal of broadening the base of researchers in these areas of importance to the nation's space and aeronautics enterprises. RIACS research focuses on the three cornerstones of information technology research necessary to meet the future challenges of NASA missions: (1) Automated Reasoning for Autonomous Systems. Techniques are being developed enabling spacecraft that will be self-guiding and self-correcting to the extent that they will require little or no human intervention. Such craft will be equipped to independently solve problems as they arise, and fulfill their missions with minimum direction from Earth; (2) Human-Centered Computing. Many NASA missions require synergy between humans and computers, with sophisticated computational aids amplifying human cognitive and perceptual abilities; (3) High Performance Computing and Networking. Advances in the performance of computing and networking continue to have major impact on a variety of NASA endeavors, ranging from modeling and simulation to data analysis of large datasets to collaborative engineering, planning and execution.
In addition, RIACS collaborates with NASA scientists to apply information technology research to a variety of NASA application domains. RIACS also engages in other activities, such as workshops, seminars, and visiting scientist programs, designed to encourage and facilitate collaboration between the university and NASA information technology research communities.
Research Institute for Advanced Computer Science
NASA Technical Reports Server (NTRS)
Gross, Anthony R. (Technical Monitor); Leiner, Barry M.
2000-01-01
The Research Institute for Advanced Computer Science (RIACS) carries out basic research and technology development in computer science, in support of the National Aeronautics and Space Administration's missions. RIACS is located at the NASA Ames Research Center. It currently operates under a multiple year grant/cooperative agreement that began on October 1, 1997 and is up for renewal in the year 2002. Ames has been designated NASA's Center of Excellence in Information Technology. In this capacity, Ames is charged with the responsibility to build an Information Technology Research Program that is preeminent within NASA. RIACS serves as a bridge between NASA Ames and the academic community, and RIACS scientists and visitors work in close collaboration with NASA scientists. RIACS has the additional goal of broadening the base of researchers in these areas of importance to the nation's space and aeronautics enterprises. RIACS research focuses on the three cornerstones of information technology research necessary to meet the future challenges of NASA missions: (1) Automated Reasoning for Autonomous Systems. Techniques are being developed enabling spacecraft that will be self-guiding and self-correcting to the extent that they will require little or no human intervention. Such craft will be equipped to independently solve problems as they arise, and fulfill their missions with minimum direction from Earth; (2) Human-Centered Computing. Many NASA missions require synergy between humans and computers, with sophisticated computational aids amplifying human cognitive and perceptual abilities; (3) High Performance Computing and Networking. Advances in the performance of computing and networking continue to have major impact on a variety of NASA endeavors, ranging from modeling and simulation to data analysis of large datasets to collaborative engineering, planning and execution. 
In addition, RIACS collaborates with NASA scientists to apply information technology research to a variety of NASA application domains. RIACS also engages in other activities, such as workshops, seminars, and visiting scientist programs, designed to encourage and facilitate collaboration between the university and NASA information technology research communities.
Memory Transmission in Small Groups and Large Networks: An Agent-Based Model.
Luhmann, Christian C; Rajaram, Suparna
2015-12-01
The spread of social influence in large social networks has long been an interest of social scientists. In the domain of memory, collaborative memory experiments have illuminated cognitive mechanisms that allow information to be transmitted between interacting individuals, but these experiments have focused on small-scale social contexts. In the current study, we took a computational approach, circumventing the practical constraints of laboratory paradigms and providing novel results at scales unreachable by laboratory methodologies. Our model embodied theoretical knowledge derived from small-group experiments and replicated foundational results regarding collaborative inhibition and memory convergence in small groups. Ultimately, we investigated large-scale, realistic social networks and found that agents are influenced by the agents with which they interact, but we also found that agents are influenced by nonneighbors (i.e., the neighbors of their neighbors). The similarity between these results and the reports of behavioral transmission in large networks offers a major theoretical insight by linking behavioral transmission to the spread of information. © The Author(s) 2015.
A program for handling map projections of small-scale geospatial raster data
Finn, Michael P.; Steinwand, Daniel R.; Trent, Jason R.; Buehler, Robert A.; Mattli, David M.; Yamamoto, Kristina H.
2012-01-01
Scientists routinely accomplish small-scale geospatial modeling using raster datasets of global extent. Such use often requires the projection of global raster datasets onto a map or the reprojection from a given map projection associated with a dataset. The distortion characteristics of these projection transformations can have significant effects on modeling results. Distortions associated with the reprojection of global data are generally greater than distortions associated with reprojections of larger-scale, localized areas. The accuracy of areas in projected raster datasets of global extent is dependent on spatial resolution. To address these problems of projection and the associated resampling that accompanies it, methods for framing the transformation space, direct point-to-point transformations rather than gridded transformation spaces, a solution to the wrap-around problem, and an approach to alternative resampling methods are presented. The implementations of these methods are provided in an open-source software package called MapImage (or mapIMG, for short), which is designed to function on a variety of computer architectures.
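As a rough illustration of two ideas named in the abstract above, a direct point-to-point transformation and handling of longitude wrap-around, the following sketch projects individual points with the sinusoidal (equal-area) projection on a sphere. It is not taken from mapIMG, and the abstract does not say how mapIMG solves the wrap-around problem; the function names, the spherical radius, and the choice of projection are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical Earth radius, in metres


def wrap_longitude(lon_deg):
    """Normalize a longitude (or longitude difference) to [-180, 180) degrees."""
    return (lon_deg + 180.0) % 360.0 - 180.0


def sinusoidal_forward(lon_deg, lat_deg, lon0_deg=0.0, radius=EARTH_RADIUS_M):
    """Project one geographic point to sinusoidal map coordinates (x, y).

    The longitude offset from the central meridian is wrapped first, so that
    points just across the antimeridian land next to each other on the map
    instead of at opposite edges.
    """
    dlon = math.radians(wrap_longitude(lon_deg - lon0_deg))
    lat = math.radians(lat_deg)
    x = radius * dlon * math.cos(lat)
    y = radius * lat
    return x, y


if __name__ == "__main__":
    # With a central meridian at 170E, the point at 170W is 20 degrees east,
    # not 340 degrees west.
    print(sinusoidal_forward(-170.0, 0.0, lon0_deg=170.0))
```

Transforming each output cell's coordinates directly, rather than interpolating within a coarse grid of pre-transformed points, avoids interpolation error at the cost of more computation per cell.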
An Integrated High Resolution Hydrometeorological Modeling Testbed using LIS and WRF
NASA Technical Reports Server (NTRS)
Kumar, Sujay V.; Peters-Lidard, Christa D.; Eastman, Joseph L.; Tao, Wei-Kuo
2007-01-01
Scientists have made great strides in modeling physical processes that represent various weather and climate phenomena. Many modeling systems that represent the major earth system components (the atmosphere, land surface, and ocean) have been developed over the years. However, developing advanced Earth system applications that integrate these independently developed modeling systems has remained a daunting task due to limitations in computer hardware and software. Recently, efforts such as the Earth System Modeling Framework (ESMF) and Assistance for Land Modeling Activities (ALMA) have focused on developing standards, guidelines, and computational support for coupling earth system model components. In this article, the development of a coupled land-atmosphere hydrometeorological modeling system that adopts these community interoperability standards is described. The land component is represented by the Land Information System (LIS), developed by scientists at the NASA Goddard Space Flight Center. The Weather Research and Forecasting (WRF) model, a mesoscale numerical weather prediction system, is used as the atmospheric component. LIS includes several community land surface models that can be executed at spatial scales as fine as 1 km. The data management capabilities in LIS enable the direct use of high resolution satellite and observation data for modeling. Similarly, WRF includes several parameterizations and schemes for modeling radiation, microphysics, PBL and other processes. Thus the integrated LIS-WRF system facilitates several multi-model studies of land-atmosphere coupling that can be used to advance earth system studies.
Large-Scale and Global Hydrology. Chapter 92
NASA Technical Reports Server (NTRS)
Rodell, Matthew; Beaudoing, Hiroko Kato; Koster, Randal; Peters-Lidard, Christa D.; Famiglietti, James S.; Lakshmi, Venkat
2016-01-01
Powered by the sun, water moves continuously between and through Earth's oceanic, atmospheric, and terrestrial reservoirs. It enables life, shapes Earth's surface, and responds to and influences climate change. Scientists measure various features of the water cycle using a combination of ground, airborne, and space-based observations, and seek to characterize it at multiple scales with the aid of numerical models. Over time our understanding of the water cycle and ability to quantify it have improved, owing to advances in observational capabilities, the extension of the data record, and increases in computing power and storage. Here we present some of the most recent estimates of global and continental ocean basin scale water cycle stocks and fluxes and provide examples of modern numerical modeling systems and reanalyses. Further, we discuss prospects for predicting water cycle variability at seasonal and longer scales, which is complicated by a changing climate and direct human impacts related to water management and agriculture. Changes to the water cycle will be among the most obvious and important facets of climate change, thus it is crucial that we continue to invest in our ability to monitor it.
Recent Advances and Issues in Computers. Oryx Frontiers of Science Series.
ERIC Educational Resources Information Center
Gay, Martin K.
Discussing recent issues in computer science, this book contains 11 chapters covering: (1) developments that have the potential for changing the way computers operate, including microprocessors, mass storage systems, and computing environments; (2) the national computational grid for high-bandwidth, high-speed collaboration among scientists, and…
Temporal and Spatio-Temporal Dynamic Instabilities: Novel Computational and Experimental approaches
NASA Astrophysics Data System (ADS)
Doedel, Eusebius J.; Panayotaros, Panayotis; Lambruschini, Carlos L. Pando
2016-11-01
This special issue contains a concise account of significant research results presented at the international workshop on Advanced Computational and Experimental Techniques in Nonlinear Dynamics, which was held in Cusco, Peru in August 2015. The meeting gathered leading experts, as well as new researchers, who have contributed to different aspects of Nonlinear Dynamics. Particularly significant was the presence of many active scientists from Latin America. The topics covered in this special issue range from advanced numerical techniques to novel physical experiments, and reflect the present state of the art in several areas of Nonlinear Dynamics. It contains seven review articles, followed by twenty-one regular papers that are organized in five categories, namely (1) Nonlinear Evolution Equations and Applications, (2) Numerical Continuation in Self-sustained Oscillators, (3) Synchronization, Control and Data Analysis, (4) Hamiltonian Systems, and (5) Scaling Properties in Maps.
Drawing the PDB: Protein-Ligand Complexes in Two Dimensions.
Stierand, Katrin; Rarey, Matthias
2010-12-09
The two-dimensional representation of molecules is a popular communication medium in chemistry and the associated scientific fields. Computational methods for drawing small molecules with and without manual investigation are well-established and widely available in numerous software tools. Concerning the planar depiction of molecular complexes, there is considerably less choice. We developed the software PoseView, which automatically generates two-dimensional diagrams of macromolecular complexes, showing the ligand, the interactions, and the interacting residues. All depicted molecules are drawn on an atomic level as structure diagrams; thus, the output plots are clearly structured and easily readable for the scientist. We tested the performance of PoseView in a large-scale application on nearly all druglike complexes of the PDB (approximately 200,000 complexes); for more than 92% of the complexes considered for drawing, a layout could be computed. In the following, we will present the results of this application study.
Where Next for Marine Cloud Brightening Research?
NASA Astrophysics Data System (ADS)
Jenkins, A. K. L.; Forster, P.
2014-12-01
Realistic estimates of geoengineering effectiveness will be central to informed decision-making on its possible role in addressing climate change. Over the last decade, global-scale computer climate modelling of geoengineering has been developing. While these developments have allowed quantitative estimates of geoengineering effectiveness to be produced, the relative coarseness of the grid of these models (tens of kilometres) means that key practical details of the proposed geoengineering are not always realistically captured. This is particularly true for marine cloud brightening (MCB), where neither the clouds nor the tens-of-meters-scale sea-going implementation vessels can be captured in detail. Previous research using cloud resolving modelling has shown that neglecting such details may lead to MCB effectiveness being overestimated by up to half. The realism of MCB effectiveness estimates will likely improve with ongoing developments in the understanding and modelling of clouds. We also propose that realism can be increased via more specific improvements (see figure). A readily achievable example would be the reframing of previous MCB effectiveness estimates in light of the cloud resolving scale findings. Implementation details could also be incorporated, via parameterisation, into future global-scale modelling of MCB. However, as significant unknowns regarding the design of the MCB aerosol production technique remain, resource-intensive cloud resolving computer modelling of MCB may be premature unless of broader benefit to the wider understanding of clouds. One of the most essential recommendations is for enhanced communication between climate scientists and MCB designers. This would facilitate the identification of potentially important design aspects necessary for realistic computer simulations. Such relationships could be mutually beneficial, with computer modelling potentially informing more efficient designs of the MCB implementation technique.
(Acknowledgment) This work is part of the Integrated Assessment of Geoengineering Proposals (IAGP) project, funded by the Engineering and Physical Sciences Research Council and the Natural Environment Research Council (EP/I014721/1).
ERIC Educational Resources Information Center
Travis, John
1991-01-01
A discipline in which scientists seek to simulate and synthesize lifelike behaviors within computers, chemical mixtures, and other media is discussed. A computer program with self-replicating digital "organisms" that evolve as they compete for computer time and memory is described. (KR)
NASA Technical Reports Server (NTRS)
1987-01-01
Philip Morris research center scientists use a computer program called CECTRP, for Chemical Equilibrium Composition and Transport Properties, to gain insight into the behavior of atoms as they progress along the reaction pathway. Use of the program lets the scientist accurately predict the behavior of a given molecule or group of molecules. Computer generated data must be checked by laboratory experiment, but the use of CECTRP saves the researchers hundreds of hours of laboratory time, since experiments need to be run only to validate the computer's prediction. Philip Morris estimates that had CECTRP not been available, at least two man-years would have been required to develop a program to perform similar free energy calculations.
Developing an online programme in computational biology.
Vincent, Heather M; Page, Christopher
2013-11-01
Much has been written about the need for continuing education and training to enable life scientists and computer scientists to manage and exploit the different types of biological data now becoming available. Here we describe the development of an online programme that combines short training courses, so that those who require an educational programme can progress to complete a formal qualification. Although this flexible approach fits the needs of course participants, it does not fit easily within the organizational structures of a campus-based university.
Landsat Science: 40 Years of Innovation and Opportunity
NASA Technical Reports Server (NTRS)
Cook, Bruce D.; Irons, James R.; Masek, Jeffrey G.; Loveland, Thomas R.
2012-01-01
Landsat satellites have provided unparalleled Earth-observing data for nearly 40 years, allowing scientists to describe, monitor and model the global environment during a period of time that has seen dramatic changes in population growth, land use, and climate. The success of the Landsat program can be attributed to well-designed instrument specifications, astute engineering, comprehensive global acquisition and calibration strategies, and innovative scientists who have developed analytical techniques and applications to address a wide range of needs at local to global scales (e.g., crop production, water resource management, human health and environmental quality, urbanization, deforestation and biodiversity). Early Landsat contributions included inventories of natural resources and land cover classification maps, which were initially prepared by a visual interpretation of Landsat imagery. Over time, advances in computer technology facilitated the development of sophisticated image processing algorithms and complex ecosystem modeling, enabling scientists to create accurate, reproducible, and more realistic simulations of biogeochemical processes (e.g., plant production and ecosystem dynamics). Today, the Landsat data archive is freely available for download through the USGS, creating new opportunities for scientists to generate global image datasets, develop new change detection algorithms, and provide products in support of operational programs such as Reducing Emissions from Deforestation and Forest Degradation in Developing Countries (REDD). In particular, the use of dense (approximately annual) time series to characterize both rapid and progressive landscape change has yielded new insights into how the land environment is responding to anthropogenic and natural pressures. The launch of the Landsat Data Continuity Mission (LDCM) satellite in 2012 will continue to propel innovative Landsat science.
System biology of gene regulation.
Baitaluk, Michael
2009-01-01
A well-known joke illustrates the traditionally awkward alliance between theory and experiment, and the differences between experimental biologists and theoretical modelers: a university sends a biologist, a mathematician, a physicist, and a computer scientist on a walking trip in an attempt to stimulate interdisciplinary research. During a break, they watch a cow in a field nearby and the leader of the group asks, "I wonder how one could decide on the size of a cow?" Since a cow is a biological object, the biologist responds first: "I have seen many cows in this area and know it is a big cow." The mathematician argues, "The true volume is determined by integrating the mathematical function that describes the outer surface of the cow's body." The physicist suggests: "Let's assume the cow is a sphere...." Finally the computer scientist becomes nervous and says that he didn't bring his computer because there is no Internet connection up there on the hill. In this humorous but explanatory story, the suggestions proposed by the theorists can be taken to reflect the view of many experimental biologists that computer scientists and theorists are too far removed from biological reality and that their theories and approaches are therefore not of much immediate usefulness. Conversely, the statement of the biologist mirrors the view of many traditional theoretical and computational scientists that biological experiments are for the most part simply descriptive and lack rigor, and that much of the resulting biological data are of questionable functional relevance. One of the goals of current biology as a multidisciplinary science is to bring people from different scientific areas together on the same "hill" and teach them to speak the same "language."
In fact, of course, when presenting their data, most experimentalist biologists do provide an interpretation and explanation for the results, and many theorists/computer scientists aim to answer (or at least to fully describe) questions of biological relevance. Thus systems biology could be treated as such a socioscientific phenomenon and a new approach to both experiments and theory that is defined by the strategy of pursuing integration of complex data about the interactions in biological systems from diverse experimental sources using interdisciplinary tools and personnel.
Validating a Scale That Measures Scientists' Self-Efficacy for Public Engagement with Science
ERIC Educational Resources Information Center
Robertson Evia, Jane; Peterman, Karen; Cloyd, Emily; Besley, John
2018-01-01
Self-efficacy, or the beliefs people hold about their ability to succeed in certain pursuits, is a long-established construct. Self-efficacy for science communication distinguishes scientists who engage with the public and relates to scientists' attitudes about the public. As such, self-efficacy for public engagement has the potential to serve as…
Cloud-scale genomic signals processing classification analysis for gene expression microarray data.
Harvey, Benjamin; Soo-Yeon Ji
2014-01-01
As microarray data available to scientists continues to increase in size and complexity, it has become overwhelmingly important to find multiple ways to bring inference through analysis of DNA/mRNA sequence data that is useful to scientists. Though there have been many attempts to elucidate the issue of bringing forth biological inference by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale classification analysis of microarray data using Wavelet thresholding in a Cloud environment to identify significantly expressed features. This paper proposes a novel methodology that uses Wavelet based Denoising to initialize a threshold for determination of significantly expressed genes for classification. Additionally, this research was implemented and encompassed within a cloud-based distributed processing environment. Cloud computing and Wavelet thresholding were used for the classification of 14 tumor classes from the Global Cancer Map (GCM). The results proved to be more accurate than using a predefined p-value for differential expression classification. This novel methodology analyzed Wavelet based threshold features of gene expression in a Cloud environment, furthermore classifying the expression of samples by analyzing gene patterns, which inform us of biological processes. Moreover, it enables researchers to face the present and forthcoming challenges that may arise in the analysis of data in functional genomics of large microarray datasets.
Lipoproteins: When size really matters
German, J. Bruce; Smilowitz, Jennifer T.; Zivkovic, Angela M.
2010-01-01
The field of nanoscience is extending the applications of physics, chemistry and biology into previously unapproached infinitesimal length scales. Understanding the behavior and manipulating the positions and properties of single atoms and molecules hold great potential to improve areas of science as disparate as medicine and computation, and communication and orbiting satellites. Yet, in the race to develop novel, previously unavailable nanoparticles, there is an opportunity for scientists in this field to digress and to apply their growing understanding of nanoscience and the tools of nanotechnology to one of the most pressing problems in all of human biology—diseases related to lipoproteins. Although not appreciated outside the field of lipoprotein biology, variations in the compositions, structures and properties of these nanoscale-sized, blood-borne particles are responsible for most of the variations in health, morbidity and mortality in the Western world. If the lipoproteins could be understood at the nanometer length scale with precise details of their structures and functions, scientists could understand a wide range of perplexing physiological processes and also address the dysfunctions in normal lipoprotein biology that lead to such diseases as hypercholesterolemia, heart disease, stroke and neurodegenerative diseases. Furthermore, if the capabilities of nanoscience to assemble and manipulate nanometer-sized particles could be recruited to studies of lipoproteins, these biological particles would provide a new dimension to therapeutic agents, and these natural particles could be designed to carry out many specialized beneficial tasks. PMID:20592953
Macklin, Paul; Cristini, Vittorio
2013-01-01
Simulating cancer behavior across multiple biological scales in space and time, i.e., multiscale cancer modeling, is increasingly being recognized as a powerful tool to refine hypotheses, focus experiments, and enable more accurate predictions. A growing number of examples illustrate the value of this approach in providing quantitative insight on the initiation, progression, and treatment of cancer. In this review, we introduce the most recent and important multiscale cancer modeling works that have successfully established a mechanistic link between different biological scales. Biophysical, biochemical, and biomechanical factors are considered in these models. We also discuss innovative, cutting-edge modeling methods that are moving predictive multiscale cancer modeling toward clinical application. Furthermore, because the development of multiscale cancer models requires a new level of collaboration among scientists from a variety of fields such as biology, medicine, physics, mathematics, engineering, and computer science, an innovative Web-based infrastructure is needed to support this growing community. PMID:21529163
Widerszal-Bazyl, M; Cieślak, R
2000-01-01
Many studies on the impact of psychosocial working conditions on health prove that psychosocial stress at work is an important risk factor endangering workers' health. Thus it should be constantly monitored like other work hazards. The paper presents a newly developed instrument for stress monitoring called the Psychosocial Working Conditions Questionnaire (PWC). Its structure is based on Robert Karasek's model of job stress (Karasek, 1979; Karasek & Theorell, 1990). It consists of 3 main scales (Job Demands, Job Control, and Social Support) and 2 additional scales adapted from the Occupational Stress Questionnaire (Elo, Leppanen, Lindstrom, & Ropponen, 1992): Well-Being and Desired Changes. The study of 8 occupational groups (bank and insurance specialists, middle medical personnel, construction workers, shop assistants, government and self-government administration officers, computer scientists, public transport drivers, teachers; N = 3,669) indicates that PWC has satisfactory psychometric parameters. Norms for the 8 groups were developed.
Argonne Out Loud: Computation, Big Data, and the Future of Cities
Catlett, Charlie
2018-01-16
Charlie Catlett, a Senior Computer Scientist at Argonne and Director of the Urban Center for Computation and Data at the Computation Institute of the University of Chicago and Argonne, talks about how he and his colleagues are using high-performance computing, data analytics, and embedded systems to better understand and design cities.
NASA Astrophysics Data System (ADS)
Girvetz, E. H.; Zganjar, C.; Raber, G. T.; Hoekstra, J.; Lawler, J. J.; Kareiva, P.
2008-12-01
Now that there is overwhelming evidence of global climate change, scientists, managers and planners (i.e. practitioners) need to assess the potential impacts of climate change on particular ecological systems, within specific geographic areas, and at spatial scales they care about, in order to make better land management, planning, and policy decisions. Unfortunately, this application of climate science to real world decisions and planning has proceeded too slowly because we lack tools for translating cutting-edge climate science and climate-model outputs into something managers and planners can work with at local or regional scales (CCSP 2008). To help increase the accessibility of climate information, we have developed a freely-available, easy-to-use, web-based climate-change analysis toolbox, called ClimateWizard, for assessing how climate has and is projected to change at specific geographic locations throughout the world. The ClimateWizard uses geographic information systems (GIS), web-services (SOAP/XML), statistical analysis platforms (e.g. R-project), and web-based mapping services (e.g. Google Earth/Maps, KML/GML) to provide a variety of different analyses (e.g. trends and departures) and outputs (e.g. maps, graphs, tables, GIS layers). Because ClimateWizard analyzes large climate datasets stored remotely on powerful computers, users of the tool do not need to have fast computers or expensive software, but simply need access to the internet. The analysis results are then provided to users in a Google Maps webpage tailored to the specific climate-change question being asked. The ClimateWizard is not a static product, but rather a framework to be built upon and modified to suit the purposes of specific scientific, management, and policy questions. For example, it can be expanded to include bioclimatic variables (e.g. evapotranspiration) and marine data (e.g. sea surface temperature), as well as improved future climate projections, and climate-change impact analyses involving hydrology, vegetation, wildfire, disease, and food security. By harnessing the power of computer and web-based technologies, the ClimateWizard puts local, regional, and global climate-change analyses in the hands of a wider array of managers, planners, and scientists.
Australian sea-floor survey data, with images and expert annotations.
Bewley, Michael; Friedman, Ariell; Ferrari, Renata; Hill, Nicole; Hovey, Renae; Barrett, Neville; Marzinelli, Ezequiel M; Pizarro, Oscar; Figueira, Will; Meyer, Lisa; Babcock, Russ; Bellchambers, Lynda; Byrne, Maria; Williams, Stefan B
2015-01-01
This Australian benthic data set (BENTHOZ-2015) consists of an expert-annotated set of georeferenced benthic images and associated sensor data, captured by an autonomous underwater vehicle (AUV) around Australia. This type of data is of interest to marine scientists studying benthic habitats and organisms. AUVs collect georeferenced images over an area with consistent illumination and altitude, and make it possible to generate broad scale, photo-realistic 3D maps. Marine scientists then typically spend several minutes on each of thousands of images, labeling substratum type and biota at a subset of points. Labels from four Australian research groups were combined using the CATAMI classification scheme, a hierarchical classification scheme based on taxonomy and morphology for scoring marine imagery. This data set consists of 407,968 expert labeled points from around the Australian coast, with associated images, geolocation and other sensor data. The robotic surveys that collected this data form part of Australia's Integrated Marine Observing System (IMOS) ongoing benthic monitoring program. There is reuse potential in marine science, robotics, and computer vision research.
NASA Technical Reports Server (NTRS)
Leiner, Barry M.; Gross, Anthony R. (Technical Monitor)
2002-01-01
The Research Institute for Advanced Computer Science (RIACS) carries out basic research and technology development in computer science, in support of the National Aeronautics and Space Administration's missions. Operated by the Universities Space Research Association (a non-profit university consortium), RIACS is located at the NASA Ames Research Center, Moffett Field, California. It currently operates under a multiple year grant/cooperative agreement that began on October 1, 1997 and is up for renewal in September 2003. Ames has been designated NASA's Center of Excellence in Information Technology. In this capacity, Ames is charged with the responsibility to build an Information Technology (IT) Research Program that is preeminent within NASA. RIACS serves as a bridge between NASA Ames and the academic community, and RIACS scientists and visitors work in close collaboration with NASA scientists. RIACS has the additional goal of broadening the base of researchers in these areas of importance to the nation's space and aeronautics enterprises. RIACS research focuses on the three cornerstones of IT research necessary to meet the future challenges of NASA missions: 1) Automated Reasoning for Autonomous Systems; 2) Human-Centered Computing; and 3) High Performance Computing and Networking. In addition, RIACS collaborates with NASA scientists to apply IT research to a variety of NASA application domains including aerospace technology, earth science, life sciences, and astrobiology. RIACS also engages in other activities, such as workshops, seminars, visiting scientist programs and student summer programs, designed to encourage and facilitate collaboration between the university and NASA IT research communities.
NASA Technical Reports Server (NTRS)
1994-01-01
CESDIS, the Center of Excellence in Space Data and Information Sciences, was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U.S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.
Computational Thinking: A Digital Age Skill for Everyone
ERIC Educational Resources Information Center
Barr, David; Harrison, John; Conery, Leslie
2011-01-01
In a seminal article published in 2006, Jeanette Wing described computational thinking (CT) as a way of "solving problems, designing systems, and understanding human behavior by drawing on the concepts fundamental to computer science." Wing's article gave rise to an often controversial discussion and debate among computer scientists,…
NASA Technical Reports Server (NTRS)
Klumpar, D. M.; Lapolla, M. V.; Horblit, B.
1995-01-01
A prototype system has been developed to aid the experimental space scientist in the display and analysis of spaceborne data acquired from direct measurement sensors in orbit. We explored the implementation of a rule-based environment for semi-automatic generation of visualizations that assist the domain scientist in exploring one's data. The goal has been to enable rapid generation of visualizations which enhance the scientist's ability to thoroughly mine his data. Transferring the task of visualization generation from the human programmer to the computer produced a rapid prototyping environment for visualizations. The visualization and analysis environment has been tested against a set of data obtained from the Hot Plasma Composition Experiment on the AMPTE/CCE satellite creating new visualizations which provided new insight into the data.
Information processing, computation, and cognition.
Piccinini, Gualtiero; Scarantino, Andrea
2011-01-01
Computation and information processing are among the most fundamental notions in cognitive science. They are also among the most imprecisely discussed. Many cognitive scientists take it for granted that cognition involves computation, information processing, or both - although others disagree vehemently. Yet different cognitive scientists use 'computation' and 'information processing' to mean different things, sometimes without realizing that they do. In addition, computation and information processing are surrounded by several myths; first and foremost, that they are the same thing. In this paper, we address this unsatisfactory state of affairs by presenting a general and theory-neutral account of computation and information processing. We also apply our framework by analyzing the relations between computation and information processing on one hand and classicism, connectionism, and computational neuroscience on the other. We defend the relevance to cognitive science of both computation, at least in a generic sense, and information processing, in three important senses of the term. Our account advances several foundational debates in cognitive science by untangling some of their conceptual knots in a theory-neutral way. By leveling the playing field, we pave the way for the future resolution of the debates' empirical aspects.
Exploring Student and Scientist Experiences in a Novice-Expert Partnership
NASA Astrophysics Data System (ADS)
Bowman, C. D.
2007-12-01
The creation of student-scientist partnership (SSP) programs is one response to the call for greater attention to scientific literacy and science inquiry in schools (COSEPUP, 2006; NRC, 1996; NSTA, 2004). SSPs engage students in authentic scientific investigations as they work alongside scientist mentors engaged in research. The scholarly literature suggests outcomes and benefits to participants in terms of enhanced content learning, as well as gains related to motivation and self-efficacy (Abraham, 2002; Lawless and Rock, 1998; Ledley, Haddad, Lockwood, and Brooks, 2003; Markowitz, 2004; Means, 1998, p. 98; Richmond, 1998). Continuing development of and research into these programs is slow, however, in part because SSPs are resource-intensive (requiring access to scientists and laboratories) and difficult to scale up, creating a perception that they are limited in their application. To begin to reach the goal of scaling up, it is necessary to develop a deep understanding of how each aspect of SSPs contributes to student motivation and learning. To this end, this study provides an in-depth analysis of interviews with the student and scientist members of mentoring dyads that participated in NASA's Athena Student Interns Program associated with the Mars Exploration Rover missions. Crafting a picture of how these students and scientists experienced working closely in a science mentoring dyad contributes to the growing body of work focused on understanding the nature, benefits, and challenges of SSPs and provides potential lessons for SSP practitioners. Considering the participants' insights in the context of career and psychosocial mentoring highlights the complex nature of student-scientist relationships and points to the need to address and encourage both types of mentoring in SSPs in order to foster the most successful partnerships. 
Such knowledge takes an important step toward informing the development of programs that may introduce greater numbers of students to scientific careers and research, while providing similar benefits as those conferred through small-scale student-scientist collaborations.
Scientists at Work. Final Report.
ERIC Educational Resources Information Center
Education Turnkey Systems, Inc., Falls Church, VA.
This report summarizes activities related to the development, field testing, evaluation, and marketing of the "Scientists at Work" program which combines computer assisted instruction with database tools to aid cognitively impaired middle and early high school children in learning and applying thinking skills to science. The brief report reviews…
Big Software for Big Data: Scaling Up Photometry for LSST (Abstract)
NASA Astrophysics Data System (ADS)
Rawls, M.
2017-06-01
(Abstract only) The Large Synoptic Survey Telescope (LSST) will capture mosaics of the sky every few nights, each containing more data than your computer's hard drive can store. As a result, the software to process these images is as critical to the science as the telescope and the camera. I discuss the algorithms and software being developed by the LSST Data Management team to handle such a large volume of data. All of our work is open source and available to the community. Once LSST comes online, our software will produce catalogs of objects and a stream of alerts. These will bring exciting new opportunities for follow-up observations and collaborations with LSST scientists.
Energy Systems Integration Facility (ESIF): Golden, CO - Energy Integration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheppy, Michael; VanGeet, Otto; Pless, Shanti
2015-03-01
At NREL's Energy Systems Integration Facility (ESIF) in Golden, Colo., scientists and engineers work to overcome challenges related to how the nation generates, delivers and uses energy by modernizing the interplay between energy sources, infrastructure, and data. Test facilities include a megawatt-scale ac electric grid, photovoltaic simulators and a load bank. Additionally, a high performance computing data center (HPCDC) is dedicated to advancing renewable energy and energy efficient technologies. A key design strategy is to use waste heat from the HPCDC to heat parts of the building. The ESIF boasts an annual EUI of 168.3 kBtu/ft². This article describes the building's procurement, design and first year of performance.
Bridging Social and Semantic Computing - Design and Evaluation of User Interfaces for Hybrid Systems
ERIC Educational Resources Information Center
Bostandjiev, Svetlin Alex I.
2012-01-01
The evolution of the Web brought interesting new problems to computer scientists that we loosely classify in the fields of social and semantic computing. Social computing is related to two major paradigms: computations carried out by a large number of people in a collective intelligence fashion (e.g., wikis), and performing computations on social…
Initial Scientific Assessment of the EOS Data and Information System (EOSDIS)
NASA Technical Reports Server (NTRS)
1989-01-01
Crucial to the success of the Earth Observing System (Eos) is the Eos Data and Information System (EosDIS). The goals of Eos depend not only on its instruments and science investigations, but also on how well EosDIS helps scientists integrate reliable, large-scale data sets of geophysical and biological measurements made from Eos data, and on how successfully Eos scientists interact with other investigations in Earth System Science. Current progress in the use of remote sensing for science is hampered by requirements that the scientist understand in detail the instrument, the electromagnetic properties of the surface, and a suite of arcane tape formats, and by the immaturity of some of the techniques for estimating geophysical and biological variables from remote sensing data. These shortcomings must be transcended if remote sensing data are to be used by a much wider population of scientists who study environmental change at regional and global scales.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evans, Thomas; Hamilton, Steven; Slattery, Stuart
Profugus is an open-source mini-application (mini-app) for radiation transport and reactor applications. It contains the fundamental computational kernels used in the Exnihilo code suite from Oak Ridge National Laboratory. However, Exnihilo is production code with a substantial user base. Furthermore, Exnihilo is export controlled. This makes collaboration with computer scientists and computer engineers difficult. Profugus is designed to bridge that gap. By encapsulating the core numerical algorithms in an abbreviated code base that is open-source, computer scientists can analyze the algorithms and easily make code-architectural changes to test performance without compromising the production code values of Exnihilo. Profugus is not meant to be production software with respect to problem analysis. The computational kernels in Profugus are designed to analyze performance, not correctness. Nonetheless, users of Profugus can set up and run problems with enough real-world features to be useful as proof-of-concept for actual production work.
Perspectives on an education in computational biology and medicine.
Rubinstein, Jill C
2012-09-01
The mainstream application of massively parallel, high-throughput assays in biomedical research has created a demand for scientists educated in Computational Biology and Bioinformatics (CBB). In response, formalized graduate programs have rapidly evolved over the past decade. Concurrently, there is increasing need for clinicians trained to oversee the responsible translation of CBB research into clinical tools. Physician-scientists with dedicated CBB training can facilitate such translation, positioning themselves at the intersection between computational biomedical research and medicine. This perspective explores key elements of the educational path to such a position, specifically addressing: 1) evolving perceptions of the role of the computational biologist and the impact on training and career opportunities; 2) challenges in and strategies for obtaining the core skill set required of a biomedical researcher in a computational world; and 3) how the combination of CBB with medical training provides a logical foundation for a career in academic medicine and/or biomedical research.
ERIC Educational Resources Information Center
Strober, Myra H.; Arnold, Carolyn L.
This discussion of the impact of new computer occupations on women's employment patterns is divided into four major sections. The first section describes the six computer-related occupations to be analyzed: (1) engineers; (2) computer scientists and systems analysts; (3) programmers; (4) electronic technicians; (5) computer operators; and (6) data…
Enduring Influence of Stereotypical Computer Science Role Models on Women's Academic Aspirations
ERIC Educational Resources Information Center
Cheryan, Sapna; Drury, Benjamin J.; Vichayapai, Marissa
2013-01-01
The current work examines whether a brief exposure to a computer science role model who fits stereotypes of computer scientists has a lasting influence on women's interest in the field. One-hundred undergraduate women who were not computer science majors met a female or male peer role model who embodied computer science stereotypes in appearance…
Collective Computation of Neural Network
1990-03-15
Sciences, Beijing ABSTRACT Computational neuroscience is a new branch of neuroscience originating from current research on the theory of computer...scientists working in artificial intelligence engineering and neuroscience. The paper introduces the collective computational properties of model neural...vision research. On this basis, the authors analyzed the significance of the Hopfield model. Key phrases: Computational Neuroscience, Neural Network, Model
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B.; Manipon, G.; Xing, Z.; Fetzer, E.
2009-04-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or http GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow.
A scientist visually authors the graph of operation in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, perform pairwise instrument matchups for A-Train datasets, and compute fused products. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 degrees Kelvin; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
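The space/time "matchup" step named in the abstract above can be illustrated with a minimal sketch. This is not SciFlo code: the Obs record, the matchups function, and the tolerance parameters are hypothetical names chosen for illustration, and the brute-force pairing shown here assumes point observations rather than real instrument swaths or model grids.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


@dataclass(frozen=True)
class Obs:
    """Hypothetical point observation: time in seconds, position in degrees."""
    time_s: float
    lat: float
    lon: float
    value: float


def great_circle_km(a: Obs, b: Obs) -> float:
    """Haversine distance between two observations, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def matchups(left, right, max_dt_s, max_km):
    """Return every (left, right) pair that is close in both time and space."""
    pairs = []
    for a in left:
        for b in right:
            close_in_time = abs(a.time_s - b.time_s) <= max_dt_s
            if close_in_time and great_circle_km(a, b) <= max_km:
                pairs.append((a, b))
    return pairs
```

The nested loop costs time proportional to the product of the two list lengths, so a production system would presumably index the observations by time and location first; the sketch only shows the pairing criterion itself.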
Big Data Processing for a Central Texas Groundwater Case Study
NASA Astrophysics Data System (ADS)
Cantu, A.; Rivera, O.; Martínez, A.; Lewis, D. H.; Gentle, J. N., Jr.; Fuentes, G.; Pierce, S. A.
2016-12-01
As computational methods improve, scientists are able to expand the level and scale of experimental simulation and testing that is completed for case studies. This study presents a comparative analysis of multiple models for the Barton Springs segment of the Edwards aquifer. Several numerical simulations using state-mandated MODFLOW models run on Stampede, a High Performance Computing system housed at the Texas Advanced Computing Center, were performed for multiple scenario testing. One goal of this multidisciplinary project is to visualize and compare the output data of the groundwater model using the statistical programming language R to find revealing data patterns produced by different pumping scenarios. Presenting data in a friendly post-processing format is covered in this paper. Visualization of the data and creating workflows applicable to the management of the data are tasks performed after data extraction. Resulting analyses provide an example of how supercomputing can be used to accelerate evaluation of scientific uncertainty and geological knowledge in relation to policy and management decisions. Understanding the aquifer behavior helps policy makers avoid negative impacts on endangered species and environmental services, and aids in maximizing the aquifer yield.
Optimizing high performance computing workflow for protein functional annotation.
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-09-10
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Optimizing high performance computing workflow for protein functional annotation
Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene
2014-01-01
Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool, the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fermilab
2017-09-01
Scientists, engineers and programmers at Fermilab are tackling today’s most challenging computational problems. Their solutions, motivated by the needs of worldwide research in particle physics and accelerators, help America stay at the forefront of innovation.
Cyber-workstation for computational neuroscience.
Digiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C; Fortes, Jose; Sanchez, Justin C
2010-01-01
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.
Cyber-Workstation for Computational Neuroscience
DiGiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C.; Fortes, Jose; Sanchez, Justin C.
2009-01-01
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface. PMID:20126436
An Analysis of Cloud Computing with Amazon Web Services for the Atmospheric Science Data Center
NASA Astrophysics Data System (ADS)
Gleason, J. L.; Little, M. M.
2013-12-01
NASA science and engineering efforts rely heavily on compute and data handling systems. The nature of NASA science data is such that it is not restricted to NASA users; instead, it is widely shared across a globally distributed user community including scientists, educators, policy decision makers, and the public. Therefore NASA science computing is a candidate use case for cloud computing where compute resources are outsourced to an external vendor. Amazon Web Services (AWS) is a commercial cloud computing service developed to use excess computing capacity at Amazon, and potentially provides an alternative to costly and potentially underutilized dedicated acquisitions whenever NASA scientists or engineers require additional data processing. AWS desires to provide a simplified avenue for NASA scientists and researchers to share large, complex data sets with external partners and the public. AWS has been extensively used by JPL for a wide range of computing needs and was previously tested on a NASA Agency basis during the Nebula testing program. Its ability to support the Langley Science Directorate needs to be evaluated by integrating it with real world operational needs across NASA and the associated maturity that would come with that. The strengths and weaknesses of this architecture and its ability to support general science and engineering applications have been demonstrated during the previous testing. The Langley Office of the Chief Information Officer in partnership with the Atmospheric Sciences Data Center (ASDC) has established a pilot business interface to utilize AWS cloud computing resources on an organization and project level pay per use model. This poster discusses an effort to evaluate the feasibility of the pilot business interface from a project level perspective by specifically using a processing scenario involving the Clouds and Earth's Radiant Energy System (CERES) project.
OptFuels: Fuel treatment optimization
Greg Jones
2011-01-01
Scientists at the USDA Forest Service, Rocky Mountain Research Station, in Missoula, MT, in collaboration with scientists at the University of Montana, are developing a tool to help forest managers prioritize forest fuel reduction treatments. Although several computer models analyze fuels and fire behavior, stand-level effects of fuel treatments, and priority planning...
Air Force Laboratory’s 2005 Technology Milestones
2006-01-01
Computational materials science methods can benefit the design and property prediction of complex real-world materials. With these models, scientists and...Warfighter Page Air High-Frequency Acoustic System...800) 203-6451 High-Frequency Acoustic System Payoff Scientists created the High-Frequency Acoustic Suppression Technology (HiFAST) airflow control
A cloud-based workflow to quantify transcript-expression levels in public cancer compendia
Tatlow, PJ; Piccolo, Stephen R.
2016-01-01
Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantify the computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9). PMID:27982081
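As a rough illustration of the cost figure quoted in the abstract above, the sketch below multiplies the reported sample count by the reported lowest per-sample price. The result is only an implied lower bound under the assumption that every sample was processed at that lowest price; the abstract does not state a total, and the function name total_cost is a hypothetical helper.

```python
from decimal import Decimal

SAMPLES = 12_307                       # sample count reported in the abstract
COST_PER_SAMPLE_USD = Decimal("0.09")  # lowest per-sample cost reported


def total_cost(samples: int, cost_per_sample: Decimal) -> Decimal:
    """Total cost if every sample were processed at the given unit price."""
    return samples * cost_per_sample


print(total_cost(SAMPLES, COST_PER_SAMPLE_USD))  # prints 1107.63
```

Decimal is used instead of float so that the currency arithmetic stays exact.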
A toolbox and record for scientific models
NASA Technical Reports Server (NTRS)
Ellman, Thomas
1994-01-01
Computational science presents a host of challenges for the field of knowledge-based software design. Scientific computation models are difficult to construct. Models constructed by one scientist are easily misapplied by other scientists to problems for which they are not well-suited. Finally, models constructed by one scientist are difficult for others to modify or extend to handle new types of problems. Construction of scientific models actually involves much more than the mechanics of building a single computational model. In the course of developing a model, a scientist will often test a candidate model against experimental data or against a priori expectations. Test results often lead to revisions of the model and a consequent need for additional testing. During a single model development session, a scientist typically examines a whole series of alternative models, each using different simplifying assumptions or modeling techniques. A useful scientific software design tool must support these aspects of the model development process as well. In particular, it should propose and carry out tests of candidate models. It should analyze test results and identify models and parts of models that must be changed. It should determine what types of changes can potentially cure a given negative test result. It should organize candidate models, test data, and test results into a coherent record of the development process. Finally, it should exploit the development record for two purposes: (1) automatically determining the applicability of a scientific model to a given problem; (2) supporting revision of a scientific model to handle a new type of problem. Existing knowledge-based software design tools must be extended in order to provide these facilities.
Computer Series, 98. Electronics for Scientists: A Computer-Intensive Approach.
ERIC Educational Resources Information Center
Scheeline, Alexander; Mork, Brian J.
1988-01-01
Reports the design for a principles-before-details presentation of electronics for an instrumental analysis class. Uses computers for data collection and simulations. Requires one semester with two 2.5-hour periods and two lectures per week. Includes lab and lecture syllabi. (MVL)
Big data computing: Building a vision for ARS information management
USDA-ARS's Scientific Manuscript database
Improvements are needed within the ARS to increase scientific capacity and keep pace with new developments in computer technologies that support data acquisition and analysis. Enhancements in computing power and IT infrastructure are needed to provide scientists better access to high performance com...
Science-Driven Computing: NERSC's Plan for 2006-2010
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simon, Horst D.; Kramer, William T.C.; Bailey, David H.
NERSC has developed a five-year strategic plan focusing on three components: Science-Driven Systems, Science-Driven Services, and Science-Driven Analytics. (1) Science-Driven Systems: Balanced introduction of the best new technologies for complete computational systems--computing, storage, networking, visualization and analysis--coupled with the activities necessary to engage vendors in addressing the DOE computational science requirements in their future roadmaps. (2) Science-Driven Services: The entire range of support activities, from high-quality operations and user services to direct scientific support, that enable a broad range of scientists to effectively use NERSC systems in their research. NERSC will concentrate on resources needed to realize the promise of the new highly scalable architectures for scientific discovery in multidisciplinary computational science projects. (3) Science-Driven Analytics: The architectural and systems enhancements and services required to integrate NERSC's powerful computational and storage resources to provide scientists with new tools to effectively manipulate, visualize, and analyze the huge data sets derived from simulations and experiments.
A Distributed Laboratory for Event-Driven Coastal Prediction and Hazard Planning
NASA Astrophysics Data System (ADS)
Bogden, P.; Allen, G.; MacLaren, J.; Creager, G. J.; Flournoy, L.; Sheng, Y. P.; Graber, H.; Graves, S.; Conover, H.; Luettich, R.; Perrie, W.; Ramakrishnan, L.; Reed, D. A.; Wang, H. V.
2006-12-01
The 2005 Atlantic hurricane season was the most active in recorded history. Collectively, 2005 hurricanes caused more than 2,280 deaths and record damages of over 100 billion dollars. Of the storms that made landfall, Dennis, Emily, Katrina, Rita, and Wilma caused most of the destruction. Accurate predictions of storm-driven surge, wave height, and inundation can save lives and help keep recovery costs down, provided the information gets to emergency response managers in time. The information must be available well in advance of landfall so that responders can weigh the costs of unnecessary evacuation against the costs of inadequate preparation. The SURA Coastal Ocean Observing and Prediction (SCOOP) Program is a multi-institution collaboration implementing a modular, distributed service-oriented architecture for real time prediction and visualization of the impacts of extreme atmospheric events. The modular infrastructure enables real-time prediction of multi-scale, multi-model, dynamic, data-driven applications. SURA institutions are working together to create a virtual and distributed laboratory integrating coastal models, simulation data, and observations with computational resources and high speed networks. The loosely coupled architecture allows teams of computer and coastal scientists at multiple institutions to innovate complex system components that are interconnected with relatively stable interfaces. The operational system standardizes at the interface level to enable substantial innovation by complementary communities of coastal and computer scientists. This architectural philosophy solves a long-standing problem associated with the transition from research to operations. The SCOOP Program thereby implements a prototype laboratory consistent with the vision of a national, multi-agency initiative called the Integrated Ocean Observing System (IOOS).
Several service-oriented components of the SCOOP enterprise architecture have already been designed and implemented, including data archive and transport services, metadata registry and retrieval (catalog), resource management, and portal interfaces. SCOOP partners are integrating these at the service level and implementing reconfigurable workflows for several kinds of user scenarios, and are working with resource providers to prototype new policies and technologies for on-demand computing.
Arctic Boreal Vulnerability Experiment (ABoVE) Science Cloud
NASA Astrophysics Data System (ADS)
Duffy, D.; Schnase, J. L.; McInerney, M.; Webster, W. P.; Sinno, S.; Thompson, J. H.; Griffith, P. C.; Hoy, E.; Carroll, M.
2014-12-01
The effects of climate change are being revealed at alarming rates in the Arctic and Boreal regions of the planet. NASA's Terrestrial Ecology Program has launched a major field campaign to study these effects over the next 5 to 8 years. The Arctic Boreal Vulnerability Experiment (ABoVE) will challenge scientists to take measurements in the field, study remote observations, and even run models to better understand the impacts of a rapidly changing climate for areas of Alaska and western Canada. The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center (GSFC) has partnered with the Terrestrial Ecology Program to create a science cloud designed for this field campaign - the ABoVE Science Cloud. The cloud combines traditional high performance computing with emerging technologies to create an environment specifically designed for large-scale climate analytics. The ABoVE Science Cloud utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data intensive, science applications. At the center of the architecture is a large object storage environment, much like a traditional high-performance file system, that supports data proximal processing using technologies like MapReduce on a Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high performance compute resources with many processing cores and large memory coupled to the storage through an InfiniBand network. Virtual systems can be tailored to a specific scientist and provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems. In this talk, we will present the architectural components of the science cloud and examples of how it is being used to meet the needs of the ABoVE campaign. 
In our experience, the science cloud approach significantly lowers the barriers and risks to organizations that require high performance computing solutions and provides the NCCS with the agility required to meet our customers' rapidly increasing and evolving requirements.
High-Performance Computing Unlocks Innovation at NREL
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Need to fly around a wind farm? Or step inside a molecule? NREL scientists use a super powerful (and highly energy-efficient) computer to visualize and solve big problems in renewable energy research.
Mathematical computer programs: A compilation
NASA Technical Reports Server (NTRS)
1972-01-01
Computer programs, routines, and subroutines for aiding engineers, scientists, and mathematicians in direct problem solving are presented. Also included is a group of items that affords the same users greater flexibility in the use of software.
Performance Engineering Research Institute SciDAC-2 Enabling Technologies Institute Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Mary
2014-09-19
Enhancing the performance of SciDAC applications on petascale systems has high priority within DOE SC. As we look to the future, achieving expected levels of performance on high-end computing (HEC) systems is growing ever more challenging due to enormous scale, increasing architectural complexity, and increasing application complexity. To address these challenges, PERI has implemented a unified, tripartite research plan encompassing: (1) performance modeling and prediction; (2) automatic performance tuning; and (3) performance engineering of high profile applications. The PERI performance modeling and prediction activity is developing and refining performance models, significantly reducing the cost of collecting the data upon which the models are based, and increasing model fidelity, speed and generality. Our primary research activity is automatic tuning (autotuning) of scientific software. This activity is spurred by the strong user preference for automatic tools and is based on previous successful activities such as ATLAS, which has automatically tuned components of the LAPACK linear algebra library, and other recent work on autotuning domain-specific libraries. Our third major component is application engagement, to which we are devoting approximately 30% of our effort to work directly with SciDAC-2 applications. This last activity not only helps DOE scientists meet their near-term performance goals, but also helps keep PERI research focused on the real challenges facing DOE computational scientists as they enter the Petascale Era.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nikolic, R J
This month's issue has the following articles: (1) Dawn of a New Era of Scientific Discovery - Commentary by Edward I. Moses; (2) At the Frontiers of Fundamental Science Research - Collaborators from national laboratories, universities, and international organizations are using the National Ignition Facility to probe key fundamental science questions; (3) Livermore Responds to Crisis in Post-Earthquake Japan - More than 70 Laboratory scientists provided round-the-clock expertise in radionuclide analysis and atmospheric dispersion modeling as part of the nation's support to Japan following the March 2011 earthquake and nuclear accident; (4) A Comprehensive Resource for Modeling, Simulation, and Experiments - A new Web-based resource called MIDAS is a central repository for material properties, experimental data, and computer models; and (5) Finding Data Needles in Gigabit Haystacks - Livermore computer scientists have developed a novel computer architecture based on 'persistent' memory to ease data-intensive computations.
Science@SLAC—Discovering New Drugs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drell, Persis; Smith, Clyde; Bushnell, Dave
2011-10-18
SLAC scientists and private-sector drug makers describe how a public-private partnership combined with the specialized X-rays from the Stanford Synchrotron Radiation Lightsource (SSRL) enables smart drug design that eliminates the costly trial-and-error approach used by traditional drug companies. SSRL is a synchrotron lightsource laboratory used by scientists from a range of disciplines to study matter on the scale of atoms and molecules. Featured in this video are SLAC Laboratory Director Persis Drell, SSRL staff scientist Clyde Smith, and Dave Bushnell, a scientist from startup drug maker Cocrystal Discovery Inc.
Science@SLAC—Discovering New Drugs
Drell, Persis; Smith, Clyde; Bushnell, Dave
2018-01-16
SLAC scientists and private-sector drug makers describe how a public-private partnership combined with the specialized X-rays from the Stanford Synchrotron Radiation Lightsource (SSRL) enables smart drug design that eliminates the costly trial-and-error approach used by traditional drug companies. SSRL is a synchrotron lightsource laboratory used by scientists from a range of disciplines to study matter on the scale of atoms and molecules. Featured in this video are SLAC Laboratory Director Persis Drell, SSRL staff scientist Clyde Smith, and Dave Bushnell, a scientist from startup drug maker Cocrystal Discovery Inc.
ERIC Educational Resources Information Center
Shim, Jaekwoun; Kwon, Daiyoung; Lee, Wongyu
2017-01-01
In the past, computer programming was perceived as a task only carried out by computer scientists; in the 21st century, however, computer programming is viewed as a critical and necessary skill that everyone should learn. In order to improve teaching of problem-solving abilities in a computing environment, extensive research is being done on…
Large Scale Computing and Storage Requirements for High Energy Physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerber, Richard A.; Wasserman, Harvey
2010-11-24
The National Energy Research Scientific Computing Center (NERSC) is the leading scientific computing facility for the Department of Energy's Office of Science, providing high-performance computing (HPC) resources to more than 3,000 researchers working on about 400 projects. NERSC provides large-scale computing resources and, crucially, the support and expertise needed for scientists to make effective use of them. In November 2009, NERSC, DOE's Office of Advanced Scientific Computing Research (ASCR), and DOE's Office of High Energy Physics (HEP) held a workshop to characterize the HPC resources needed at NERSC to support HEP research through the next three to five years. The effort is part of NERSC's legacy of anticipating users' needs and deploying resources to meet those demands. The workshop revealed several key points, in addition to achieving its goal of collecting and characterizing computing requirements. The chief findings: (1) Science teams need access to a significant increase in computational resources to meet their research goals; (2) Research teams need to be able to read, write, transfer, store online, archive, analyze, and share huge volumes of data; (3) Science teams need guidance and support to implement their codes on future architectures; and (4) Projects need predictable, rapid turnaround of their computational jobs to meet mission-critical time constraints. This report expands upon these key points and includes others. It also presents a number of case studies as representative of the research conducted within HEP. Workshop participants were asked to codify their requirements in this case study format, summarizing their science goals, methods of solution, current and three-to-five year computing requirements, and software and support needs. Participants were also asked to describe their strategy for computing in the highly parallel, multi-core environment that is expected to dominate HPC architectures over the next few years.
The report includes a section that describes efforts already underway or planned at NERSC that address requirements collected at the workshop. NERSC has many initiatives in progress that address key workshop findings and are aligned with NERSC's strategic plans.
NASA Astrophysics Data System (ADS)
Agram, P. S.; Gurrola, E. M.; Lavalle, M.; Sacco, G. F.; Rosen, P. A.
2016-12-01
The InSAR Scientific Computing Environment (ISCE) provides both a modular, flexible, and extensible framework for building software components and applications that work together seamlessly and a toolbox for processing InSAR data into higher level geodetic image products from a diverse array of radar satellites and aircraft. ISCE easily scales to serve as the SAR processing engine at the core of the NASA JPL Advanced Rapid Imaging and Analysis (ARIA) Center for Natural Hazards as well as a software toolbox for individual scientists working with SAR data. ISCE is planned as the foundational element in processing NISAR data, enabling a new class of analyses that take greater advantage of the long time and large spatial scales of these data. ISCE in ARIA is also a SAR Foundry for development of new processing components and workflows to meet the needs of both large processing centers and individual users. The ISCE framework contains object-oriented Python components layered to construct Python InSAR components that manage legacy Fortran/C InSAR programs. The Python user interface enables both command-line deployment of workflows and an interactive "sand box" (the Python interpreter) where scientists can "play" with the data. Recent developments in ISCE include the addition of components to ingest Sentinel-1A SAR data (both stripmap and TOPS-mode) and a new workflow for processing the TOPS-mode data. New components are being developed to exploit polarimetric-SAR data to provide the ecosystem and land-cover/land-use change communities with rigorous and efficient tools to perform multi-temporal, polarimetric and tomographic analyses in order to generate calibrated, geocoded and mosaicked Level-2 and Level-3 products (e.g., maps of above-ground biomass or forest disturbance). ISCE has been downloaded by over 200 users under a license for WinSAR members through the Unavco.org website. Others may apply directly to JPL for a license at download.jpl.nasa.gov.
NASA Technical Reports Server (NTRS)
Moore, Robert C.
1998-01-01
The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities that serves as a bridge between NASA and the academic community. Under a five-year co-operative agreement with NASA, research at RIACS is focused on areas that are strategically enabling to the Ames Research Center's role as NASA's Center of Excellence for Information Technology. Research is carried out by a staff of full-time scientists, augmented by visitors, students, postdoctoral candidates and visiting university faculty. The primary mission of RIACS, as chartered, is to carry out research and development in computer science. This work is devoted in the main to tasks that are strategically enabling with respect to NASA's bold mission in space exploration and aeronautics. There are three foci for this work: Automated Reasoning, Human-Centered Computing, and High Performance Computing and Networking. RIACS has the additional goal of broadening the base of researchers in these areas of importance to the nation's space and aeronautics enterprises. Through its visiting scientist program, RIACS facilitates the participation of university-based researchers, including both faculty and students, in the research activities of NASA and RIACS. RIACS researchers work in close collaboration with NASA computer scientists on projects such as the Remote Agent Experiment on the Deep Space One mission, and Super-Resolution Surface Modeling.
PyMT: A Python package for model-coupling in the Earth sciences
NASA Astrophysics Data System (ADS)
Hutton, E.
2016-12-01
The current landscape of Earth-system models is not only broad in scientific scope, but also broad in type. On the one hand, the large variety of models is exciting, as it provides fertile ground for extending or linking models together in novel ways to answer new scientific questions. However, the heterogeneity in model type acts to inhibit model coupling, model development, or even model use. Existing models are written in a variety of programming languages, operate on different grids, use their own file formats (both for input and output), have different user interfaces, have their own time steps, etc. Each of these factors becomes an obstruction to scientists wanting to couple, extend, or simply run existing models. For scientists whose main focus may not be computer science these barriers become even larger and become significant logistical hurdles. And this is all before the scientific difficulties of coupling or running models are addressed. The CSDMS Python Modeling Toolkit (PyMT) was developed to help non-computer scientists deal with these sorts of modeling logistics. PyMT is the fundamental package the Community Surface Dynamics Modeling System uses for the coupling of models that expose the Basic Modeling Interface (BMI). It contains:
- Tools necessary for coupling models of disparate time and space scales (including grid mappers)
- Time-steppers that coordinate the sequencing of coupled models
- Exchange of data between BMI-enabled models
- Wrappers that automatically load BMI-enabled models into the PyMT framework
- Utilities that support open-source interfaces (UGRID, SGRID, CSDMS Standard Names, etc.)
- A collection of community-submitted models, written in a variety of programming languages, from a variety of process domains, but all usable from within the Python programming language
- A plug-in framework for adding additional BMI-enabled models to the framework
In this presentation we introduce the basics of the PyMT as well as provide an example of coupling models of different domains and grid types.
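The coupling pattern this abstract describes, in which a time-stepper sequences models that run at different time steps and passes data between them, can be sketched in plain Python. This is an illustration under stated assumptions only: it does not use the pymt package, the ToyModel class and run_coupled function are hypothetical, and the method names only loosely follow the style of the Basic Modeling Interface (the real BMI signatures differ in detail).

```python
"""Toy one-way coupling of two models with different time steps.

Hypothetical sketch: ToyModel and run_coupled are invented for illustration
and are not part of pymt or of any BMI implementation.
"""


class ToyModel:
    """A scalar 'state' that relaxes toward a scalar 'forcing' (explicit Euler)."""

    def __init__(self, time_step, rate):
        self._dt = time_step
        self._rate = rate
        self._time = 0.0
        self._values = {"state": 0.0, "forcing": 0.0}

    # --- BMI-style control and query methods (simplified signatures) ---
    def update(self):
        """Advance the model by one of its own time steps."""
        state = self._values["state"]
        forcing = self._values["forcing"]
        self._values["state"] = state + self._rate * (forcing - state) * self._dt
        self._time += self._dt

    def get_current_time(self):
        return self._time

    def get_time_step(self):
        return self._dt

    def get_value(self, name):
        return self._values[name]

    def set_value(self, name, value):
        self._values[name] = value


def run_coupled(fast, slow, end_time, eps=1e-12):
    """Time-stepper: sub-cycle the fast model, then hand its state to the slow one."""
    while slow.get_current_time() < end_time - eps:
        target = slow.get_current_time() + slow.get_time_step()
        # Let the fast model catch up to the slow model's next step.
        while fast.get_current_time() < target - eps:
            fast.update()
        # Data exchange: the fast model's output drives the slow model.
        slow.set_value("forcing", fast.get_value("state"))
        slow.update()


if __name__ == "__main__":
    fast = ToyModel(time_step=0.25, rate=1.0)
    fast.set_value("forcing", 1.0)  # constant external forcing (assumed)
    slow = ToyModel(time_step=1.0, rate=0.5)
    run_coupled(fast, slow, end_time=4.0)
    print(fast.get_value("state"), slow.get_value("state"))
```

The coupling here is one-way and both models share a trivial scalar "grid"; a real framework would also need the grid mappers and unit handling that the abstract lists.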
Computational chemistry at Janssen
NASA Astrophysics Data System (ADS)
van Vlijmen, Herman; Desjarlais, Renee L.; Mirzadegan, Tara
2017-03-01
Computer-aided drug discovery activities at Janssen are carried out by scientists in the Computational Chemistry group of the Discovery Sciences organization. This perspective gives an overview of the organizational and operational structure, the science, internal and external collaborations, and the impact of the group on Drug Discovery at Janssen.
ERIC Educational Resources Information Center
National Institute of General Medical Sciences (NIGMS), 2009
2009-01-01
Computer advances now let researchers quickly search through DNA sequences to find gene variations that could lead to disease, simulate how flu might spread through one's school, and design three-dimensional animations of molecules that rival any video game. By teaming computers and biology, scientists can answer new and old questions that could…
NASA Astrophysics Data System (ADS)
Bender, Jason D.
Understanding hypersonic aerodynamics is important for the design of next-generation aerospace vehicles for space exploration, national security, and other applications. Ground-level experimental studies of hypersonic flows are difficult and expensive; thus, computational science plays a crucial role in this field. Computational fluid dynamics (CFD) simulations of extremely high-speed flows require models of chemical and thermal nonequilibrium processes, such as dissociation of diatomic molecules and vibrational energy relaxation. Current models are outdated and inadequate for advanced applications. We describe a multiscale computational study of gas-phase thermochemical processes in hypersonic flows, starting at the atomic scale and building systematically up to the continuum scale. The project was part of a larger effort centered on collaborations between aerospace scientists and computational chemists. We discuss the construction of potential energy surfaces for the N4, N2O2, and O4 systems, focusing especially on the multi-dimensional fitting problem. A new local fitting method named L-IMLS-G2 is presented and compared with a global fitting method. Then, we describe the theory of the quasiclassical trajectory (QCT) approach for modeling molecular collisions. We explain how we implemented the approach in a new parallel code for high-performance computing platforms. Results from billions of QCT simulations of high-energy N2 + N2, N2 + N, and N2 + O2 collisions are reported and analyzed. Reaction rate constants are calculated and sets of reactive trajectories are characterized at both thermal equilibrium and nonequilibrium conditions. The data shed light on fundamental mechanisms of dissociation and exchange reactions -- and their coupling to internal energy transfer processes -- in thermal environments typical of hypersonic flows. 
We discuss how the outcomes of this investigation and other related studies lay a rigorous foundation for new macroscopic models for hypersonic CFD. This research was supported by the Department of Energy Computational Science Graduate Fellowship and by the Air Force Office of Scientific Research Multidisciplinary University Research Initiative.
Custovic, Adnan; Ainsworth, John; Arshad, Hasan; Bishop, Christopher; Buchan, Iain; Cullinan, Paul; Devereux, Graham; Henderson, John; Holloway, John; Roberts, Graham; Turner, Steve; Woodcock, Ashley; Simpson, Angela
2015-01-01
We created Asthma e-Lab, a secure web-based research environment to support consistent recording, description and sharing of data, computational/statistical methods and emerging findings across the five UK birth cohorts. The e-Lab serves as a data repository for our unified dataset and provides the computational resources and a scientific social network to support collaborative research. All activities are transparent, and emerging findings are shared via the e-Lab, linked to explanations of analytical methods, thus enabling knowledge transfer. The e-Lab facilitates the iterative interdisciplinary dialogue between clinicians, statisticians, computer scientists, mathematicians, geneticists and basic scientists, capturing collective thought behind the interpretations of findings. PMID:25805205
ERIC Educational Resources Information Center
Scogin, Stephen C.
2016-01-01
"PlantingScience" is an award-winning program recognized for its innovation and use of computer-supported scientist mentoring. Science learners work on inquiry-based experiments in their classrooms and communicate asynchronously with practicing plant scientist-mentors about the projects. The purpose of this study was to identify specific…
Cross Domain Deterrence: Livermore Technical Report, 2014-2016
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnes, Peter D.; Bahney, Ben; Matarazzo, Celeste
2016-08-03
Lawrence Livermore National Laboratory (LLNL) is an original collaborator on the project titled “Deterring Complex Threats: The Effects of Asymmetry, Interdependence, and Multi-polarity on International Strategy,” (CDD Project) led by the UC Institute on Global Conflict and Cooperation at UCSD under PIs Jon Lindsay and Erik Gartzke, and funded through the DoD Minerva Research Initiative. In addition to participating in workshops and facilitating interaction among UC social scientists, LLNL is leading the computational modeling effort and assisting with empirical case studies to probe the viability of analytic, modeling and data analysis concepts. This report summarizes LLNL work on the CDD Project to date, primarily in Project Years 1-2, corresponding to Federal fiscal year 2015. LLNL brings two unique domains of expertise to bear on this Project: (1) access to scientific expertise on the technical dimensions of emerging threat technology, and (2) high performance computing (HPC) expertise, required for analyzing the complexity of bargaining interactions in the envisioned threat models. In addition, we have a small group of researchers trained as social scientists who are intimately familiar with the International Relations research. We find that pairing simulation scientists, who are typically trained in computer science, with domain experts, social scientists in this case, is the most effective route to developing powerful new simulation tools capable of representing domain concepts accurately and answering challenging questions in the field.
Towards Robot Scientists for autonomous scientific discovery
2010-01-01
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist. PMID:20119518
Towards Robot Scientists for autonomous scientific discovery.
Sparkes, Andrew; Aubrey, Wayne; Byrne, Emma; Clare, Amanda; Khan, Muhammed N; Liakata, Maria; Markham, Magdalena; Rowland, Jem; Soldatova, Larisa N; Whelan, Kenneth E; Young, Michael; King, Ross D
2010-01-04
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
Implementations of the CC'01 Human-Computer Interaction Guidelines Using Bloom's Taxonomy
ERIC Educational Resources Information Center
Manaris, Bill; Wainer, Michael; Kirkpatrick, Arthur E.; Stalvey, RoxAnn H.; Shannon, Christine; Leventhal, Laura; Barnes, Julie; Wright, John; Schafer, J. Ben; Sanders, Dean
2007-01-01
In today's technology-laden society human-computer interaction (HCI) is an important knowledge area for computer scientists and software engineers. This paper surveys existing approaches to incorporate HCI into computer science (CS) and such related issues as the perceived gap between the interests of the HCI community and the needs of CS…
Eckert, Wallace John (1902-71)
NASA Astrophysics Data System (ADS)
Murdin, P.
2000-11-01
Computer scientist and astronomer. Born in Pittsburgh, PA, Eckert was a pioneer of the use of IBM punched card equipment for astronomical calculations. As director of the US Nautical Almanac Office he introduced computer methods to calculate and print tables instead of relying on human `computers'. When, later, he became director of the Watson Scientific Computing Laboratory at Columbia Universit...
"I'm Good, but Not That Good": Digitally-Skilled Young People's Identity in Computing
ERIC Educational Resources Information Center
Wong, Billy
2017-01-01
Computers and information technology are fast becoming a part of young people's everyday life. However, there remains a difference between the majority who can use computers and the minority who are computer scientists or professionals. Drawing on 32 semi-structured interviews with digitally skilled young people (aged 13-19), we explore their…
ERIC Educational Resources Information Center
Lesgold, Alan; Reif, Frederick
The future of computers in education and the research needed to realize the computer's potential are discussed in this report, which presents a summary and the conclusions from an invitational conference involving 40 computer scientists, psychologists, educational researchers, teachers, school administrators, and parents. The summary stresses the…
Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levy, Scott N.
2016-05-01
High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely-used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, we examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, we evaluate the potential impact of rollback avoidance on these systems. We then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, we examine the feasibility of using this technique to protect against memory faults in kernel memory.
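The claim that checkpointing costs grow with system scale can be illustrated with a back-of-the-envelope model. The sketch below is not the analytic model used in the thesis; it swaps in Young's classical first-order approximation for the checkpoint interval, tau = sqrt(2 * C * M), where C is the time to write one checkpoint and M is the system mean time between failures. It also assumes independent node failures, so that M shrinks as node MTBF divided by node count. All function names and numeric inputs are hypothetical.

```python
import math


def system_mtbf(node_mtbf_hours, nodes):
    """System MTBF assuming independent, identically distributed node failures."""
    return node_mtbf_hours / nodes


def young_interval(checkpoint_cost_hours, mtbf_hours):
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


def overhead_fraction(checkpoint_cost_hours, mtbf_hours):
    """Approximate fraction of time lost to checkpoints plus expected rework.

    First-order estimate only: C / tau for writing checkpoints plus
    tau / (2 * M) for recomputing lost work. It is only meaningful while
    the checkpoint cost is small compared with the MTBF.
    """
    tau = young_interval(checkpoint_cost_hours, mtbf_hours)
    return checkpoint_cost_hours / tau + tau / (2.0 * mtbf_hours)


if __name__ == "__main__":
    node_mtbf = 5 * 365 * 24.0   # assumed: 5-year MTBF per node, in hours
    checkpoint_cost = 0.1        # assumed: 6 minutes per coordinated checkpoint
    for nodes in (1_000, 10_000, 100_000):
        mtbf = system_mtbf(node_mtbf, nodes)
        waste = overhead_fraction(checkpoint_cost, mtbf)
        print(f"{nodes:>7} nodes: MTBF {mtbf:8.3f} h, overhead {waste:.1%}")
```

Under these assumptions the overhead scales as sqrt(2 * C / M), so a hundredfold increase in node count raises it tenfold, which is the pressure that motivates rollback avoidance.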
2005 White Paper on Institutional Capability Computing Requirements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carnes, B; McCoy, M; Seager, M
This paper documents the need for a significant increase in the computing infrastructure provided to scientists working in the unclassified domains at Lawrence Livermore National Laboratory (LLNL). This need could be viewed as the next step in a broad strategy outlined in the January 2002 White Paper (UCRL-ID-147449) that bears essentially the same name as this document. Therein we wrote: 'This proposed increase could be viewed as a step in a broader strategy linking hardware evolution to applications development that would take LLNL unclassified computational science to a position of distinction if not preeminence by 2006.' This position of distinction has certainly been achieved. This paper provides a strategy for sustaining this success but will diverge from its 2002 predecessor in that it will: (1) Amplify the scientific and external success LLNL has enjoyed because of the investments made in 2002 (MCR, 11 TF) and 2004 (Thunder, 23 TF). (2) Describe in detail the nature of additional investments that are important to meet both the institutional objectives of advanced capability for breakthrough science and the scientists' clearly stated request for adequate capacity and more rapid access to moderate-sized resources. (3) Put these requirements in the context of an overall strategy for simulation science and external collaboration. While our strategy for Multiprogrammatic and Institutional Computing (M&IC) has worked well, three challenges must be addressed to assure and enhance our position. The first is that while we now have over 50 important classified and unclassified simulation codes available for use by our computational scientists, we find ourselves coping with high demand for access and long queue wait times. This point was driven home in the 2005 Institutional Computing Executive Group (ICEG) 'Report Card' to the Deputy Director for Science and Technology (DDST) Office and Computation Directorate management.
The second challenge is related to the balance that should be maintained in the simulation environment. With the advent of Thunder, the institution directed a change in course from past practice. Instead of making Thunder available to the large body of scientists, as was MCR, and effectively using it as a capacity system, the intent was to make it available to perhaps ten projects so that these teams could run very aggressive problems for breakthrough science. This usage model established Thunder as a capability system. The challenge this strategy raises is that the majority of scientists have not seen an improvement in capacity computing resources since MCR, thus creating significant tension in the system. The question then is: 'How do we address the institution's desire to maintain the potential for breakthrough science and also meet the legitimate requests from the ICEG to achieve balance?' Both the capability and the capacity environments must be addressed through this one procurement. The third challenge is to reach out more aggressively to the national science community to encourage access to LLNL resources as part of a strategy for sharpening our science through collaboration. Related to this, LLNL has been unable in the past to provide access for sensitive foreign nationals (SFNs) to the Livermore Computing (LC) unclassified 'yellow' network. Identifying some mechanism for data sharing between LLNL computational scientists and SFNs would be a first practical step in fostering cooperative, collaborative relationships with an important and growing sector of the American science community.
NASA Astrophysics Data System (ADS)
Wyborn, L. A.; Evans, B. J. K.; Pugh, T.; Lescinsky, D. T.; Foster, C.; Uhlherr, A.
2014-12-01
The National Computational Infrastructure (NCI) at the Australian National University (ANU) is a partnership between CSIRO, ANU, Bureau of Meteorology (BoM) and Geoscience Australia. Recent investments in a 1.2 PFlop Supercomputer (Raijin), ~ 20 PB data storage using Lustre filesystems and a 3000 core high performance cloud have created a hybrid platform for higher performance computing and data-intensive science to enable large scale earth and climate systems modelling and analysis. There are > 3000 users actively logging in and > 600 projects on the NCI system. Efficiently scaling and adapting data and software systems to petascale infrastructures requires the collaborative development of an architecture that is designed, programmed and operated to enable users to interactively invoke different forms of in-situ computation over complex and large scale data collections. NCI makes available major and long tail data collections from both the government and research sectors based on six themes: 1) weather, climate and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology and 6) astronomy, bio and social. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. Collections are the operational form for data management and access. Similar data types from individual custodians are managed cohesively. Use of international standards for discovery and interoperability allow complex interactions within and between the collections. This design facilitates a transdisciplinary approach to research and enables a shift from small scale, 'stove-piped' science efforts to large scale, collaborative systems science. This new and complex infrastructure requires a move to shared, globally trusted software frameworks that can be maintained and updated. 
Workflow engines become essential and need to integrate provenance, versioning, traceability, repeatability and publication. There are also human resource challenges as highly skilled HPC/HPD specialists, specialist programmers, and data scientists are required whose skills can support scaling to the new paradigm of effective and efficient data-intensive earth science analytics on petascale, and soon to be exascale systems.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-01
... compete for high tech employees, and in particular specialized computer science and engineering talent on the basis of salaries, benefits, and career opportunities. In recent years, talented computer... Venue 4. Each Defendant hires specialized computer engineers and scientists throughout the United States...
Advanced Biomedical Computing Center (ABCC) | DSITP
The Advanced Biomedical Computing Center (ABCC), located in Frederick, Maryland (MD), provides HPC resources for both NIH/NCI intramural scientists and the extramural biomedical research community. Its mission is to provide HPC support, to provide collaborative research, and to conduct in-house research in various areas of computational biology and biomedical research.
Computer Art--A New Tool in Advertising Graphics.
ERIC Educational Resources Information Center
Wassmuth, Birgit L.
Using computers to produce art began with scientists, mathematicians, and individuals with strong technical backgrounds who used the graphic material as visualizations of data in technical fields. People are using computer art in advertising, as well as in painting; sculpture; music; textile, product, industrial, and interior design; architecture;…
Integrating Computational Science Tools into a Thermodynamics Course
ERIC Educational Resources Information Center
Vieira, Camilo; Magana, Alejandra J.; García, R. Edwin; Jana, Aniruddha; Krafcik, Matthew
2018-01-01
Computational tools and methods have permeated multiple science and engineering disciplines, because they enable scientists and engineers to process large amounts of data, represent abstract phenomena, and to model and simulate complex concepts. In order to prepare future engineers with the ability to use computational tools in the context of…
hackseq: Catalyzing collaboration between biological and computational scientists via hackathon
2017-01-01
hackseq ( http://www.hackseq.com) was a genomics hackathon with the aim of bringing together a diverse set of biological and computational scientists to work on collaborative bioinformatics projects. In October 2016, 66 participants from nine nations came together for three days for hackseq and collaborated on nine projects ranging from data visualization to algorithm development. The response from participants was overwhelmingly positive with 100% (n = 54) of survey respondents saying they would like to participate in future hackathons. We detail key steps for others interested in organizing a successful hackathon and report excerpts from each project. PMID:28417000
NASA Astrophysics Data System (ADS)
Aourag, H.
2008-09-01
In the past, the search for new and improved materials was characterized mostly by the use of empirical, trial-and-error methods. This picture of materials science has been changing as the knowledge and understanding of fundamental processes governing a material's properties and performance (namely, composition, structure, history, and environment) have increased. In a number of cases, it is now possible to predict a material's properties before it has even been manufactured thus greatly reducing the time spent on testing and development. The objective of modern materials science is to tailor a material (starting with its chemical composition, constituent phases, and microstructure) in order to obtain a desired set of properties suitable for a given application. In the short term, the traditional "empirical" methods for developing new materials will be complemented to a greater degree by theoretical predictions. In some areas, computer simulation is already used by industry to weed out costly or improbable synthesis routes. Can novel materials with optimized properties be designed by computers? Advances in modelling methods at the atomic level coupled with rapid increases in computer capabilities over the last decade have led scientists to answer this question with a resounding "yes". The ability to design new materials from quantum mechanical principles with computers is currently one of the fastest growing and most exciting areas of theoretical research in the world. The methods allow scientists to evaluate and prescreen new materials "in silico" (in vitro), rather than through time consuming experimentation. The Materials Genome Project is to pursue the theory of large scale modeling as well as powerful methods to construct new materials, with optimized properties.
Indeed, it is the intimate synergy between our ability to predict accurately from quantum theory how atoms can be assembled to form new materials and our capacity to synthesize novel materials atom-by-atom that gives to the Materials Genome Project its extraordinary intellectual vitality. Consequently, in designing new materials through computer simulation, our primary objective is to rapidly screen possible designs to find those few that will enhance the competitiveness of industries or have positive benefits to society. Examples include screening of cancer drugs, advances in catalysis for energy production, design of new alloys and multilayers and processing of semiconductors.
NASA Astrophysics Data System (ADS)
Kasprak, A.; Brasington, J.; Hafen, K.; Wheaton, J. M.
2015-12-01
Numerical models that predict channel evolution through time are an essential tool for investigating processes that occur over timescales which render field observation intractable. However, available morphodynamic models generally take one of two approaches to the complex problem of computing morphodynamics, resulting in oversimplification of the relevant physics (e.g. cellular models) or faithful, yet computationally intensive, representations of the hydraulic and sediment transport processes at play. The practical implication of these approaches is that river scientists must often choose between unrealistic results, in the case of the former, or computational demands that render modeling realistic spatiotemporal scales of channel evolution impossible. Here we present a new modeling framework that operates at the timescale of individual competent flows (e.g. floods), and uses a highly-simplified sediment transport routine that moves volumes of material according to morphologically-derived characteristic transport distances, or path lengths. Using this framework, we have constructed an open-source morphodynamic model, termed MoRPHED, which is here applied, and its validity investigated, at timescales ranging from a single event to a decade on two braided rivers in the UK and New Zealand. We do not purport that MoRPHED is the best, nor even an adequate, tool for modeling braided river dynamics at this range of timescales. Rather, our goal in this research is to explore the utility, feasibility, and sensitivity of an event-scale, path-length-based modeling framework for predicting braided river dynamics. To that end, we further explore (a) which processes are naturally emergent and which must be explicitly parameterized in the model, (b) the sensitivity of the model to the choice of particle travel distance, and (c) whether an event-scale model timestep is adequate for producing braided channel dynamics. 
The results of this research may inform techniques for future morphodynamic modeling that seeks to maximize computational resources while modeling fluvial dynamics at the timescales of change.
PoPLAR: Portal for Petascale Lifescience Applications and Research
2013-01-01
Background We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. Methods The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. Results This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. 
Conclusions This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. PMID:23902523
Automated metadata--final project report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schissel, David
This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.
Preparing for in situ processing on upcoming leading-edge supercomputers
Kress, James; Churchill, Randy Michael; Klasky, Scott; ...
2016-10-01
High performance computing applications are producing increasingly large amounts of data and placing enormous stress on current capabilities for traditional post-hoc visualization techniques. Because of the growing compute and I/O imbalance, data reductions, including in situ visualization, are required. These reduced data are used for analysis and visualization in a variety of different ways. Many of the visualization and analysis requirements are known a priori, but when they are not, scientists are dependent on the reduced data to accurately represent the simulation in post hoc analysis. The contribution of this paper is a description of the directions we are pursuing to help a large-scale fusion simulation code succeed on the next generation of supercomputers. Finally, these directions include the role of in situ processing for performing data reductions, as well as the tradeoffs between data size and data integrity within the context of complex operations in a typical scientific workflow.
Time Dependent Tomography of the Solar Corona in Three Spatial Dimensions
NASA Astrophysics Data System (ADS)
Butala, M. D.; Frazin, R. A.; Kamalabadi, F.
2006-12-01
The combination of the soon to be launched STEREO mission with SOHO will provide scientists with three simultaneous space-borne views of the Sun. The increase in available measurements will reduce the data acquisition time necessary to obtain 3D coronal electron density (N_e) estimates from coronagraph images using a technique called solar rotational tomography (SRT). However, the data acquisition period will still be long enough for the corona to dynamically evolve, requiring time dependent solar tomography. The Kalman filter (KF) would seem to be an ideal computational method for time dependent SRT. Unfortunately, the KF scales poorly with problem size and is, as a result, inapplicable. A Monte Carlo approximation to the KF called the localized ensemble Kalman filter was developed for massive applications and has the promise of making the time dependent estimation of the 3D coronal N_e possible. We present simulations showing that this method will make time dependent tomography in three spatial dimensions computationally feasible.
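To illustrate why the Kalman filter scales poorly with problem size, the standard filter equations can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the abstract: x is the state vector of n voxel densities, P its n-by-n error covariance, F and Q the state-transition model and its noise covariance, and H, R, y the observation operator, measurement noise covariance, and coronagraph data. The last line shows the ensemble approximation, together with one common form of localization (an element-wise taper), which may differ from the authors' exact scheme.

```latex
% Kalman filter: forecast step
\hat{x}_{k|k-1} = F_k \, \hat{x}_{k-1|k-1}, \qquad
P_{k|k-1} = F_k \, P_{k-1|k-1} \, F_k^{\mathsf{T}} + Q_k

% Kalman filter: analysis (measurement update) step
K_k = P_{k|k-1} H_k^{\mathsf{T}} \left( H_k P_{k|k-1} H_k^{\mathsf{T}} + R_k \right)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( y_k - H_k \hat{x}_{k|k-1} \right), \qquad
P_{k|k} = \left( I - K_k H_k \right) P_{k|k-1}

% Ensemble approximation with N members x^{(i)} and ensemble mean \bar{x};
% \rho is an assumed localization taper applied element-wise (\circ)
P_{k|k-1} \approx \rho \circ \frac{1}{N-1} \sum_{i=1}^{N}
  \left( x^{(i)}_{k|k-1} - \bar{x}_{k|k-1} \right)
  \left( x^{(i)}_{k|k-1} - \bar{x}_{k|k-1} \right)^{\mathsf{T}}
```

Storing and propagating P requires on the order of n^2 numbers, which is impractical when n counts every voxel of a 3D grid, whereas the ensemble form only needs the N member states with N much smaller than n.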
Earth Sciences Push Radiative Transfer Theory
NASA Astrophysics Data System (ADS)
Davis, Anthony; Mishchenko, Michael
2009-12-01
2009 International Conference on Advances in Mathematics, Computational Methods, and Reactor Physics; Saratoga Springs, New York, 4-7 May 2009; The theories of radiative transfer and particle—particularly neutron—transport are grounded in distinctive microscale physics that deals with either optics or particle dynamics. However, it is not practical to track every wave or particle in macroscopic systems, nor do all of these details matter. That is why Newton's laws, which describe individual particles, are replaced by those of Euler, Navier-Stokes, Maxwell, Boltzmann, Gibbs, and others, which describe the collective behavior of vast numbers of particles. And that is why the radiative transfer (RT) equation is used to describe the flow of radiation through geophysical-scale systems, leaving to Maxwell's wave equations only the task of providing the optical properties of the medium, be it air, water, snow, ice, or biomass. Interestingly, particle transport is determined by the linear transport equation, which is mathematically identical to the RT equation, so geophysicists and nuclear scientists are interested in the same mathematics and computational techniques.
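As a reminder of the shared mathematics mentioned above, one common steady-state, monochromatic form of the RT equation is sketched below. The symbols are assumptions chosen here for illustration, not notation from the conference report: I is the radiance at position r in direction Omega, sigma_e and sigma_s are the extinction and scattering coefficients, p is the phase function normalized to unit integral over all directions, and Q is a source term.

```latex
\boldsymbol{\Omega} \cdot \nabla I(\mathbf{r}, \boldsymbol{\Omega})
  + \sigma_e(\mathbf{r}) \, I(\mathbf{r}, \boldsymbol{\Omega})
  = \sigma_s(\mathbf{r}) \int_{4\pi}
      p(\boldsymbol{\Omega}' \to \boldsymbol{\Omega}) \,
      I(\mathbf{r}, \boldsymbol{\Omega}') \, d\boldsymbol{\Omega}'
  + Q(\mathbf{r}, \boldsymbol{\Omega})
```

Reading I as the angular neutron flux, sigma_e as the total macroscopic cross section, and sigma_s p as the differential scattering cross section gives the one-speed linear transport equation, which is why the same numerical methods serve both communities.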
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Chase Qishi; Zhu, Michelle Mengxia
The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over the Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models feature diverse scientific workflows as simple as linear pipelines or as complex as directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments.
SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100 Gbps Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration.
Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including the Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.
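The abstract above describes workflows ranging from linear pipelines to directed acyclic graphs. As a minimal sketch of what a workflow engine has to do before scheduling, the Python below orders a small DAG so that every task runs after its dependencies. The task names and the `execution_order` helper are hypothetical and are not part of SWAMP; the ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "generate": set(),
    "transfer": {"generate"},
    "filter": {"transfer"},
    "visualize": {"filter"},
    "archive": {"transfer"},
}


def execution_order(dag):
    """Return one valid order in which tasks can run, dependencies first."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A real platform would layer resource mapping, monitoring, and reconfiguration on top of such an ordering; `TopologicalSorter` raises `graphlib.CycleError` if the task graph is not acyclic.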
A Scientist's Guide to Achieving Broader Impacts through K-12 STEM Collaboration.
Komoroske, Lisa M; Hameed, Sarah O; Szoboszlai, Amber I; Newsom, Amanda J; Williams, Susan L
2015-03-01
The National Science Foundation and other funding agencies are increasingly requiring broader impacts in grant applications to encourage US scientists to contribute to science education and society. Concurrently, national science education standards are using more inquiry-based learning (IBL) to increase students' capacity for abstract, conceptual thinking applicable to real-world problems. Scientists are particularly well suited to engage in broader impacts via science inquiry outreach, because scientific research is inherently an inquiry-based process. We provide a practical guide to help scientists overcome obstacles that inhibit their engagement in K-12 IBL outreach and to attain the accrued benefits. Strategies to overcome these challenges include scaling outreach projects to the time available, building collaborations in which scientists' research overlaps with curriculum, employing backward planning to target specific learning objectives, encouraging scientists to share their passion, as well as their expertise with students, and transforming institutional incentives to support scientists engaging in educational outreach.
Multiple Scales in Fluid Dynamics and Meteorology: The DFG Priority Programme 1276 MetStröm
NASA Astrophysics Data System (ADS)
von Larcher, Th; Klein, R.
2012-04-01
Geophysical fluid motions are characterized by a very wide range of length and time scales, and by a rich collection of varying physical phenomena. The mathematical description of these motions reflects this multitude of scales and mechanisms in that it involves strong non-linearities and various scale-dependent singular limit regimes. Considerable progress has been made in recent years in the mathematical modelling and numerical simulation of such flows in detailed process studies, numerical weather forecasting, and climate research. One task of outstanding importance in this context has been and will remain for the foreseeable future the subgrid scale parameterization of the net effects of non-resolved processes that take place on spatio-temporal scales not resolvable even by the largest most recent supercomputers. Since the advent of numerical weather forecasting some 60 years ago, one simple but efficient means to achieve improved forecasting skills has been increased spatio-temporal resolution. This seems quite consistent with the concept of convergence of numerical methods in Applied Mathematics and Computational Fluid Dynamics (CFD) at first glance. Yet, the very notion of increased resolution in atmosphere-ocean science is very different from the one used in Applied Mathematics: For the mathematician, increased resolution provides the benefit of getting closer to the ideal of a converged solution of some given partial differential equations. On the other hand, the atmosphere-ocean scientist would naturally refine the computational grid and adjust his mathematical model, such that it better represents the relevant physical processes that occur at smaller scales. This conceptual contradiction remains largely irrelevant as long as geophysical flow models operate with fixed computational grids and time steps and with subgrid scale parameterizations being optimized accordingly.
The picture changes fundamentally when modern techniques from CFD involving spatio-temporal grid adaptivity get invoked in order to further improve the net efficiency in exploiting the given computational resources. In the setting of geophysical flow simulation one must then employ subgrid scale parameterizations that dynamically adapt to the changing grid sizes and time steps, implement ways to judiciously control and steer the newly available flexibility of resolution, and invent novel ways of quantifying the remaining errors. The DFG priority program MetStröm covers the expertise of Meteorology, Fluid Dynamics, and Applied Mathematics to develop model- as well as grid-adaptive numerical simulation concepts in multidisciplinary projects. The goal of this priority programme is to provide simulation models which combine scale-dependent (mathematical) descriptions of key physical processes with adaptive flow discretization schemes. Deterministic continuous approaches and discrete and/or stochastic closures and their possible interplay are taken into consideration. Research focuses on the theory and methodology of multiscale meteorological-fluid mechanics modelling. Accompanying reference experiments support model validation.
Automated Knowledge Discovery from Simulators
NASA Technical Reports Server (NTRS)
Burl, Michael C.; DeCoste, D.; Enke, B. L.; Mazzoni, D.; Merline, W. J.; Scharenbroich, L.
2006-01-01
In this paper, we explore one aspect of knowledge discovery from simulators, the landscape characterization problem, where the aim is to identify regions in the input/ parameter/model space that lead to a particular output behavior. Large-scale numerical simulators are in widespread use by scientists and engineers across a range of government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study otherwise. However, the cost of simulation studies can be quite high, both in terms of the time and computational resources required to conduct the trials and the manpower needed to sift through the resulting output. Thus, there is strong motivation to develop automated methods that enable more efficient knowledge extraction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5–2698v3 processors.
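Strong scaling, as reported above, fixes the problem size and increases the node count. A minimal sketch of how speedup and parallel efficiency are usually derived from such runs is given below; the `strong_scaling` helper and the timings are hypothetical placeholders and are not results from the paper.

```python
def strong_scaling(base_nodes, base_time, nodes, time):
    """Speedup and parallel efficiency relative to a baseline run of the same problem."""
    speedup = base_time / time          # how much faster than the baseline
    ideal = nodes / base_nodes          # speedup expected under perfect scaling
    return speedup, speedup / ideal


# Hypothetical wall-clock times in seconds per step (NOT taken from the paper).
timings = {64: 40.0, 256: 12.5, 1024: 5.0}
base_nodes = 64

for nodes, seconds in timings.items():
    speedup, efficiency = strong_scaling(base_nodes, timings[base_nodes], nodes, seconds)
    print(f"{nodes:5d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

An efficiency close to 1 means the code strong-scales well; values that fall as nodes are added typically point to communication or load-imbalance overheads.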
What Physicists Should Know About High Performance Computing - Circa 2002
NASA Astrophysics Data System (ADS)
Frederick, Donald
2002-08-01
High Performance Computing (HPC) is a dynamic, cross-disciplinary field that traditionally has involved applied mathematicians, computer scientists, and others primarily from the various disciplines that have been major users of HPC resources - physics, chemistry, engineering, with increasing use by those in the life sciences. There is a technological dynamic that is powered by economic as well as by technical innovations and developments. This talk will discuss practical ideas to be considered when developing numerical applications for research purposes. Even with the rapid pace of development in the field, the author believes that these concepts will not become obsolete for a while, and will be of use to scientists who either are considering, or who have already started down the HPC path. These principles will be applied in particular to current parallel HPC systems, but there will also be references of value to desktop users. The talk will cover such topics as: computing hardware basics, single-cpu optimization, compilers, timing, numerical libraries, debugging and profiling tools and the emergence of Computational Grids.
ERIC Educational Resources Information Center
Pinelli, Thomas E.; Sato, Yuko; Barclay, Rebecca O.; Kennedy, John M.
1997-01-01
Japanese (n=94) and U.S. (n=340) aerospace scientists/engineers described time spent communicating information, collaborative writing, importance of technical communication courses, and the use of libraries, computer networks, and technical reports. Japanese respondents had greater language fluency; U.S. respondents spent more time with…
MeDICi Software Superglue for Data Analysis Pipelines
Ian Gorton
2017-12-09
The Middleware for Data-Intensive Computing (MeDICi) Integration Framework is an integrated middleware platform developed to solve data analysis and processing needs of scientists across many domains. MeDICi is scalable, easily modified, and robust to multiple languages, protocols, and hardware platforms, and in use today by PNNL scientists for bioinformatics, power grid failure analysis, and text analysis.
The Draw a Scientist Test: A Different Population and a Somewhat Different Story
ERIC Educational Resources Information Center
Thomas, Mark D.; Henley, Tracy B.; Snell, Catherine M.
2006-01-01
This study examined Draw-a-Scientist-Test (DAST) images solicited from 212 undergraduate students for the presence of traditional gender stereotypes. Participants were 100 males and 112 females enrolled in psychology or computer science courses with a mean age of 21.02 years. A standard multiple regression generated a model that accounts for the…
Kobayashi, M; Irino, T; Sweldens, W
2001-10-23
Multiscale computing (MSC) involves the computation, manipulation, and analysis of information at different resolution levels. Widespread use of MSC algorithms and the discovery of important relationships between different approaches to implementation were catalyzed, in part, by the recent interest in wavelets. We present two examples that demonstrate how MSC can help scientists understand complex data. The first is from acoustical signal processing and the second is from computer graphics.
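To make the idea of analysing information at different resolution levels concrete, here is a minimal sketch of an unnormalized Haar wavelet decomposition, one of the simplest multiscale transforms. It is offered only as an illustration and is not one of the two examples from the paper; the function names are hypothetical, and the input length is assumed to be a power of two.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform: pairwise averages and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]   # coarse, half-resolution view
    details = [(a - b) / 2 for a, b in pairs]    # what the coarse view loses
    return averages, details


def haar_decompose(signal):
    """Apply haar_step repeatedly; return the coarsest average and the details per level."""
    levels = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)
    return current, levels


coarse, levels = haar_decompose([4, 6, 10, 12])
print(coarse, levels)  # [8.0] [[-1.0, -1.0], [-3.0]]
```

Each level halves the resolution: the averages give a progressively coarser view of the signal, while the stored details are exactly what is needed to reconstruct the finer level.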
Information processing, computation, and cognition
Scarantino, Andrea
2010-01-01
Computation and information processing are among the most fundamental notions in cognitive science. They are also among the most imprecisely discussed. Many cognitive scientists take it for granted that cognition involves computation, information processing, or both – although others disagree vehemently. Yet different cognitive scientists use ‘computation’ and ‘information processing’ to mean different things, sometimes without realizing that they do. In addition, computation and information processing are surrounded by several myths; first and foremost, that they are the same thing. In this paper, we address this unsatisfactory state of affairs by presenting a general and theory-neutral account of computation and information processing. We also apply our framework by analyzing the relations between computation and information processing on one hand and classicism, connectionism, and computational neuroscience on the other. We defend the relevance to cognitive science of both computation, at least in a generic sense, and information processing, in three important senses of the term. Our account advances several foundational debates in cognitive science by untangling some of their conceptual knots in a theory-neutral way. By leveling the playing field, we pave the way for the future resolution of the debates’ empirical aspects. PMID:22210958
Institute for Sustained Performance, Energy, and Resilience (SuPER)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jagode, Heike; Bosilca, George; Danalis, Anthony
The University of Tennessee (UTK) and University of Texas at El Paso (UTEP) partnership supported the three main thrusts of the SUPER project---performance, energy, and resilience. The UTK-UTEP effort thus helped advance the main goal of SUPER, which was to ensure that DOE's computational scientists can successfully exploit the emerging generation of high performance computing (HPC) systems. This goal is being met by providing application scientists with strategies and tools to productively maximize performance, conserve energy, and attain resilience. The primary vehicle through which UTK provided performance measurement support to SUPER and the larger HPC community is the Performance Application Programming Interface (PAPI). PAPI is an ongoing project that provides a consistent interface and methodology for collecting hardware performance information from various hardware and software components, including most major CPUs, GPUs and accelerators, interconnects, I/O systems, and power interfaces, as well as virtual cloud environments. The PAPI software is widely used for performance modeling of scientific and engineering applications---for example, the HOMME (High Order Methods Modeling Environment) climate code, and the GAMESS and NWChem computational chemistry codes---on DOE supercomputers. PAPI is widely deployed as middleware for use by higher-level profiling, tracing, and sampling tools (e.g., CrayPat, HPCToolkit, Scalasca, Score-P, TAU, Vampir, PerfExpert), making it the de facto standard for hardware counter analysis. PAPI has established itself as fundamental software infrastructure in every application domain (spanning academia, government, and industry), where improving performance can be mission critical.
Ultimately, as more application scientists migrate their applications to HPC platforms, they will benefit from the extended capabilities this grant brought to PAPI to analyze and optimize performance in these environments, whether they use PAPI directly, or via third-party performance tools. Capabilities added to PAPI through this grant include support for new architectures such as the latest GPU and Xeon Phi accelerators, and advanced power measurement and management features. Another important topic for the UTK team was providing support for a rich ecosystem of different fault management strategies in the context of parallel computing. Our long term efforts have been oriented toward proposing flexible strategies and providing building blocks that application developers can use to build the most efficient fault management technique for their application. These efforts span across the entire software spectrum, from theoretical models of existing strategies to easily assess their performance, to algorithmic modifications to take advantage of specific mathematical properties for data redundancy and to extensions to widely used programming paradigms to empower the application developers to deal with all types of faults. We have also continued our tight collaborations with users to help them adopt these technologies to ensure their applications always deliver meaningful scientific data. Large supercomputer systems are becoming more and more power and energy constrained, and future systems and applications running on them will need to be optimized to run under power caps and/or minimize energy consumption. The UTEP team contributed to the SUPER energy thrust by developing power modeling methodologies and investigating power management strategies. Scalability modeling results showed that some applications can scale better with respect to an increasing power budget than with respect to only the number of processors.
Power management, in particular shifting power to processors on the critical path of an application execution, can reduce perturbation due to system noise and other sources of runtime variability, which are growing problems on large-scale power-constrained computer systems.
Mobile Devices and GPU Parallelism in Ionospheric Data Processing
NASA Astrophysics Data System (ADS)
Mascharka, D.; Pankratius, V.
2015-12-01
Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). 
We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of smartphones.
Architecture for the Next Generation System Management Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallard, Jerome; Lebre, I Adrien; Morin, Christine
2011-01-01
To get more results or greater accuracy, computational scientists execute their applications on distributed computing platforms such as Clusters, Grids and Clouds. These platforms are different in terms of hardware and software resources as well as locality: some span across multiple sites and multiple administrative domains whereas others are limited to a single site/domain. As a consequence, in order to scale their applications up, the scientists have to manage technical details for each target platform. From our point of view, this complexity should be hidden from the scientists who, in most cases, would prefer to focus on their research rather than spending time dealing with platform configuration concerns. In this article, we advocate for a system management framework that aims to automatically set up the whole run-time environment according to the applications' needs. The main difference with regard to usual approaches is that they generally only focus on the software layer whereas we address both the hardware and the software expectations through a unique system. For each application, scientists describe their requirements through the definition of a Virtual Platform (VP) and a Virtual System Environment (VSE). Relying on the VP/VSE definitions, the framework is in charge of: (i) the configuration of the physical infrastructure to satisfy the VP requirements, (ii) the setup of the VP, and (iii) the customization of the execution environment (VSE) upon the former VP. We propose a new formalism that the system can rely upon to successfully perform each of these three steps without burdening the user with the specifics of the configuration for the physical resources, and system management tools. This formalism leverages Goldberg's theory for recursive virtual machines by introducing new concepts based on system virtualization (identity, partitioning, aggregation) and emulation (simple, abstraction).
This enables the definition of complex VP/VSE configurations without making assumptions about the hardware and the software resources. For each requirement, the system executes the corresponding operation with the appropriate management tool. As a proof of concept, we implemented a first prototype that currently interacts with several system management tools (e.g., OSCAR, the Grid 5000 toolkit, and XtreemOS) and that can be easily extended to integrate new resource brokers or cloud systems such as Nimbus, OpenNebula or Eucalyptus for instance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schecker, Jay A
After a prolonged absence, the word 'nuclear' has returned to the lexicon of sustainable domestic energy resources. Due in no small part to its demonstrated reliability, nuclear power is poised to play a greater role in the nation's energy future, producing clean, carbon-neutral electricity and contributing even more to our energy security. To nuclear scientists, the resurgence presents an opportunity to inject new technologies into the industry to maximize the benefits that nuclear energy can provide. 'By developing new options for waste management and exploiting new materials to make key technological advances, we can significantly impact the use of nuclear energy in our future energy mix,' says Chris Stanek, a materials scientist at Los Alamos National Laboratory. Stanek approaches the big technology challenges by thinking way small, all the way down to the atoms. He and his colleagues are using cutting edge atomic-scale simulations to address a difficult aspect of nuclear waste -- predicting its behavior far into the future. Their research is part of a broader, coordinated effort on the part of the Laboratory to use its considerable experimental, theoretical, and computational capabilities to explore advanced materials central to not only waste issues, but to nuclear fuels as well.
eButterfly: Leveraging Massive Online Citizen Science for Butterfly Conservation
Prudic, Kathleen L.; McFarland, Kent P.; Oliver, Jeffrey C.; Hutchinson, Rebecca A.; Long, Elizabeth C.; Kerr, Jeremy T.; Larrivée, Maxim
2017-01-01
Data collection, storage, analysis, visualization, and dissemination are changing rapidly due to advances in new technologies driven by computer science and universal access to the internet. These technologies and web connections place human observers front and center in citizen science-driven research and are critical in generating new discoveries and innovation in such fields as astronomy, biodiversity, and meteorology. Research projects utilizing a citizen science approach address scientific problems at regional, continental, and even global scales otherwise impossible for a single lab or even a small collection of academic researchers. Here we describe eButterfly, an integrative checklist-based butterfly monitoring and database web-platform that leverages the skills and knowledge of recreational butterfly enthusiasts to create a globally accessible unified database of butterfly observations across North America. Citizen scientists, conservationists, policy makers, and scientists are using eButterfly data to better understand the biological patterns of butterfly species diversity and how environmental conditions shape these patterns in space and time. eButterfly, in collaboration with thousands of butterfly enthusiasts, has created a near real-time butterfly data resource producing tens of thousands of observations per year, open to all to share and explore. PMID:28524117
Designing Citizen Science Projects in the Era of Mega-Information and Connected Activism
NASA Astrophysics Data System (ADS)
Pompea, S. M.
2010-12-01
The design of citizen science projects must take many factors into account in order to be successful. Currently, there are a wide variety of citizen science projects with different aims, audiences, reporting methods, and degrees of scientific rigor and usefulness. Projects function on local, national, and worldwide scales and range in time from limited campaigns to around the clock projects. For current and future projects, advanced cell phones and mobile computing allow an unprecedented degree of connectivity and data transfer. These advances will greatly influence the design of citizen science projects. An unprecedented amount of data is available for data mining by interested citizen scientists; how can projects take advantage of this? Finally, a variety of citizen scientist projects have social activism and change as part of their mission and goals. How can this be harnessed in a constructive and efficient way? The design of projects must also select the proper role for experts and novices, provide quality control, and must motivate users to encourage long-term involvement. Effective educational and instructional materials design can be used to design responsive and effective projects in a more highly connected age with access to very large amounts of information.
A review of earth observation using mobile personal communication devices
NASA Astrophysics Data System (ADS)
Ferster, Colin J.; Coops, Nicholas C.
2013-02-01
Earth observation using mobile personal communication devices (MPCDs) is a recent advance with considerable promise for acquiring important and timely measurements. Globally, over 5 billion people have access to mobile phones, with an increasing proportion having access to smartphones with capabilities such as a camera, microphone, global positioning system (GPS), data storage, and networked data transfer. Scientists can view these devices as embedded sensors with the potential to take measurements of the Earth's surface and processes. To advance the state of Earth observation using MPCDs, scientists need to consider terms and concepts from a broad range of disciplines including citizen science, image analysis, and computer vision. In this paper, as a result of our literature review, we identify a number of considerations for Earth observation using MPCDs such as methods of field collection, collecting measurements over broad areas, errors and biases, data processing, and accessibility of data. Developing effective frameworks for mobile data collection with public participation and strategies for minimizing bias, in combination with advancements in image processing techniques, will offer opportunities to collect Earth sensing data across a range of scales and perspectives, complementing airborne and spaceborne remote sensing measurements.
eButterfly: Leveraging Massive Online Citizen Science for Butterfly Conservation.
Prudic, Kathleen L; McFarland, Kent P; Oliver, Jeffrey C; Hutchinson, Rebecca A; Long, Elizabeth C; Kerr, Jeremy T; Larrivée, Maxim
2017-05-18
Data collection, storage, analysis, visualization, and dissemination are changing rapidly due to advances in new technologies driven by computer science and universal access to the internet. These technologies and web connections place human observers front and center in citizen science-driven research and are critical in generating new discoveries and innovation in such fields as astronomy, biodiversity, and meteorology. Research projects utilizing a citizen science approach address scientific problems at regional, continental, and even global scales otherwise impossible for a single lab or even a small collection of academic researchers. Here we describe eButterfly, an integrative checklist-based butterfly monitoring and database web-platform that leverages the skills and knowledge of recreational butterfly enthusiasts to create a globally accessible unified database of butterfly observations across North America. Citizen scientists, conservationists, policy makers, and scientists are using eButterfly data to better understand the biological patterns of butterfly species diversity and how environmental conditions shape these patterns in space and time. eButterfly, in collaboration with thousands of butterfly enthusiasts, has created a near real-time butterfly data resource producing tens of thousands of observations per year, open to all to share and explore.
Representation of research hypotheses
2011-01-01
Background Hypotheses are now being automatically produced on an industrial scale by computers in biology, e.g. the annotation of a genome is essentially a large set of hypotheses generated by sequence similarity programs; and robot scientists enable the full automation of a scientific investigation, including generation and testing of research hypotheses. Results This paper proposes a logically defined way of recording automatically generated hypotheses in a machine-amenable way. The proposed formalism allows the description of complete hypotheses sets as specified input and output for scientific investigations. The formalism supports the decomposition of research hypotheses into more specialised hypotheses if that is required by an application. Hypotheses are represented in an operational way – it is possible to design an experiment to test them. The explicit formal description of research hypotheses promotes the explicit formal description of the results and conclusions of an investigation. The paper also proposes a framework for automated hypotheses generation. We demonstrate how the key components of the proposed framework are implemented in the Robot Scientist “Adam”. Conclusions A formal representation of automatically generated research hypotheses can help to improve the way humans produce, record, and validate research hypotheses. Availability http://www.aber.ac.uk/en/cs/research/cb/projects/robotscientist/results/ PMID:21624164
NASA Technical Reports Server (NTRS)
1990-01-01
NASA's Space Station Freedom Program (SSFP) planning efforts have identified a need for a payload training simulator system to serve as both a training facility and as a demonstrator to validate operational concepts. The envisioned MSFC Payload Training Complex (PTC) required to meet this need will train the Space Station payload scientists, station scientists, and ground controllers to operate the wide variety of experiments that will be onboard the Space Station Freedom. The Simulation Computer System (SCS) is the computer hardware, software, and workstations that will support the Payload Training Complex at MSFC. The purpose of this SCS Study is to investigate issues related to the SCS, alternative requirements, simulator approaches, and state-of-the-art technologies to develop candidate concepts and designs.
Space Station Simulation Computer System (SCS) study for NASA/MSFC. Phased development plan
NASA Technical Reports Server (NTRS)
1990-01-01
NASA's Space Station Freedom Program (SSFP) planning efforts have identified a need for a payload training simulator system to serve as both a training facility and as a demonstrator to validate operational concepts. The envisioned MSFC Payload Training Complex (PTC) required to meet this need will train the Space Station payload scientists, station scientists and ground controllers to operate the wide variety of experiments that will be onboard the Space Station Freedom. The Simulation Computer System (SCS) is made up of computer hardware, software, and workstations that will support the Payload Training Complex at MSFC. The purpose of this SCS Study is to investigate issues related to the SCS, alternative requirements, simulator approaches, and state-of-the-art technologies to develop candidate concepts and designs.
BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences
Dahlö, Martin; Haziza, Frédéric; Kallio, Aleksi; Korpelainen, Eija; Bongcam-Rudloff, Erik; Spjuth, Ola
2015-01-01
Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org. PMID:26401099
NASA Technical Reports Server (NTRS)
1990-01-01
NASA's Space Station Freedom Program (SSFP) planning efforts have identified a need for a payload training simulator system to serve as both a training facility and as a demonstrator to validate operational concepts. The envisioned MSFC Payload Training Complex (PTC) required to meet this need will train the Space Station payload scientists, station scientists, and ground controllers to operate the wide variety of experiments that will be onboard the Space Station Freedom. The Simulation Computer System (SCS) is made up of the computer hardware, software, and workstations that will support the Payload Training Complex at MSFC. The purpose of this SCS Study is to investigate issues related to the SCS, alternative requirements, simulator approaches, and state-of-the-art technologies to develop candidate concepts and designs.
Space Station Simulation Computer System (SCS) study for NASA/MSFC. Operations concept report
NASA Technical Reports Server (NTRS)
1990-01-01
NASA's Space Station Freedom Program (SSFP) planning efforts have identified a need for a payload training simulator system to serve as both a training facility and as a demonstrator to validate operational concepts. The envisioned MSFC Payload Training Complex (PTC) required to meet this need will train the Space Station payload scientists, station scientists, and ground controllers to operate the wide variety of experiments that will be onboard the Space Station Freedom. The Simulation Computer System (SCS) is made up of computer hardware, software, and workstations that will support the Payload Training Complex at MSFC. The purpose of this SCS Study is to investigate issues related to the SCS, alternative requirements, simulator approaches, and state-of-the-art technologies to develop candidate concepts and designs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, W. E.
2004-08-16
Computational Science plays a big role in research and development in mathematics, science, engineering and biomedical disciplines. The Alliance for Computational Science Collaboration (ACSC) has the goal of training African-American and other minority scientists in the computational science field for eventual employment with the Department of Energy (DOE). The involvement of Historically Black Colleges and Universities (HBCU) in the Alliance provides avenues for producing future DOE African-American scientists. Fisk University has been participating in this program through grants from the DOE. The DOE grant supported computational science activities at Fisk University. The research areas included energy related projects, distributed computing, visualization of scientific systems and biomedical computing. Students' involvement in computational science research included undergraduate summer research at Oak Ridge National Lab, on-campus research involving the participation of undergraduates, participation of undergraduate and faculty members in workshops, and mentoring of students. These activities enhanced research and education in computational science, thereby adding to Fisk University's spectrum of research and educational capabilities. Among the successes of the computational science activities are the acceptance of three undergraduate students to graduate schools with full scholarships beginning fall 2002 (one for a master's degree program and two for doctoral degree programs).
Staff | Computational Science | NREL
develops and leads laboratory-wide efforts in high-performance computing and energy-efficient data centers Professional IV-High Perf Computing Jim.Albin@nrel.gov 303-275-4069 Ananthan, Shreyas Senior Scientist - High -Performance Algorithms and Modeling Shreyas.Ananthan@nrel.gov 303-275-4807 Bendl, Kurt IT Professional IV-High
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uhr, L.
1987-01-01
This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
How to Teach Residue Number System to Computer Scientists and Engineers
ERIC Educational Resources Information Center
Navi, K.; Molahosseini, A. S.; Esmaeildoust, M.
2011-01-01
The residue number system (RNS) has been an important research field in computer arithmetic for many decades, mainly because of its carry-free nature, which can provide high-performance computing architectures with superior delay specifications. Recently, research on RNS has found new directions that have resulted in the introduction of efficient…
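The carry-free property mentioned in this abstract can be seen in a small sketch: a number is stored as its remainders modulo several pairwise coprime moduli, and addition or multiplication then works on each remainder independently, with no carries between channels. The moduli (7, 8, 9) and all function names below are illustrative assumptions, not taken from the article; conversion back to an ordinary integer uses the Chinese Remainder Theorem.

```python
from math import prod

# Illustrative moduli (an assumption of this sketch): pairwise coprime,
# giving a dynamic range of 7 * 8 * 9 = 504 distinct values.
MODULI = (7, 8, 9)


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as a tuple of residues."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add channel by channel; no carry passes between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; again each channel is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese Remainder Theorem (requires Python 3.8+ for pow(x, -1, m))."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m
```

Results are only meaningful while they stay below the dynamic range (504 here); larger values wrap around modulo that range.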
A Research and Development Strategy for High Performance Computing.
ERIC Educational Resources Information Center
Office of Science and Technology Policy, Washington, DC.
This report is the result of a systematic review of the status and directions of high performance computing and its relationship to federal research and development. Conducted by the Federal Coordinating Council for Science, Engineering, and Technology (FCCSET), the review involved a series of workshops attended by numerous computer scientists and…
Relevancy in Problem Solving: A Computational Framework
ERIC Educational Resources Information Center
Kwisthout, Johan
2012-01-01
When computer scientists discuss the computational complexity of, for example, finding the shortest path from building A to building B in some town or city, their starting point typically is a formal description of the problem at hand, e.g., a graph with weights on every edge where buildings correspond to vertices, routes between buildings to…
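The formal description this abstract mentions (buildings as vertices, routes as weighted edges) can be written down directly. The sketch below uses Dijkstra's algorithm, which is a standard choice for non-negative edge weights but is not named in the abstract; the graph layout and function name are hypothetical.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Return the length of the shortest path from source to target, or None if unreachable.

    graph maps each vertex to a dict {neighbor: non-negative edge weight}.
    """
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry; a shorter route to node was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None


# Hypothetical town: buildings A and B joined through junctions C and D.
town = {
    "A": {"C": 2, "D": 5},
    "C": {"A": 2, "D": 1, "B": 7},
    "D": {"A": 5, "C": 1, "B": 3},
    "B": {"C": 7, "D": 3},
}
```

With this example graph, `shortest_path_length(town, "A", "B")` follows A, C, D, B for a total weight of 6.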
Cultivating Critique: A (Humanoid) Response to the Online Teaching of Critical Thinking
ERIC Educational Resources Information Center
Waggoner, Matt
2013-01-01
The Turing era, defined by British mathematician and computer science pioneer Alan Turing's question about whether or not computers can think, is not over. Philosophers and scientists will continue to haggle over whether thought necessitates intentionality, and whether computation can rise to that level. Meanwhile, another frontier is emerging in…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Micah Johnson, Andrew Slaughter
PIKA is a MOOSE-based application for modeling micro-structure evolution of seasonal snow. The model will be useful for environmental, atmospheric, and climate scientists. Possible applications include application to energy balance models, ice sheet modeling, and avalanche forecasting. The model implements physics from published, peer-reviewed articles. The main purpose is to foster university and laboratory collaboration to build a larger multi-scale snow model using MOOSE. The main feature of the code is that it is implemented using the MOOSE framework, thus making features such as multiphysics coupling, adaptive mesh refinement, and parallel scalability native to the application. PIKA implements three equations: the phase-field equation for tracking the evolution of the ice-air interface within seasonal snow at the grain-scale; the heat equation for computing the temperature of both the ice and air within the snow; and the mass transport equation for monitoring the diffusion of water vapor in the pore space of the snow.
Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases
Lin, Chen; Mak, Wayne; Hong, Pengyu; Sepp, Katharine; Perrimon, Norbert
2010-01-01
Recently, High-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image analysis tools. Until now, it still requires highly trained scientists to browse a prohibitively large RNAi-HCS image database and produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of the HCS technology in biomedical research. Our new interfaces empower biologists with computational power not only to effectively and efficiently explore large-scale RNAi-HCS image databases, but also to apply their knowledge and experience to interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques. PMID:21278820
Frame, M.T.; Cotter, G.; Zolly, L.; Little, J.
2002-01-01
Whether your vantage point is that of an office window or a national park, your view undoubtedly encompasses a rich diversity of life forms, all carefully studied or managed by some scientist, resource manager, or planner. A few simple calculations - the number of species, their interrelationships, and the many researchers studying them - and you can easily see the tremendous challenges that the resulting biological data presents to the information and computer science communities. Biological information varies in format and content: it may pertain to a particular species or an entire ecosystem; it can contain land use characteristics, and geospatially referenced information. The complexity and uniqueness of each individual species or ecosystem do not easily lend themselves to today's computer science tools and applications. To address the challenges that the biological enterprise presents, the National Biological Information Infrastructure (NBII) (http://www.nbii.gov) was established in 1993 on the recommendation of the National Research Council (National Research Council 1993). The NBII is designed to address these issues on a national scale, and through international partnerships. This paper discusses current information and computer science efforts within the National Biological Information Infrastructure Program, and future computer science research endeavors that are needed to address the ever-growing issues related to our nation's biological concerns. © 2003 by The Haworth Press, Inc. All rights reserved.
Grid Computing Environment using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Alanis, Fransisco; Mahmood, Akhtar
2003-10-01
Custom-made Beowulf clusters using PCs are currently replacing expensive supercomputers to carry out complex scientific computations. At the University of Texas - Pan American, we built an 8 Gflops Beowulf Cluster for doing HEP research using RedHat Linux 7.3 and the LAM-MPI middleware. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes that were compiled in C on the cluster using the LAM-XMPI graphics user environment. We will demonstrate a "simple" prototype grid environment, where we will submit and run parallel jobs remotely across multiple cluster nodes over the internet from the presentation room at Texas Tech University. The Sphinx Beowulf Cluster will be used for Monte Carlo grid test-bed studies for the LHC-ATLAS high energy physics experiment. Grid is a new IT concept for the next generation of the "Super Internet" for high-performance computing. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
None
2017-12-09
Learn what it will take to create tomorrow's net-zero energy home as scientists reveal the secrets of cool roofs, smart windows, and computer-driven energy control systems. The net-zero energy home: Scientists are working to make tomorrow's homes more than just energy efficient -- they want them to be zero energy. Iain Walker, a scientist in the Lab's Energy Performance of Buildings Group, will discuss what it takes to develop net-zero energy houses that generate as much energy as they use through highly aggressive energy efficiency and on-site renewable energy generation. Talking back to the grid: Imagine programming your house to use less energy if the electricity grid is full or prices are high. Mary Ann Piette, deputy director of Berkeley Lab's building technology department and director of the Lab's Demand Response Research Center, will discuss how new technologies are enabling buildings to listen to the grid and automatically change their thermostat settings or lighting loads, among other demands, in response to fluctuating electricity prices. The networked (and energy efficient) house: In the future, your home's lights, climate control devices, computers, windows, and appliances could be controlled via a sophisticated digital network. If it's plugged in, it'll be connected. Bruce Nordman, an energy scientist in Berkeley Lab's Energy End-Use Forecasting group, will discuss how he and other scientists are working to ensure these networks help homeowners save energy.
The making of the Women in Biology forum (WiB) at Bioclues.
Singhania, Reeta Rani; Madduru, Dhatri; Pappu, Pranathi; Panchangam, Sameera; Suravajhala, Renuka; Chandrasekharan, Mohanalatha
2014-01-01
The Women in Biology forum (WiB) of Bioclues (India) began in 2009 to promote and support women pursuing careers in bioinformatics and computational biology. WiB was formed in order to help women scientists deprived of basic research, boost the prominence of women scientists particularly from developing countries, and bridge the gender gap to innovation. WiB has also served as a platform to highlight the work of established female scientists in these fields. Several award-winning women researchers have shared their experiences and provided valuable suggestions to WiB. Headed by Mohanalatha Chandrasekharan and supported by Dr. Reeta Rani Singhania and Renuka Suravajhala, WiB has seen major progress in the last couple of years, particularly in Mentoring and Research, two of the four avenues in Bioclues: Mentoring, Outreach, Research and Entrepreneurship (MORE). In line with the Bioclues vision for bioinformatics in India, the WiB Journal Club (JoC) recognizes women scientists working on functional genomics and bioinformatics, and provides scientific mentorship and support for project design and hypothesis formulation. As a part of Bioclues, WiB members practice the group's open-desk policy and its belief that all members are free to express their own thoughts and opinions. The WiB forum appreciates suggestions and welcomes scientists from around the world to be a part of their mission to encourage women to pursue computational biology and bioinformatics.
Bernstam, Elmer V.; Hersh, William R.; Johnson, Stephen B.; Chute, Christopher G.; Nguyen, Hien; Sim, Ida; Nahm, Meredith; Weiner, Mark; Miller, Perry; DiLaura, Robert P.; Overcash, Marc; Lehmann, Harold P.; Eichmann, David; Athey, Brian D.; Scheuermann, Richard H.; Anderson, Nick; Starren, Justin B.; Harris, Paul A.; Smith, Jack W.; Barbour, Ed; Silverstein, Jonathan C.; Krusch, David A.; Nagarajan, Rakesh; Becich, Michael J.
2010-01-01
Clinical and translational research increasingly requires computation. Projects may involve multiple computationally-oriented groups including information technology (IT) professionals, computer scientists and biomedical informaticians. However, many biomedical researchers are not aware of the distinctions among these complementary groups, leading to confusion, delays and sub-optimal results. Although written from the perspective of clinical and translational science award (CTSA) programs within academic medical centers, the paper addresses issues that extend beyond clinical and translational research. The authors describe the complementary but distinct roles of operational IT, research IT, computer science and biomedical informatics using a clinical data warehouse as a running example. In general, IT professionals focus on technology. The authors distinguish between two types of IT groups within academic medical centers: central or administrative IT (supporting the administrative computing needs of large organizations) and research IT (supporting the computing needs of researchers). Computer scientists focus on general issues of computation such as designing faster computers or more efficient algorithms, rather than specific applications. In contrast, informaticians are concerned with data, information and knowledge. Biomedical informaticians draw on a variety of tools, including but not limited to computers, to solve information problems in health care and biomedicine. The paper concludes with recommendations regarding administrative structures that can help to maximize the benefit of computation to biomedical research within academic health centers. PMID:19550198
USDA-ARS's Scientific Manuscript database
Nonlinear interactions and feedbacks across spatial and temporal scales are common features of biological and physical systems. These emergent behaviors often result in surprises that challenge the ability of scientists to understand and predict system behavior at one scale based on information at f...
Research Projects, Technical Reports and Publications
NASA Technical Reports Server (NTRS)
Oliger, Joseph
1996-01-01
The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities with research programs in the aerospace sciences, under contract with NASA. The primary mission of RIACS is to provide research and expertise in computer science and scientific computing to support the scientific missions of NASA ARC. The research carried out at RIACS must change its emphasis from year to year in response to NASA ARC's changing needs and technological opportunities. A flexible scientific staff is provided through a university faculty visitor program, a post doctoral program, and a student visitor program. Not only does this provide appropriate expertise but it also introduces scientists outside of NASA to NASA problems. A small group of core RIACS staff provides continuity and interacts with an ARC technical monitor and scientific advisory group to determine the RIACS mission. RIACS activities are reviewed and monitored by a USRA advisory council and ARC technical monitor. Research at RIACS is currently being done in the following areas: Advanced Methods for Scientific Computing; High Performance Networks. During this report period Professor Antony Jameson of Princeton University, Professor Wei-Pai Tang of the University of Waterloo, Professor Marsha Berger of New York University, Professor Tony Chan of UCLA, Associate Professor David Zingg of University of Toronto, Canada and Assistant Professor Andrew Sohn of New Jersey Institute of Technology have been visiting RIACS. From January 1, 1996 through September 30, 1996, RIACS had three staff scientists, four visiting scientists, one post-doctoral scientist, three consultants, two research associates and one research assistant. RIACS held a joint workshop with Code 1 29-30 July 1996. 
The workshop was held to discuss needs and opportunities in basic research in computer science in and for NASA applications. There were 14 talks given by NASA, industry and university scientists and three open discussion sessions. There were approximately fifty participants. A proceedings is being prepared. It is planned to have similar workshops on an annual basis. RIACS technical reports are usually preprints of manuscripts that have been submitted to research journals or conference proceedings. A list of these reports for the period January 1, 1996 through September 30, 1996 is in the Reports and Abstracts section of this report.
ERIC Educational Resources Information Center
Kiyici, Mubin
2011-01-01
HCI is a field which has an increasing popularity by virtue of the spread of the computers and internet and gradually contributes to the production of the user-friendlier software and hardware with the contribution of the scientists from different disciplines. Teacher candidates studying at the computer and instructional technologies department…
Making Advanced Scientific Algorithms and Big Scientific Data Management More Accessible
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkatakrishnan, S. V.; Mohan, K. Aditya; Beattie, Keith
2016-02-14
Synchrotrons such as the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory are known as user facilities. They are sources of extremely bright X-ray beams, and scientists come from all over the world to perform experiments that require these beams. As the complexity of experiments has increased, and the size and rates of data sets have exploded, managing, analyzing and presenting the data collected at synchrotrons has been an increasing challenge. The ALS has partnered with high performance computing, fast networking, and applied mathematics groups to create a "super-facility", giving users simultaneous access to the experimental, computational, and algorithmic resources to overcome this challenge. This combination forms an efficient closed loop, where data, despite its high rate and volume, is transferred and processed, in many cases immediately and automatically, on appropriate compute resources, and results are extracted, visualized, and presented to users or to the experimental control system, both to provide immediate insight and to guide decisions about subsequent experiments during beam-time. In this paper, we will present work done on advanced tomographic reconstruction algorithms to support users of the 3D micron-scale imaging instrument (Beamline 8.3.2, hard X-ray micro-tomography).
ERIC Educational Resources Information Center
Childers, Gina; Jones, M. Gail
2015-01-01
Remote access technologies enable students to investigate science by utilizing scientific tools and communicating in real-time with scientists and researchers with only a computer and an Internet connection. Very little is known about student perceptions of how real remote investigations are and how immersed the students are in the experience.…
1989-01-01
The scientists of the U.S. Geological Survey are engaged in a wide range of geologic, geophysical, geochemical, hydrologic, and cartographic programs, including the application of computer science to them. These programs offer exciting possibilities for scientific achievement and professional growth to young scientists through participation as Research Associates.
ERIC Educational Resources Information Center
Abbey, Cherie D., Ed.
This book, a special volume focusing on computer-related scientists and inventors, provides 12 biographical profiles of interest to readers ages 9 and above. The Biography Today series was created to appeal to young readers in a format they can enjoy reading and readily understand. Each entry provides at least one picture of the individual…
Fang, Hua; Zhang, Zhaoyang; Wang, Chanpaul Jin; Daneshmand, Mahmoud; Wang, Chonggang; Wang, Honggang
2015-01-01
Big data create value for business and research, but pose significant challenges in terms of networking, storage, management, analytics and ethics. Multidisciplinary collaborations among engineers, computer scientists, statisticians and social scientists are needed to tackle, discover and understand big data. This survey presents an overview of big data initiatives, technologies and research in industry and academia, and discusses challenges and potential solutions. PMID:26504265
Andrew J. Dennhardt; Adam E. Duerr; David Brandes; Todd E. Katzner
2015-01-01
Estimating population size is fundamental to conservation and management. Population size is typically estimated using survey data, computer models, or both. Some of the most extensive and often least expensive survey data are those collected by citizen-scientists. A challenge to citizen-scientists is that the vagility of many organisms can complicate data collection....
NASA Astrophysics Data System (ADS)
Yarker, M. B.; Stanier, C. O.; Forbes, C.; Park, S.
2011-12-01
As atmospheric scientists, we depend on Numerical Weather Prediction (NWP) models. We use them to predict weather patterns, to understand external forcing on the atmosphere, and as evidence to make claims about atmospheric phenomena. Therefore, it is important that we adequately prepare atmospheric science students to use computer models. However, the public should also be aware of what models are in order to understand scientific claims about atmospheric issues, such as climate change. Although familiar with weather forecasts on television and the Internet, the general public does not understand the process of using computer models to generate weather and climate forecasts. As a result, the public often misunderstands claims scientists make about their daily weather as well as the state of climate change. Since computer models are the best method we have to forecast the future of our climate, scientific models and modeling should be a topic covered in K-12 classrooms as part of a comprehensive science curriculum. According to the National Science Education Standards, teachers are encouraged to incorporate science models into the classroom as a way to aid in the understanding of the nature of science. However, there is very little description of what constitutes a science model, so the term is often associated with scale models. Therefore, teachers often use drawings or scale representations of physical entities, such as DNA, the solar system, or bacteria. In other words, models used in classrooms are often used as visual representations, but the purpose of science models is often overlooked. The implementation of a model-based curriculum in the science classroom can be an effective way to prepare students to think critically, problem solve, and make informed decisions as a contributing member of society. However, there are few resources available to help teachers implement science models into the science curriculum effectively. 
Therefore, this research project looks at strategies middle school science teachers use to implement science models into their classrooms. The teachers in this study took part in a week-long professional development program designed to orient them towards appropriate use of science models for a unit on weather, climate, and energy concepts. The goal of this project is to describe the professional development and how teachers intend to incorporate science models into each of their individual classrooms.
NASA Technical Reports Server (NTRS)
Schulbach, Catherine H. (Editor)
2000-01-01
The purpose of the CAS workshop is to bring together NASA's scientists and engineers and their counterparts in industry, other government agencies, and academia working in the Computational Aerosciences and related fields. This workshop is part of the technology transfer plan of the NASA High Performance Computing and Communications (HPCC) Program. Specific objectives of the CAS workshop are to: (1) communicate the goals and objectives of HPCC and CAS, (2) promote and disseminate CAS technology within the appropriate technical communities, including NASA, industry, academia, and other government labs, (3) help promote synergy among CAS and other HPCC scientists, and (4) permit feedback from peer researchers on issues facing High Performance Computing in general and the CAS project in particular. This year we had a number of exciting presentations in the traditional aeronautics, aerospace sciences, and high-end computing areas and in the less familiar (to many of us affiliated with CAS) earth science, space science, and revolutionary computing areas. Presentations of more than 40 high quality papers were organized into ten sessions and presented over the three-day workshop. The proceedings are organized here for easy access: by author, title and topic.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geveci, Berk
The purpose of the SDAV institute is to provide tools and expertise in scientific data management, analysis, and visualization to DOE’s application scientists. Our goal is to actively work with application teams to assist them in achieving breakthrough science, and to provide technical solutions in the data management, analysis, and visualization regimes that are broadly used by the computational science community. Over the last 5 years members of our institute worked directly with application scientists and DOE leadership-class facilities to assist them by applying the best tools and technologies at our disposal. We also enhanced our tools based on input from scientists on their needs. Many of the applications we have been working with are based on connections with scientists established in previous years. However, we contacted additional scientists through our outreach activities, as well as engaging application teams running on leading DOE computing systems. Our approach is to employ an evolutionary development and deployment process: first considering the application of existing tools, followed by the customization necessary for each particular application, and then the deployment in real frameworks and infrastructures. The institute is organized into three areas, each with area leaders, who keep track of progress, engagement of application scientists, and results. The areas are: (1) Data Management, (2) Data Analysis, and (3) Visualization. Kitware has been involved in the Visualization area. This report covers Kitware’s contributions over the last 5 years (February 2012 – February 2017). For details on the work performed by the SDAV institute as a whole, please see the SDAV final report.
Moon Search Algorithms for NASA's Dawn Mission to Asteroid Vesta
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mcfadden, Lucy A.; Skillman, David R.; McLean, Brian; Mutchler, Max; Carsenty, Uri; Palmer, Eric E.
2012-01-01
A moon or natural satellite is a celestial body that orbits a planetary body such as a planet, dwarf planet, or an asteroid. Scientists seek to understand the origin and evolution of our solar system by studying moons of these bodies. Additionally, searches for satellites of planetary bodies can be important to protect the safety of a spacecraft as it approaches or orbits a planetary body. If a satellite of a celestial body is found, the mass of that body can also be calculated once its orbit is determined. Ensuring the Dawn spacecraft's safety on its mission to the asteroid Vesta primarily motivated the work of Dawn's Satellite Working Group (SWG) in summer of 2011. Dawn mission scientists and engineers utilized various computational tools and techniques for Vesta's satellite search. The objectives of this paper are to 1) introduce the natural satellite search problem, 2) present the computational challenges, approaches, and tools used when addressing this problem, and 3) describe applications of various image processing and computational algorithms for performing satellite searches to the electronic imaging and computer science community. Furthermore, we hope that this communication will enable Dawn mission scientists to improve their satellite search algorithms and tools and be better prepared for performing the same investigation in 2015, when the spacecraft is scheduled to approach and orbit the dwarf planet Ceres.
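The abstract above notes that a body's mass can be calculated once a satellite's orbit is determined. One standard way to do this, given here only as an illustrative sketch and not as the method used by the Dawn team, is Kepler's third law. The symbols are assumptions introduced for this example: $a$ is the semi-major axis of the satellite's orbit, $T$ its orbital period, $G$ the gravitational constant, $M$ the mass of the central body, and $m$ the mass of the satellite.

```latex
% Kepler's third law for a two-body system
T^{2} = \frac{4\pi^{2} a^{3}}{G\,(M + m)}
% If the satellite is much less massive than the central body (m \ll M):
\quad\Longrightarrow\quad
M \approx \frac{4\pi^{2} a^{3}}{G\,T^{2}}
```

Under the assumption $m \ll M$, measuring $a$ and $T$ from the satellite's observed orbit is therefore enough to estimate $M$.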
A short course on measure and probability theories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre
2004-02-01
This brief Introduction to Measure Theory, and its applications to Probabilities, corresponds to the lecture notes of a seminar series given at Sandia National Laboratories in Livermore, during the spring of 2003. The goal of these seminars was to provide a minimal background to Computational Combustion scientists interested in using more advanced stochastic concepts and methods, e.g., in the context of uncertainty quantification. Indeed, most mechanical engineering curricula do not provide students with formal training in the field of probability, and even less in measure theory. However, stochastic methods have been used more and more extensively in the past decade, and have provided more successful computational tools. Scientists at the Combustion Research Facility of Sandia National Laboratories have been using computational stochastic methods for years. Addressing more and more complex applications, and facing difficult problems that arose in applications, showed the need for a better understanding of theoretical foundations. This is why the seminar series was launched, and these notes summarize most of the concepts which have been discussed. The goal of the seminars was to bring a group of mechanical engineers and computational combustion scientists to a full understanding of N. WIENER'S polynomial chaos theory. Therefore, these lecture notes are built along those lines, and are not intended to be exhaustive. In particular, the author welcomes any comments or criticisms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dasgupta, Aritra; Poco, Jorge; Bertini, Enrico
2016-01-01
The gap between the rate of large-scale data production and the rate at which data-driven scientific insights are generated has led to an analytical bottleneck in scientific domains like climate, biology, etc. This is primarily due to the lack of innovative analytical tools that can help scientists efficiently analyze and explore alternative hypotheses about the data, and communicate their findings effectively to a broad audience. In this paper, by reflecting on a set of successful collaborative research efforts between a group of climate scientists and visualization researchers, we examine how interactive visualization can help reduce the analytical bottleneck for domain scientists.
Experiences with Efficient Methodologies for Teaching Computer Programming to Geoscientists
ERIC Educational Resources Information Center
Jacobs, Christian T.; Gorman, Gerard J.; Rees, Huw E.; Craig, Lorraine E.
2016-01-01
Computer programming was once thought of as a skill required only by professional software developers. But today, given the ubiquitous nature of computation and data science, it is quickly becoming necessary for all scientists and engineers to have at least a basic knowledge of how to program. Teaching how to program, particularly to those students…
ERIC Educational Resources Information Center
Charleston, LaVar J.; Gilbert, Juan E.; Escobar, Barbara; Jackson, Jerlando F. L.
2014-01-01
African Americans represent 1.3% of all computing sciences faculty in PhD-granting departments, underscoring the severe underrepresentation of Black/African American tenure-track faculty in computing (CRA, 2012). The Future Faculty/Research Scientist Mentoring (FFRM) program, funded by the National Science Foundation, was found to be an effective…
NASA Technical Reports Server (NTRS)
1994-01-01
This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in the areas of (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving Langley facilities and scientists; and (4) computer science.
Is there a glass ceiling for highly cited scientists at the top of research universities?
Ioannidis, John P A
2010-12-01
University leaders aim to protect, shape, and promote the missions of their institutions. I evaluated whether top highly cited scientists are likely to occupy these positions. Of the current leaders of 96 U.S. high research activity universities, only 6 presidents or chancellors were found among the 4009 U.S. scientists listed in the ISIHighlyCited.com database. Of the current leaders of 77 UK universities, only 2 vice-chancellors were found among the 483 UK scientists listed in the same database. In a sample of 100 top-cited clinical medicine scientists and 100 top-cited biology and biochemistry scientists, only 1 and 1, respectively, had served at any time as president of a university. Among the leaders of 25 U.S. universities with the highest citation volumes, only 12 had doctoral degrees in life, natural, physical or computer sciences, and 5 of these 12 had a Hirsch citation index m < 1.0. The participation of highly cited scientists in the top leadership of universities is limited. This could have consequences for the research and overall mission of universities.
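The abstract above refers to a "Hirsch citation index m" without defining it. As a minimal sketch, assuming the usual definitions (the h-index is the largest h such that h papers have at least h citations each, and m is h divided by the number of years since the author's first publication), the two quantities can be computed as follows. The function names and the citation counts are hypothetical and are not taken from the study.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def m_quotient(citations, years_since_first_paper):
    """Hirsch's m: h-index divided by years since the first publication."""
    if years_since_first_paper <= 0:
        raise ValueError("years_since_first_paper must be positive")
    return h_index(citations) / years_since_first_paper


# Hypothetical citation counts for six papers over a 20-year career.
example_citations = [25, 8, 5, 3, 3, 1]
h = h_index(example_citations)
m = m_quotient(example_citations, 20)
print(h, m)
```

With these made-up numbers the h-index is 3 and m is 0.15, which would fall under the m < 1.0 threshold mentioned in the abstract.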
High-throughput landslide modelling using computational grids
NASA Astrophysics Data System (ADS)
Wallace, M.; Metson, S.; Holcombe, L.; Anderson, M.; Newbold, D.; Brook, N.
2012-04-01
Landslides are an increasing problem in developing countries. Multiple landslides can be triggered by heavy rainfall resulting in loss of life, homes and critical infrastructure. Through computer simulation of individual slopes it is possible to predict the causes, timing and magnitude of landslides and estimate the potential physical impact. Geographical scientists at the University of Bristol have developed software that integrates a physically-based slope hydrology and stability model (CHASM) with an econometric model (QUESTA) in order to predict landslide risk over time. These models allow multiple scenarios to be evaluated for each slope, accounting for data uncertainties, different engineering interventions, risk management approaches and rainfall patterns. Individual scenarios can be computationally intensive; however, each scenario is independent and so multiple scenarios can be executed in parallel. As more simulations are carried out the overhead involved in managing input and output data becomes significant. This is a greater problem if multiple slopes are considered concurrently, as is required both for landslide research and for effective disaster planning at national levels. There are two critical factors in this context: generated data volumes can be in the order of tens of terabytes, and greater numbers of simulations result in long total runtimes. Users of such models, in both the research community and in developing countries, need to develop a means for handling the generation and submission of landslide modelling experiments, and the storage and analysis of the resulting datasets. Additionally, governments in developing countries typically lack the necessary computing resources and infrastructure. Consequently, knowledge that could be gained by aggregating simulation results from many different scenarios across many different slopes remains hidden within the data. 
To address these data and workload management issues, University of Bristol particle physicists and geographical scientists are collaborating to develop methods for providing simple and effective access to landslide models and associated simulation data. Particle physicists have valuable experience in dealing with data complexity and management due to the scale of data generated by particle accelerators such as the Large Hadron Collider (LHC). The LHC generates tens of petabytes of data every year which is stored and analysed using the Worldwide LHC Computing Grid (WLCG). Tools and concepts from the WLCG are being used to drive the development of a Software-as-a-Service (SaaS) platform to provide access to hosted landslide simulation software and data. It contains advanced data management features and allows landslide simulations to be run on the WLCG, dramatically reducing simulation runtimes by parallel execution. The simulations are accessed using a web page through which users can enter and browse input data, submit jobs and visualise results. Replication of the data ensures a local copy can be accessed should a connection to the platform be unavailable. The platform does not know the details of the simulation software it runs, so it is therefore possible to use it to run alternative models at similar scales. This creates the opportunity for activities such as model sensitivity analysis and performance comparison at scales that are impractical using standalone software.
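The landslide abstract above observes that each scenario is independent, so many scenarios can run in parallel and their results can then be aggregated across slopes. The following is a minimal sketch of that pattern only. `run_scenario` is a hypothetical toy stand-in, not CHASM or QUESTA, and a local thread pool stands in for grid submission; real CPU-bound simulations would more likely use a process pool or a batch system.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_scenario(slope_id, rainfall_mm):
    """Hypothetical stand-in for one slope simulation (not CHASM)."""
    # Toy rule: the factor of safety falls linearly with rainfall.
    factor_of_safety = 1.5 - 0.002 * rainfall_mm
    return {
        "slope": slope_id,
        "rainfall_mm": rainfall_mm,
        "factor_of_safety": round(factor_of_safety, 3),
    }


def run_all(slopes, rainfalls, max_workers=4):
    """Run every (slope, rainfall) combination independently in a pool."""
    scenarios = list(product(slopes, rainfalls))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the scenarios.
        return list(pool.map(lambda s: run_scenario(*s), scenarios))


results = run_all(["slope-A", "slope-B"], [100, 300])

# Aggregate across slopes: which scenarios fall below a factor of safety of 1?
unstable = [r for r in results if r["factor_of_safety"] < 1.0]
for r in unstable:
    print(r["slope"], r["rainfall_mm"], r["factor_of_safety"])
```

Because no scenario reads another scenario's output, the pool size can be changed freely without affecting the results, which is the property that makes this workload a natural fit for grid execution.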
New computer system simplifies programming of mathematical equations
NASA Technical Reports Server (NTRS)
Reinfelds, J.; Seitz, R. N.; Wood, L. H.
1966-01-01
Automatic Mathematical Translator /AMSTRAN/ permits scientists or engineers to enter mathematical equations in their natural mathematical format and to obtain an immediate graphical display of the solution. This automatic-programming, on-line, multiterminal computer system allows experienced programmers to solve nonroutine problems.
What if we took a global look?
NASA Astrophysics Data System (ADS)
Ouellet Dallaire, C.; Lehner, B.
2014-12-01
Freshwater resources are facing unprecedented pressures. To cope with this, Environmental Hydrology, Freshwater Biology, and Fluvial Geomorphology have defined conceptual approaches such as "environmental flow requirements", "instream flow requirements" or "normative flow regime" to define the appropriate flow regime to maintain a given ecological status. These advances in the fields of freshwater resources management are asking scientists to create bridges across disciplines. Holistic and multi-scale approaches are becoming more and more common in water sciences research. The intrinsic nature of river systems demands these approaches to account for the upstream-downstream link of watersheds. Before recent technological developments, large scale analyses were cumbersome and, often, the necessary data was unavailable. However, new technologies, both for information collection and computing capacity, enable a high resolution look at the global scale. For rivers around the world, this new outlook is facilitated by the hydrologically relevant geo-spatial database HydroSHEDS. This database now offers more than 24 million kilometers of rivers, some never mapped before, at the click of a button. Large-scale and even global-scale assessments can now be used to compare rivers around the world. A river classification framework was developed using HydroSHEDS called GloRiC (Global River Classification). This framework advocates a holistic approach to river systems by using sub-classifications drawn from six disciplines related to river sciences: Hydrology, Physiography and climate, Geomorphology, Chemistry, Biology and Human impact. Each of these disciplines brings complementary information on the rivers that is relevant at different scales. A first version of a global river reach classification was produced at 500 m resolution. Variables used in the classification have influence on processes involved at different scales (e.g., topography index vs. pH). 
However, all variables are computed at the same high spatial resolution. This way, we can have a global look at local phenomena.
NASA Technical Reports Server (NTRS)
Barclay, Rebecca O.; Pinelli, Thomas E.; Elazar, David; Kennedy, John M.
1991-01-01
As part of Phase 4 of the NASA/DoD Aerospace Knowledge Diffusion Research Project, two pilot studies were conducted that investigated the technical communications practices of Israeli and U.S. aerospace engineers and scientists. Both studies had the same five objectives: first, to solicit the opinions of aerospace engineers and scientists regarding the importance of technical communications to their profession; second, to determine the use and production of technical communications by aerospace engineers and scientists; third, to seek their view about the appropriate content of an undergraduate course in technical communications; fourth, to determine aerospace engineers' and scientists' use of libraries, technical information centers, and on-line databases; and fifth, to determine the use and importance of computer and information technology to them. A self-administered questionnaire was mailed to randomly selected U.S. aerospace engineers and scientists who are working in cryogenics, adaptive walls, and magnetic suspension. A slightly modified version was sent to Israeli aerospace engineers and scientists working at Israel Aircraft Industries, LTD. Responses of the Israeli and U.S. aerospace engineers and scientists to selected questions are presented in this paper.
NASA Astrophysics Data System (ADS)
Zimmerman Brachman, R.; Piazza, E.
2010-12-01
The Cassini Outreach Group for the Cassini mission to Saturn at NASA’s Jet Propulsion Laboratory runs an international essay contest called “Cassini Scientist for a Day.” Students write essays about Saturn and its rings and moons. The program has been run nine times, increasing in scope with each contest. Students in grades 5-12 gain skills in critical thinking, decision-making, researching, asking good questions, and communicating their ideas to scientists. Winners and their classes participate in teleconferencing question and answer sessions with Cassini scientists so students can ask questions to professional scientists. Videos of young Cassini scientists are included in the contest reference materials to provide role models for the students. Thousands of students in 27 countries on 6 continents have participated in the essay contest. Volunteers run the international contests outside of the United States, with their own rules, languages, and prizes.
NASA Astrophysics Data System (ADS)
Zimmerman Brachman, R.; Wessen, A.; Piazza, E.
2011-10-01
The outreach team for the Cassini mission to Saturn at NASA's Jet Propulsion Laboratory (JPL) runs an international essay contest called "Cassini Scientist for a Day." Students write essays about Saturn and its rings and moons. The program has been run nine times, increasing in scope with each contest. Students in grades 5 to 12 (ages 10 to 18) gain skills in critical thinking, decision-making, researching, asking good questions, and communicating their ideas to scientists. Winners and their classes participate in teleconferencing question-and-answer sessions with Cassini scientists so students can ask questions to professional scientists. Videos of young Cassini scientists are included in the contest reference materials to provide role models for the students. Thousands of students in 50 countries on 6 continents have participated in the essay contest. Volunteers run the international contests outside of the United States, with their own rules, languages, and prizes.
Science & Technology Review June 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
McMahon, D
This month's issue has the following articles: (1) Livermore's Three-Pronged Strategy for High-Performance Computing, Commentary by Dona Crawford; (2) Riding the Waves of Supercomputing Technology--Livermore's Computation Directorate is exploiting multiple technologies to ensure high-performance, cost-effective computing; (3) Chromosome 19 and Lawrence Livermore Form a Long-Lasting Bond--Lawrence Livermore biomedical scientists have played an important role in the Human Genome Project through their long-term research on chromosome 19; (4) A New Way to Measure the Mass of Stars--For the first time, scientists have determined the mass of a star in isolation from other celestial bodies; and (5) Flexibly Fueled Storage Tank Brings Hydrogen-Powered Cars Closer to Reality--Livermore's cryogenic hydrogen fuel storage tank for passenger cars of the future can accommodate three forms of hydrogen fuel separately or in combination.
Eight Issues for Learning Scientists about Education and the Economy
ERIC Educational Resources Information Center
Roschelle, Jeremy; Bakia, Marianne; Toyama, Yukie; Patton, Charles
2011-01-01
Linking research to a compelling societal interest can build financial commitments to research, bring increased attention to findings, and grow support for scaling up impacts. Among many compelling societal interests that learning scientists can cite--such as increasing the quality of life, preparing citizens to make decisions in a complex world,…
NASA Astrophysics Data System (ADS)
Osborne-Gowey, J.; Strittholt, J.; Bergquist, J.; Ward, B. C.; Sheehan, T.; Comendant, T.; Bachelet, D. M.
2009-12-01
The world’s aquatic resources are experiencing anthropogenic pressures on an unprecedented scale and aquatic organisms are experiencing widespread population changes and ecosystem-scale habitat alterations. Climate change is likely to exacerbate these threats, in some cases reducing the range of native North American fishes by 20-100% (depending on the location of the population and the model assumptions). Scientists around the globe are generating large volumes of data that vary in quality, format, supporting documentation, and accessibility. Moreover, diverse models are being run at various temporal and spatial scales as scientists attempt to understand previous (and project future) human impacts to aquatic species and their habitats. Conservation scientists often struggle to synthesize this wealth of information for developing practical on-the-ground management strategies. As a result, the best available science is often not utilized in the decision-making and adaptive management processes. As aquatic conservation problems around the globe become more serious and the demand to solve them grows more urgent, scientists and land-use managers need a new way to bring strategic, science-based, and action-oriented approaches to aquatic conservation. The Conservation Biology Institute (CBI), with partners such as ESRI, is developing an Aquatic Center as part of a dynamic, web-based resource (Data Basin; http://databasin.org) that centralizes usable aquatic datasets and provides analytical tools to visualize, analyze, and communicate findings for practical applications. To illustrate its utility, we present example datasets of varying spatial scales and synthesize multiple studies to arrive at novel solutions to aquatic threats.
Machine Learning to Discover and Optimize Materials
NASA Astrophysics Data System (ADS)
Rosenbrock, Conrad Waldhar
For centuries, scientists have dreamed of creating materials by design. Rather than discovery by accident, bespoke materials could be tailored to fulfill specific technological needs. Quantum theory and computational methods are essentially equal to the task, and computational power is the new bottleneck. Machine learning has the potential to solve that problem by approximating material behavior at multiple length scales. A full end-to-end solution must allow us to approximate the quantum mechanics, microstructure and engineering tasks well enough to be predictive in the real world. In this dissertation, I present algorithms and methodology to address some of these problems at various length scales. In the realm of enumeration, systems with many degrees of freedom such as high-entropy alloys may contain prohibitively many unique possibilities so that enumerating all of them would exhaust available compute memory. One possible way to address this problem is to know in advance how many possibilities there are so that the user can reduce their search space by restricting the occupation of certain lattice sites. Although tools to calculate this number were available, none performed well for very large systems and none could easily be integrated into low-level languages for use in existing scientific codes. I present an algorithm to solve these problems. Testing the robustness of machine-learned models is an essential component in any materials discovery or optimization application. While it is customary to perform a small number of system-specific tests to validate an approach, this may be insufficient in many cases. In particular, for Cluster Expansion models, the expansion may not converge quickly enough to be useful and reliable. Although the method has been used for decades, a rigorous investigation across many systems to determine when CE "breaks" was still lacking. 
This dissertation includes this investigation along with heuristics that use only a small training database to predict whether a model is worth pursuing in detail. To be useful, computational materials discovery must lead to experimental validation. However, experiments are difficult due to sample purity, environmental effects and a host of other considerations. In many cases, it is difficult to connect theory to experiment because computation is deterministic. By combining advanced group theory with machine learning, we created a new tool that bridges the gap between experiment and theory so that experimental and computed phase diagrams can be harmonized. Grain boundaries in real materials control many important material properties such as corrosion, thermal conductivity, and creep. Because of their high dimensionality, learning the underlying physics to optimize grain boundaries is extremely complex. By leveraging a mathematically rigorous representation for local atomic environments, machine learning becomes a powerful tool to approximate properties for grain boundaries. But it also goes beyond predicting properties by highlighting those atomic environments that are most important for influencing the boundary properties. This provides an immense dimensionality reduction that empowers grain boundary scientists to know where to look for deeper physical insights.
How scientists view the public, the media and the political process.
Besley, John C; Nisbet, Matthew
2013-08-01
We review past studies on how scientists view the public, the goals of communication, the performance and impacts of the media, and the role of the public in policy decision-making. We add to these past findings by analyzing two recent large-scale surveys of scientists in the UK and US. These analyses show that scientists believe the public is uninformed about science and therefore prone to errors in judgment and policy preferences. Scientists are critical of media coverage generally, yet they also tend to rate favorably their own experience dealing with journalists, believing that such interactions are important both for promoting science literacy and for career advancement. Scientists believe strongly that they should have a role in public debates and view policy-makers as the most important group with which to engage. Few scientists view their role as an enabler of direct public participation in decision-making through formats such as deliberative meetings, and do not believe there are personal benefits for investing in these activities. Implications for future research are discussed, in particular the need to examine how ideology and selective information sources shape scientists' views.
Amplify scientific discovery with artificial intelligence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gil, Yolanda; Greaves, Mark T.; Hendler, James
Computing innovations have fundamentally changed many aspects of scientific inquiry. For example, advances in robotics, high-end computing, networking, and databases now underlie much of what we do in science such as gene sequencing, general number crunching, sharing information between scientists, and analyzing large amounts of data. As computing has evolved at a rapid pace, so too has its impact in science, with the most recent computing innovations repeatedly being brought to bear to facilitate new forms of inquiry. Recently, advances in Artificial Intelligence (AI) have deeply penetrated many consumer sectors, including for example Apple’s Siri™ speech recognition system, real-time automated language translation services, and a new generation of self-driving cars and self-navigating drones. However, AI has yet to achieve comparable levels of penetration in scientific inquiry, despite its tremendous potential in aiding computers to help scientists tackle tasks that require scientific reasoning. We contend that advances in AI will transform the practice of science as we are increasingly able to effectively and jointly harness human and machine intelligence in the pursuit of major scientific challenges.
Challenges for data storage in medical imaging research.
Langer, Steve G
2011-04-01
Researchers in medical imaging have multiple challenges for storing, indexing, maintaining viability, and sharing their data. Addressing all these concerns requires a constellation of tools, but not all of them need to be local to the site. In particular, the data storage challenges faced by researchers can begin to require professional information technology skills. With limited human resources and funds, the medical imaging researcher may be better served with an outsourcing strategy for some management aspects. This paper outlines an approach to manage the main objectives faced by medical imaging scientists whose work includes processing and data mining on non-standard file formats, and relating those files to their DICOM standard descendants. The capacity of the approach scales as the researcher's need grows by leveraging the on-demand provisioning ability of cloud computing.
A review of causal inference for biomedical informatics
Kleinberg, Samantha; Hripcsak, George
2011-01-01
Causality is an important concept throughout the health sciences and is particularly vital for informatics work such as finding adverse drug events or risk factors for disease using electronic health records. While philosophers and scientists working for centuries on formalizing what makes something a cause have not reached a consensus, new methods for inference show that we can make progress in this area in many practical cases. This article reviews core concepts in understanding and identifying causality and then reviews current computational methods for inference and explanation, focusing on inference from large-scale observational data. While the problem is not fully solved, we show that graphical models and Granger causality provide useful frameworks for inference and that a more recent approach based on temporal logic addresses some of the limitations of these methods. PMID:21782035
The Computer Simulation of Liquids by Molecular Dynamics.
ERIC Educational Resources Information Center
Smith, W.
1987-01-01
Proposes a mathematical computer model for the behavior of liquids using the classical dynamic principles of Sir Isaac Newton and the molecular dynamics method invented by other scientists. Concludes that other applications will be successful using supercomputers to go beyond simple Newtonian physics. (CW)
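As a hedged illustration of what a Newtonian molecular dynamics step involves, the sketch below integrates the separation of two equal particles interacting through a Lennard-Jones potential using the velocity Verlet scheme. The potential, the integrator, the reduced units, and all function names are assumptions chosen for illustration; the article's own model and parameters are not given in this abstract.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force along the separation r (positive means repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(r, v, mass=1.0):
    """Kinetic energy of the relative motion plus the pair potential."""
    mu = mass / 2.0  # reduced mass of two equal particles
    return 0.5 * mu * v * v + lj_potential(r)


def simulate_pair(r0, v0, mass=1.0, dt=0.001, steps=5000):
    """Advance separation r and relative velocity v with velocity Verlet."""
    mu = mass / 2.0
    r, v = r0, v0
    a = lj_force(r) / mu  # Newton's second law: a = F / m
    for _ in range(steps):
        r += v * dt + 0.5 * a * dt * dt   # position update
        a_new = lj_force(r) / mu          # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return r, v


# Start at rest, slightly outside the potential minimum (reduced units).
e_start = total_energy(1.5, 0.0)
r_end, v_end = simulate_pair(1.5, 0.0)
e_end = total_energy(r_end, v_end)
```

Comparing `e_start` with `e_end` is a common sanity check: with a small enough time step, the total energy of an isolated pair should stay nearly constant.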
Carbon Smackdown: Visualizing Clean Energy (LBNL Summer Lecture Series)
Meza, Juan [LBNL Computational Research Division
2017-12-09
The final Carbon Smackdown match took place Aug. 9, 2010. Juan Meza of the Computational Research Division revealed how scientists use computer visualizations to accelerate climate research and discussed the development of next-generation clean energy technologies such as wind turbines and solar cells.
Interfacing the Experimenter to the Computer: Languages for Psychologists
ERIC Educational Resources Information Center
Wood, Ronald W.; And Others
1975-01-01
An examination and comparison of the computer languages which behavioral scientists are most likely to use: SCAT, INTERACT, SKED, OS/8 Fortran IV, RT11/Fortran, RSX-11M, Data General's Real-Time Disk Operating System and its Fortran, and interpretative languages. (EH)
Programming Digital Stories and How-to Animations
ERIC Educational Resources Information Center
Hansen, Alexandria Killian; Iveland, Ashley; Harlow, Danielle Boyd; Dwyer, Hilary; Franklin, Diana
2015-01-01
As science teachers continue preparing for implementation of the "Next Generation Science Standards," one recommendation is to use computer programming as a promising context to efficiently integrate science and engineering. In this article, an interdisciplinary team of educational researchers and computer scientists describe how to use…
EASI: An electronic assistant for scientific investigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schur, A.; Feller, D.; DeVaney, M.
1991-09-01
Although many automated tools support the productivity of professionals (engineers, managers, architects, secretaries, etc.), none specifically address the needs of the scientific researcher. The scientist's needs are complex and the primary activities are cognitive rather than physical. The individual scientist collects and manipulates large data sets, integrates, synthesizes, generates, and records information. The means to access and manipulate information are a critical determinant of the performance of the system as a whole. One hindrance in this process is the scientist's computer environment, which has changed little in the last two decades. Extensive time and effort is demanded from the scientist to learn to use the computer system. This paper describes how chemists' activities and interactions with information were abstracted into a common paradigm that meets the critical requirement of facilitating information access and retrieval. This paradigm was embodied in EASI, a working prototype that increased the productivity of the individual scientific researcher. 4 refs., 2 figs., 1 tab.
Educational NASA Computational and Scientific Studies (enCOMPASS)
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess
2013-01-01
Educational NASA Computational and Scientific Studies (enCOMPASS) is an educational project of NASA Goddard Space Flight Center aimed at bridging the gap between computational objectives and needs of NASA's scientific research, missions, and projects, and academia's latest advances in applied mathematics and computer science. enCOMPASS achieves this goal via bidirectional collaboration and communication between NASA and academia. Using developed NASA Computational Case Studies in university computer science/engineering and applied mathematics classes is a way of addressing NASA's goals of contributing to the Science, Technology, Education, and Math (STEM) National Objective. The enCOMPASS Web site at http://encompass.gsfc.nasa.gov provides additional information. There are currently nine enCOMPASS case studies developed in areas of earth sciences, planetary sciences, and astrophysics. Some of these case studies have been published in AIP and IEEE's Computing in Science and Engineering magazines. A few university professors have used enCOMPASS case studies in their computational classes and contributed their findings to NASA scientists. In these case studies, after introducing the science area, the specific problem, and related NASA missions, students are first asked to solve a known problem using NASA data and past approaches used and often published in a scientific/research paper. Then, after learning about the NASA application and related computational tools and approaches for solving the proposed problem, students are given a harder problem as a challenge for them to research and develop solutions for. This project provides a model for NASA scientists and engineers on one side, and university students, faculty, and researchers in computer science and applied mathematics on the other side, to learn from each other's areas of work, computational needs and solutions, and the latest advances in research and development. 
This innovation takes NASA science and engineering applications to computer science and applied mathematics university classes, and makes NASA objectives part of the university curricula. There is great potential for growth and return on investment of this program to the point where every major university in the U.S. would use at least one of these case studies in one of their computational courses, and where every NASA scientist and engineer facing a computational challenge (without having resources or expertise to solve it) would use enCOMPASS to formulate the problem as a case study, provide it to a university, and get back their solutions and ideas.
Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably, computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world--is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Potosnak, M. J.; Beck-Winchatz, B.; Ritter, P.
2016-12-01
High-altitude balloons (HABs) are an engaging platform for citizen science and formal and informal STEM education. However, the logistics of launching, chasing and recovering a payload on a 1200 g or 1500 g balloon can be daunting for many novice school groups and citizen scientists, and the cost can be prohibitive. In addition, there are many interesting scientific applications that do not require reaching the stratosphere, including measuring atmospheric pollutants in the planetary boundary layer. With a large number of citizen scientist flights, these data can be used to constrain satellite retrieval algorithms. In this poster presentation, we discuss a novel approach based on small (30 g) balloons that are cheap and easy to handle, and low-cost tracking devices (SPOT trackers for hikers) that do not require a radio license. Our scientific goal is to measure air quality in the lower troposphere. For example, particulate matter (PM) is an air pollutant that varies on small spatial scales and has sources in rural areas like biomass burning and farming practices such as tilling. Our HAB platform test flight incorporates an optical PM sensor, an integrated single board computer that records the PM sensor signal in addition to flight parameters (pressure, location and altitude), and a low-cost tracking system. Our goal is for the entire platform to cost less than $500. While the datasets generated by these flights are typically small, integrating a network of flight data from citizen scientists into a form usable for comparison to satellite data will require big data techniques.
Software Carpentry In The Hydrological Sciences
NASA Astrophysics Data System (ADS)
Ahmadia, A. J.; Kees, C. E.
2014-12-01
Scientists are spending an increasing amount of time building and using hydrology software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. As hydrology models increase in capability and enter use by a growing number of scientists and their communities, it is important that the scientific software development practices scale up to meet the challenges posed by increasing software complexity, lengthening software lifecycles, a growing number of stakeholders and contributors, and a broadened developer base that extends from application domains to high performance computing centers. Many of these challenges in complexity, lifecycles, and developer base have been successfully met by the open source community, and there are many lessons to be learned from their experiences and practices. Additionally, there is much wisdom to be found in the results of research studies conducted on software engineering itself. Software Carpentry aims to bridge the gap between the current state of software development and these known best practices for scientific software development, with a focus on hands-on exercises and practical advice. In 2014, Software Carpentry workshops targeting earth/environmental sciences and hydrological modeling have been organized and run at the Massachusetts Institute of Technology, the US Army Corps of Engineers, the Community Surface Dynamics Modeling System Annual Meeting, and the Earth Science Information Partners Summer Meeting. In this presentation, we will share some of the successes in teaching this material, as well as discuss and present instructional material specific to hydrological modeling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None, None
The Second SIAM Conference on Computational Science and Engineering was held in San Diego from February 10-12, 2003. Total conference attendance was 553. This is a 23% increase in attendance over the first conference. The focus of this conference was to draw attention to the tremendous range of major computational efforts on large problems in science and engineering, to promote the interdisciplinary culture required to meet these large-scale challenges, and to encourage the training of the next generation of computational scientists. Computational Science & Engineering (CS&E) is now widely accepted, along with theory and experiment, as a crucial third mode of scientific investigation and engineering design. Aerospace, automotive, biological, chemical, semiconductor, and other industrial sectors now rely on simulation for technical decision support. For federal agencies also, CS&E has become an essential support for decisions on resources, transportation, and defense. CS&E is, by nature, interdisciplinary. It grows out of physical applications and it depends on computer architecture, but at its heart are powerful numerical algorithms and sophisticated computer science techniques. From an applied mathematics perspective, much of CS&E has involved analysis, but the future surely includes optimization and design, especially in the presence of uncertainty. Another mathematical frontier is the assimilation of very large data sets through such techniques as adaptive multi-resolution, automated feature search, and low-dimensional parameterization. 
The themes of the 2003 conference included, but were not limited to: Advanced Discretization Methods; Computational Biology and Bioinformatics; Computational Chemistry and Chemical Engineering; Computational Earth and Atmospheric Sciences; Computational Electromagnetics; Computational Fluid Dynamics; Computational Medicine and Bioengineering; Computational Physics and Astrophysics; Computational Solid Mechanics and Materials; CS&E Education; Meshing and Adaptivity; Multiscale and Multiphysics Problems; Numerical Algorithms for CS&E; Discrete and Combinatorial Algorithms for CS&E; Inverse Problems; Optimal Design, Optimal Control, and Inverse Problems; Parallel and Distributed Computing; Problem-Solving Environments; Software and Middleware Systems; Uncertainty Estimation and Sensitivity Analysis; and Visualization and Computer Graphics.
A History of the Liberal Arts Computer Science Consortium and Its Model Curricula
ERIC Educational Resources Information Center
Bruce, Kim B.; Cupper, Robert D.; Scot Drysdale, Robert L.
2010-01-01
With the support of a grant from the Sloan Foundation, nine computer scientists from liberal arts colleges came together in October, 1984 to form the Liberal Arts Computer Science Consortium (LACS) and to create a model curriculum appropriate for liberal arts colleges. Over the years the membership has grown and changed, but the focus has remained…
ERIC Educational Resources Information Center
Lesgold, Alan M., Ed.; Reif, Frederick, Ed.
The full proceedings are provided here of a conference of 40 teachers, educational researchers, and scientists from both the public and private sectors that centered on the future of computers in education and the research required to realize the computer's educational potential. A summary of the research issues considered and suggested means for…
ERIC Educational Resources Information Center
Carey, Cayelan C.; Gougis, Rebekka Darner
2017-01-01
Ecosystem modeling is a critically important tool for environmental scientists, yet is rarely taught in undergraduate and graduate classrooms. To address this gap, we developed a teaching module that exposes students to a suite of modeling skills and tools (including computer programming, numerical simulation modeling, and distributed computing)…
NASA Astrophysics Data System (ADS)
Lounnas, Valère; Wedler, Henry B.; Newman, Timothy; Schaftenaar, Gijs; Harrison, Jason G.; Nepomuceno, Gabriella; Pemberton, Ryan; Tantillo, Dean J.; Vriend, Gert
2014-11-01
In molecular sciences, articles tend to revolve around 2D representations of 3D molecules, and sighted scientists often resort to 3D virtual reality software to study these molecules in detail. Blind and visually impaired (BVI) molecular scientists have access to a series of audio devices that can help them read the text in articles and work with computers. Reading articles published in this journal, though, is nearly impossible for them because they need to generate mental 3D images of molecules, but the article-reading software cannot do that for them. We have previously designed AsteriX, a web server that fully automatically decomposes articles, detects 2D plots of low molecular weight molecules, removes meta data and annotations from these plots, and converts them into 3D atomic coordinates. AsteriX-BVI goes one step further and converts the 3D representation into a 3D printable, haptic-enhanced format that includes Braille annotations. These Braille-annotated physical 3D models allow BVI scientists to generate a complete mental model of the molecule. AsteriX-BVI uses Molden to convert the meta data of quantum chemistry experiments into BVI friendly formats so that the entire line of scientific information that sighted people take for granted—from published articles, via printed results of computational chemistry experiments, to 3D models—is now available to BVI scientists too. The possibilities offered by AsteriX-BVI are illustrated by a project on the isomerization of a sterol, executed by the blind co-author of this article (HBW).
The Man computer Interactive Data Access System: 25 Years of Interactive Processing.
NASA Astrophysics Data System (ADS)
Lazzara, Matthew A.; Benson, John M.; Fox, Robert J.; Laitsch, Denise J.; Rueden, Joseph P.; Santek, David A.; Wade, Delores M.; Whittaker, Thomas M.; Young, J. T.
1999-02-01
12 October 1998 marked the 25th anniversary of the Man computer Interactive Data Access System (McIDAS). On that date in 1973, McIDAS was first used operationally by scientists as a tool for data analysis. Over the last 25 years, McIDAS has undergone numerous architectural changes in an effort to keep pace with changing technology. In its early years, significant technological breakthroughs were required to achieve the functionality needed by atmospheric scientists. Today McIDAS is challenged by new Internet-based approaches to data access and data display. The history and impact of McIDAS, along with some of the lessons learned, are presented here.
Applications of genetic programming in cancer research.
Worzel, William P; Yu, Jianjun; Almal, Arpit A; Chinnaiyan, Arul M
2009-02-01
The theory of Darwinian evolution is one of the fundamental keystones of modern biology. Late in the last century, computer scientists began adapting its principles, in particular natural selection, to complex computational challenges, leading to the emergence of evolutionary algorithms. The conceptual model of selective pressure and recombination in evolutionary algorithms allows scientists to efficiently search high-dimensional space for solutions to complex problems. In the last decade, genetic programming has been developed and extensively applied for analysis of molecular data to classify cancer subtypes and characterize the mechanisms of cancer pathogenesis and development. This article reviews current successes using genetic programming and discusses its potential impact in cancer research and treatment in the near future.
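The selective pressure and recombination mentioned in this abstract can be illustrated with a minimal sketch. It is a plain genetic algorithm over bit strings, not genetic programming proper (which evolves program trees), and it is not taken from the article. The function name `evolve`, the tournament size of 3, and the toy "count the ones" fitness are all assumptions chosen for illustration.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Toy genetic algorithm: tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Returns (best individual, history)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            # Selective pressure: the fitter of three random individuals wins.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Recombination: one-point crossover of the two parents.
            cut = rng.randrange(1, length)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), history


# Hypothetical fitness: number of ones in the bit string ("OneMax").
best, history = evolve(sum)
print(sum(best), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness can never decrease. Real genetic programming replaces the bit strings with expression trees and the crossover with subtree exchange.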
Crowd-Sourcing with K-12 citizen scientists: The Continuing Evolution of the GLOBE Program
NASA Astrophysics Data System (ADS)
Murphy, T.; Wegner, K.; Andersen, T. J.
2016-12-01
Twenty years ago, the Internet was still in its infancy, citizen science was a relatively unknown term, and the idea of a global citizen science database was unheard of. Then the Global Learning and Observations to Benefit the Environment (GLOBE) Program was proposed and this all changed. GLOBE was one of the first K-12 citizen science programs on a global scale. An initial large scale ramp-up of the program was followed by the establishment of a network of partners in countries and within the U.S. Now in the 21st century, the program has over 50 protocols in atmosphere, biosphere, hydrosphere and pedosphere, almost 140 million measurements in the database, a visualization system, collaborations with NASA satellite mission scientists (GPM, SMAP) and other scientists, as well as research projects by GLOBE students. As technology changed over the past two decades, it was integrated into the program's outreach efforts to existing and new members with the result that the program now has a strong social media presence. In 2016, a new app was launched which opened up GLOBE and data entry to citizen scientists of all ages. The app is aimed at fresh audiences, beyond the traditional GLOBE K-12 community. Groups targeted included scouting organizations, museums, 4H, science learning centers, retirement communities, etc., to broaden participation in the program and increase the amount of data available to students and scientists. Through the 20 years of GLOBE, lessons have been learned about changing the management of this type of large-scale program, the use of technology to enhance and improve the experience for members, and increasing community involvement in the program.
Flexible workflow sharing and execution services for e-scientists
NASA Astrophysics Data System (ADS)
Kacsuk, Péter; Terstyanszky, Gábor; Kiss, Tamas; Sipos, Gergely
2013-04-01
The sequence of computational and data manipulation steps required to perform a specific scientific analysis is called a workflow. Workflows that orchestrate data and/or compute intensive applications on Distributed Computing Infrastructures (DCIs) recently became standard tools in e-science. At the same time the broad and fragmented landscape of workflows and DCIs slows down the uptake of workflow-based work. The development, sharing, integration and execution of workflows is still a challenge for many scientists. The FP7 "Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs" (SHIWA) project significantly improved the situation, with a simulation platform that connects different workflow systems, different workflow languages, different DCIs and workflows into a single, interoperable unit. The SHIWA Simulation Platform is a service package, already used by various scientific communities, and used as a tool by the recently started ER-flow FP7 project to expand the use of workflows among European scientists. The presentation will introduce the SHIWA Simulation Platform and the services that ER-flow provides based on the platform to space and earth science researchers. The SHIWA Simulation Platform includes: 1. SHIWA Repository: A database where workflows and meta-data about workflows can be stored. The database is a central repository to discover and share workflows within and among communities. 2. SHIWA Portal: A web portal that is integrated with the SHIWA Repository and includes a workflow executor engine that can orchestrate various types of workflows on various grid and cloud platforms. 3. SHIWA Desktop: A desktop environment that provides access capabilities similar to those of the SHIWA Portal, but it runs on the users' desktops/laptops instead of a portal server. 4. Workflow engines: the ASKALON, Galaxy, GWES, Kepler, LONI Pipeline, MOTEUR, Pegasus, P-GRADE, ProActive, Triana, Taverna and WS-PGRADE workflow engines are already integrated with the execution engine of the SHIWA Portal. Other engines can be added when required. Through the SHIWA Portal one can define and run simulations on the SHIWA Virtual Organisation, an e-infrastructure that gathers computing and data resources from various DCIs, including the European Grid Infrastructure. The Portal, via third party workflow engines, provides support for the most widely used academic workflow engines and it can be extended with other engines on demand. Such extensions translate between workflow languages and facilitate the nesting of workflows into larger workflows even when those are written in different languages and require different interpreters for execution. Through the workflow repository and the portal, individual scientists and scientific collaborations can share and offer workflows for reuse and execution. Given the integrated nature of the SHIWA Simulation Platform the shared workflows can be executed online, without installing any special client environment or downloading workflows. The FP7 "Building a European Research Community through Interoperable Workflows and Data" (ER-flow) project disseminates the achievements of the SHIWA project and uses these achievements to build workflow user communities across Europe. ER-flow provides application support to research communities within and beyond the project consortium to develop, share and run workflows with the SHIWA Simulation Platform.
NASA Astrophysics Data System (ADS)
Gregory, A. E.; Benedict, K. K.; Zhang, S.; Savickas, J.
2017-12-01
Large scale, high severity wildfires in forests have become increasingly prevalent in the western United States due to fire exclusion. Although past work has focused on the immediate consequences of wildfire (i.e., runoff magnitude and debris flow), little has been done to understand the post wildfire hydrologic consequences of vegetation regrowth. Furthermore, vegetation is often characterized by static parameterizations within hydrological models. In order to understand the temporal relationship between hydrologic processes and revegetation, we modularized and partially automated the hydrologic modeling process to increase connectivity between remotely sensed data, the Virtual Watershed Platform (a data management resource, called the VWP), input meteorological data, and the Precipitation-Runoff Modeling System (PRMS). This process was used to run simulations in the Valles Caldera of NM, an area impacted by the 2011 Las Conchas Fire, in PRMS before and after the Las Conchas Fire to evaluate hydrologic process changes. The modeling environment addressed some of the existing challenges faced by hydrological modelers. At present, modelers are somewhat limited in their ability to push the boundaries of hydrologic understanding. Specific issues faced by modelers include limited computational resources to model processes at large spatial and temporal scales, data storage capacity and accessibility from the modeling platform, computational and time constraints for experimental modeling, and the skills to integrate modeling software in ways that have not been explored. By taking an interdisciplinary approach, we were able to address some of these challenges by leveraging the skills of hydrologic, data, and computer scientists; and the technical capabilities provided by a combination of on-demand/high-performance computing, distributed data, and cloud services.
The hydrologic modeling process was modularized to include options for distributing meteorological data, parameter space experimentation, data format transformation, looping, validation of models and containerization for enabling new analytic scenarios. The user interacts with the modules through Jupyter Notebooks which can be connected to an on-demand computing and HPC environment, and data services built as part of the VWP.
ERIC Educational Resources Information Center
Chen, Alice Y.; McKee, Nancy
1999-01-01
Describes the developmental process used to visualize the calcium ATPase enzyme of the sarcoplasmic reticulum which involves evaluating scientific information, consulting scientists, model making, storyboarding, and creating and editing in a computer medium. (Author/CCM)
ERIC Educational Resources Information Center
Reed, Cameron
2016-01-01
How can old-fashioned tables of logarithms be computed without technology? Today, of course, no practicing mathematician, scientist, or engineer would actually use logarithms to carry out a calculation, let alone worry about deriving them from scratch. But high school students may be curious about the process. This article develops a…
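As a hedged illustration of how a base-10 logarithm can be obtained using only multiplication, division, and comparison, the sketch below uses the classical repeated-squaring method. It is not necessarily the approach the article develops (the abstract above is truncated), and the function name `log10_by_squaring` is an assumption. The idea: squaring x doubles its logarithm, so each time the square reaches 10, the next binary digit of log10(x) is 1.

```python
import math


def log10_by_squaring(x, bits=20):
    """Approximate log10(x) for 1 <= x < 10, one binary digit per squaring.

    Uses only multiplication, division by 10, and comparison, which are
    operations that can in principle be carried out by hand.
    """
    if not 1.0 <= x < 10.0:
        raise ValueError("x must satisfy 1 <= x < 10")
    result = 0.0
    weight = 0.5  # value of the binary digit currently being decided
    for _ in range(bits):
        x = x * x  # log10 doubles, so x is now in [1, 100)
        if x >= 10.0:
            x /= 10.0  # subtract 1 from the doubled logarithm
            result += weight  # this binary digit is 1
        weight /= 2.0
    return result


# Compare against the library value for a few arguments.
for value in (2.0, 3.0, 7.0):
    print(value, log10_by_squaring(value), math.log10(value))
```

With 20 binary digits the result is truncated to within about one millionth of the true value, which is roughly the precision of a six-place table.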
Flexible server-side processing of climate archives
NASA Astrophysics Data System (ADS)
Juckes, Martin; Stephens, Ag; Damasio da Costa, Eduardo
2014-05-01
The flexibility and interoperability of OGC Web Processing Services are combined with an extensive range of data processing operations supported by the Climate Data Operators (CDO) library to facilitate processing of the CMIP5 climate data archive. The challenges posed by this peta-scale archive allow us to test and develop systems which will help us to deal with approaching exa-scale challenges. The CEDA WPS package allows users to manipulate data in the archive and export the results without first downloading the data -- in some cases this can drastically reduce the data volumes which need to be transferred and greatly reduce the time needed for the scientists to get their results. Reductions in data transfer are achieved at the expense of an additional computational load imposed on the archive (or near-archive) infrastructure. This is managed with a load balancing system. Short jobs may be run in near real-time, longer jobs will be queued. When jobs are queued the user is provided with a web dashboard displaying job status. A clean split between the data manipulation software and the request management software is achieved by exploiting the extensive CDO library. This library has a long history of development to support the needs of the climate science community. Use of the library ensures that operations run on data by the system can be reproduced by users using the same operators installed on their own computers. Examples using the system deployed for the CMIP5 archive will be shown and issues which need to be addressed as archive volumes expand into the exa-scale will be discussed.
Flexible server-side processing of climate archives
NASA Astrophysics Data System (ADS)
Juckes, M. N.; Stephens, A.; da Costa, E. D.
2013-12-01
The flexibility and interoperability of OGC Web Processing Services are combined with an extensive range of data processing operations supported by the Climate Data Operators (CDO) library to facilitate processing of the CMIP5 climate data archive. The challenges posed by this peta-scale archive allow us to test and develop systems which will help us to deal with approaching exa-scale challenges. The CEDA WPS package allows users to manipulate data in the archive and export the results without first downloading the data -- in some cases this can drastically reduce the data volumes which need to be transferred and greatly reduce the time needed for the scientists to get their results. Reductions in data transfer are achieved at the expense of an additional computational load imposed on the archive (or near-archive) infrastructure. This is managed with a load balancing system. Short jobs may be run in near real-time, longer jobs will be queued. When jobs are queued the user is provided with a web dashboard displaying job status. A clean split between the data manipulation software and the request management software is achieved by exploiting the extensive CDO library. This library has a long history of development to support the needs of the climate science community. Use of the library ensures that operations run on data by the system can be reproduced by users using the same operators installed on their own computers. Examples using the system deployed for the CMIP5 archive will be shown and issues which need to be addressed as archive volumes expand into the exa-scale will be discussed.
Turkish Adaptation of Questionnaire on Attitudes towards Engineers and Scientists
ERIC Educational Resources Information Center
Ergün, Aysegül; Balçin, Muhammed Dogukan
2017-01-01
The aim of this research was to present the Turkish adaptation of the survey for Middle-School Students' Attitudes toward Engineers and Scientists prepared by Lyons, Fralick and Kearn (2009), which consists of 32 items on a 5-point Likert-type scale. The questionnaire was administered to 707 students receiving education in the fifth, sixth, seventh and eighth grades…
Participatory Design of Human-Centered Cyberinfrastructure (Invited)
NASA Astrophysics Data System (ADS)
Pennington, D. D.; Gates, A. Q.
2010-12-01
Cyberinfrastructure, by definition, is about people sharing resources to achieve outcomes that cannot be reached independently. CI depends not just on creating discoverable resources, or tools that allow those resources to be processed, integrated, and visualized -- but on human activation of flows of information across those resources. CI must be centered on human activities. Yet for those CI projects that are directed towards observational science, there are few models for organizing collaborative research in ways that align individual research interests into a collective vision of CI-enabled science. Given that the emerging technologies are themselves expected to change the way science is conducted, it is not simply a matter of conducting requirements analysis on how scientists currently work, or building consensus among the scientists on what is needed. Developing effective CI depends on generating a new, creative vision of problem solving within a community based on computational concepts that are, in some cases, still very abstract and theoretical. The computer science theory may (or may not) be well formalized, but the potential for impact on any particular domain is typically ill-defined. In this presentation we will describe approaches being developed and tested at the CyberShARE Center of Excellence at University of Texas in El Paso for ill-structured problem solving within cross-disciplinary teams of scientists and computer scientists working on data intensive environmental and geoscience. These approaches deal with the challenges associated with sharing and integrating knowledge across disciplines; the challenges of developing effective teamwork skills in a culture that favors independent effort; and the challenges of evolving shared, focused research goals from ill-structured, vague starting points - all issues that must be confronted by every interdisciplinary CI project. 
We will introduce visual and semantic-based tools that can enable the collaborative research design process and illustrate their application in designing and developing useful end-to-end data solutions for scientists. Lastly, we will outline areas of future investigation within CyberShARE that we believe have the potential for high impact.
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles; ...
2018-05-05
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling, sometimes encouraged by restricted GPU memory, NVLink is less important.
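For readers unfamiliar with the scaling terms used in this abstract, the following minimal sketch states the usual textbook definitions of strong-scaling and weak-scaling efficiency. The function names and the timing values are assumptions invented purely for illustration; they are not measurements from the article.

```python
def strong_scaling_efficiency(t_one, t_n, n):
    """Fixed total problem size: the ideal time on n devices is t_one / n,
    so efficiency is the ideal time divided by the measured time t_n."""
    return t_one / (n * t_n)


def weak_scaling_efficiency(t_one, t_n):
    """Fixed problem size per device: the ideal time on n devices stays
    at t_one, so efficiency is t_one divided by the measured time t_n."""
    return t_one / t_n


# Hypothetical timings in seconds (made up, not from the article).
t_single_device = 100.0
print(strong_scaling_efficiency(t_single_device, t_n=15.0, n=8))
print(weak_scaling_efficiency(t_single_device, t_n=110.0))
```

Under weak scaling each device keeps a full-size local workload, so communication is a smaller share of each step, which is one common reason interconnect bandwidth matters less in that regime.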
Grid-Enabled High Energy Physics Research using a Beowulf Cluster
NASA Astrophysics Data System (ADS)
Mahmood, Akhtar
2005-04-01
At Edinboro University of Pennsylvania, we have built an 8-node, 25 Gflops Beowulf Cluster with 2.5 TB of disk storage space to carry out grid-enabled, data-intensive high energy physics research for the ATLAS experiment via Grid3. We will describe how we built and configured our Cluster, which we have named the Sphinx Beowulf Cluster. We will describe the results of our cluster benchmark studies and the run-time plots of several parallel application codes. Once fully functional, the Cluster will be part of Grid3 [www.ivdgl.org/grid3]. The current ATLAS simulation grid application models the entire physical processes from the proton anti-proton collisions and detector's response to the collision debris through the complete reconstruction of the event from analyses of these responses. The end result is a detailed set of data that simulates the real physical collision event inside a particle detector. Grid is the new IT infrastructure for 21st century science -- a new computing paradigm that is poised to transform the practice of large-scale data-intensive research in science and engineering. The Grid will allow scientists worldwide to view and analyze huge amounts of data flowing from the large-scale experiments in High Energy Physics. The Grid is expected to bring together geographically and organizationally dispersed computational resources, such as CPUs, storage systems, communication systems, and data sources.
The Virtual Watershed Observatory: Cyberinfrastructure for Model-Data Integration and Access
NASA Astrophysics Data System (ADS)
Duffy, C.; Leonard, L. N.; Giles, L.; Bhatt, G.; Yu, X.
2011-12-01
The Virtual Watershed Observatory (VWO) is a concept where scientists, water managers, educators and the general public can create a virtual observatory from integrated hydrologic model results, national databases and historical or real-time observations via web services. In this paper, we propose a prototype for automated and virtualized web services software using national data products for climate reanalysis, soils, geology, terrain and land cover. The VWO has the broad purpose of making accessible water resource simulations, real-time data assimilation, calibration and archival at the scale of HUC 12 watersheds (Hydrologic Unit Code) anywhere in the continental US. Our prototype for model-data integration focuses on creating tools for fast data storage from selected national databases, as well as the computational resources necessary for a dynamic, distributed watershed simulation. The paper will describe cyberinfrastructure tools and a workflow that attempt to resolve the problem of model-data accessibility and scalability such that individuals, research teams, managers and educators can create a VWO in a desired context. Examples are given for the NSF-funded Shale Hills Critical Zone Observatory and the European Critical Zone Observatories within the SoilTrEC project. In the future, implementation of VWO services will benefit from the development of a cloud cyberinfrastructure as the prototype evolves to data and model intensive computation for continental scale water resource predictions.
Open Research Challenges with Big Data - A Data-Scientist's Perspective
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R
In this paper, we discuss data-driven discovery challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are data mining algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security, healthcare and manufacturing to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.
Computer measurement of particle sizes in electron microscope images
NASA Technical Reports Server (NTRS)
Hall, E. L.; Thompson, W. B.; Varsi, G.; Gauldin, R.
1976-01-01
Computer image processing techniques have been applied to particle counting and sizing in electron microscope images. Distributions of particle sizes were computed for several images and compared to manually computed distributions. The results of these experiments indicate that automatic particle counting within a reasonable error and computer processing time is feasible. The significance of the results is that the tedious task of manually counting a large number of particles can be eliminated while still providing the scientist with accurate results.
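The abstract does not say which image processing technique was used, so the following is only a minimal sketch of one common way to count and size particles automatically: threshold the image, then label 4-connected regions with a flood fill. The function name `count_particles`, the threshold value, and the tiny sample image are assumptions for illustration.

```python
from collections import deque


def count_particles(image, threshold=0.5):
    """Return the pixel area of each 4-connected region brighter than
    `threshold`, in row-major order of discovery."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one particle.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


# Hypothetical binary image: 1 marks a particle pixel, 0 marks background.
sample = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
areas = count_particles(sample)
print(len(areas), "particles with areas", areas)
```

A size distribution then follows from a histogram of the returned areas. Touching or overlapping particles would be merged into one region by this approach, which is one source of the counting error the abstract alludes to.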
Smith, Rob; Mathis, Andrew D; Ventura, Dan; Prince, John T
2014-01-01
For decades, mass spectrometry data has been analyzed to investigate a wide array of research interests, including disease diagnostics, biological and chemical theory, genomics, and drug development. Progress towards solving any of these disparate problems depends upon overcoming the common challenge of interpreting the large data sets generated. Despite interim successes, many data interpretation problems in mass spectrometry are still challenging. Further, though these challenges are inherently interdisciplinary in nature, the significant domain-specific knowledge gap between disciplines makes interdisciplinary contributions difficult. This paper provides an introduction to the burgeoning field of computational mass spectrometry. We illustrate key concepts, vocabulary, and open problems in MS-omics, as well as provide invaluable resources such as open data sets and key search terms and references. This paper will facilitate contributions from mathematicians, computer scientists, and statisticians to MS-omics that will fundamentally improve results over existing approaches and inform novel algorithmic solutions to open problems.
Nature apps: Waiting for the revolution.
Jepson, Paul; Ladle, Richard J
2015-12-01
Apps are small task-orientated programs with the potential to integrate the computational and sensing capacities of smartphones with the power of cloud computing, social networking, and crowdsourcing. They have the potential to transform how humans interact with nature, cause a step change in the quantity and resolution of biodiversity data, democratize access to environmental knowledge, and reinvigorate ways of enjoying nature. To assess the extent to which this potential is being exploited in relation to nature, we conducted an automated search of the Google Play Store using 96 nature-related terms. This returned data on ~36,304 apps, of which ~6,301 were nature-themed. We found that few of these exploit the full range of capabilities inherent in the technology and/or have successfully captured the public imagination. Such breakthroughs will only be achieved by increasing the frequency and quality of collaboration between environmental scientists, information engineers, computer scientists, and interested publics.
Developing Higher-Order Materials Knowledge Systems
NASA Astrophysics Data System (ADS)
Fast, Anthony Nathan
2011-12-01
Advances in computational materials science and novel characterization techniques have allowed scientists to probe deeply into a diverse range of materials phenomena. These activities are producing enormous amounts of information regarding the roles of various hierarchical material features in the overall performance characteristics displayed by the material. Connecting the hierarchical information over disparate domains is at the crux of multiscale modeling. The inherent challenge of performing multiscale simulations is developing scale bridging relationships to couple material information between well separated length scales. Much progress has been made in the development of homogenization relationships which replace heterogeneous material features with effective homogeneous descriptions. These relationships facilitate the flow of information from lower length scales to higher length scales. Meanwhile, most localization relationships that link the information from a higher length scale to a lower length scale are plagued by computationally intensive techniques which are not readily integrated into multiscale simulations. The challenge of executing fully coupled multiscale simulations is augmented by the need to incorporate the evolution of the material structure that may occur under conditions such as material processing. To address these challenges with multiscale simulation, a novel framework called the Materials Knowledge System (MKS) has been developed. This methodology efficiently extracts, stores, and recalls microstructure-property-processing localization relationships. This approach is built on the statistical continuum theories developed by Kröner that express the localization of the response field at the microscale using a series of highly complex convolution integrals, which have historically been evaluated analytically.
The MKS approach dramatically improves the accuracy of these expressions by calibrating the convolution kernels in these expressions to results from previously validated physics-based models. These novel tools have been validated for the elastic strain localization in moderate contrast dual-phase composites by direct comparisons with predictions from finite element models. The versatility of the approach is further demonstrated by its successful application to capturing the structure evolution during spinodal decomposition of a binary alloy. Lastly, some key features in the future application of the MKS approach are developed using the Portevin-Le Chatelier effect. It has been shown with these case studies that the MKS approach is capable of accurately reproducing the results from physics-based models with a drastic reduction in computational requirements.
The APECS Virtual Poster Session: a virtual platform for science communication and discussion
NASA Astrophysics Data System (ADS)
Renner, A.; Jochum, K.; Jullion, L.; Pavlov, A.; Liggett, D.; Fugmann, G.; Baeseman, J. L.; Apecs Virtual Poster Session Working Group, T.
2011-12-01
The Virtual Poster Session (VPS) of the Association of Polar Early Career Scientists (APECS) was developed by early career scientists as an online tool for communicating and discussing science and research beyond the four walls of a conference venue. Poster sessions often are the backbone of a conference where especially early career scientists get a chance to communicate their research, discuss ideas, data, and scientific problems with their peers and senior scientists. There, they can hone their 'elevator pitch', discussion skills and presentation skills. APECS has taken the poster session one step further and created the VPS - the same idea but independent from conferences, travel, and location. All that is needed is a computer with internet access. Instead of letting their posters collect dust on the computer's hard drive, scientists can now upload them to the APECS website. There, others have the continuous opportunity to comment, give feedback and discuss the work. Currently, about 200 posters are accessible, contributed by authors and co-authors from 34 countries. Since January 2010, researchers have been able to discuss their posters with a broad international audience including fellow researchers, community members, potential colleagues and collaborators, policy makers and educators during monthly conference calls via an internet platform. Recordings of the calls are available online afterwards. Calls so far have included topical sessions on e.g. marine biology, glaciology, or social sciences, and interdisciplinary calls on Arctic sciences or polar research activities in a specific country, e.g. India or Romania. They attracted audiences of scientists at all career stages and from all continents, with on average about 15 persons participating per call.
Online tools like the VPS open up new ways for creating collaborations and new research ideas and sharing different methodologies for future projects, pushing aside the boundaries of countries and nations, conferences, offices, and disciplines, and provide early career scientists with easily accessible training opportunities for their communication and outreach skills, independent of their location and funding situation.
Achieving Operational Adaptability: Capacity Building Needs to Become a Warfighting Function
2010-04-26
platypus effect as described by David Green in The Serendipity Machine: A Voyage of Discovery Through the Unexpected World of Computers. Early in...the 18th century, the discovery of the platypus challenged the categories of animal life recognized and utilized by scientists in Europe. Scientists...resisted changing their categories for years. At first, they believed the platypus was a fabrication. Later, they resisted change since they were
The dynamics of Brazilian protozoology over the past century.
Elias, M Carolina; Floeter-Winter, Lucile M; Mena-Chalco, Jesus P
2016-01-01
Brazilian scientists have been contributing to the protozoology field for more than 100 years with important discoveries of new species such as Trypanosoma cruzi and Leishmania spp. In this work, we used a Brazilian thesis database (Coordination for the Improvement of Higher Education Personnel) covering the period from 1987-2011 to identify researchers who contributed substantially to protozoology. We selected 248 advisors by filtering to obtain researchers who supervised at least 10 theses. Based on a computational analysis of the thesis databases, we found students who were supervised by these scientists. A computational procedure was developed to determine the advisors' scientific ancestors using the Lattes Platform. These analyses provided a list of 1,997 researchers who were inspected through Lattes CV examination and allowed the identification of the pioneers of Brazilian protozoology. Moreover, we investigated the areas in which researchers who earned PhDs in protozoology are now working. We found that 68.4% of them are still in protozoology, while 16.7% have migrated to other fields. We observed that support for protozoology by national or international agencies is clearly correlated with the increase of scientists in the field. Finally, we described the academic genealogy of Brazilian protozoology by formalising the "forest" of Brazilian scientists involved in the study of protozoa and their vectors over the past century.
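The two computational steps described above, selecting advisors who supervised at least 10 theses and then tracing each advisor's scientific ancestors, can be sketched as a count followed by a graph traversal. This is an illustrative sketch only: the record format (student, advisor pairs), the function names, and the sample names are assumptions, and the abstract does not specify how the thesis database or the Lattes Platform data were actually processed.

```python
from collections import Counter, defaultdict


def prolific_advisors(theses, min_theses=10):
    """Return advisors who supervised at least `min_theses` theses.

    `theses` is an iterable of (student, advisor) pairs (assumed format).
    """
    counts = Counter(advisor for _student, advisor in theses)
    return {advisor for advisor, n in counts.items() if n >= min_theses}


def build_advisor_map(theses):
    """Map each student to the set of people who advised them."""
    advisor_of = defaultdict(set)
    for student, advisor in theses:
        advisor_of[student].add(advisor)
    return advisor_of


def ancestors(person, advisor_of):
    """Return every scientific ancestor reachable through advisor links."""
    found = set()
    stack = list(advisor_of.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in found:  # guards against repeated or cyclic records
            found.add(current)
            stack.extend(advisor_of.get(current, ()))
    return found


# Hypothetical records: Ana supervised ten students; her own lineage is
# Ana <- Bruno <- Clara. Davi supervised only one thesis.
theses = [(f"student{i}", "Ana") for i in range(10)]
theses += [("Ana", "Bruno"), ("Bruno", "Clara"), ("other", "Davi")]

advisor_of = build_advisor_map(theses)
for advisor in sorted(prolific_advisors(theses)):
    print(advisor, "->", sorted(ancestors(advisor, advisor_of)))
```

Collecting the ancestor sets for all selected advisors yields one tree per founding researcher, which is one way to read the "forest" the abstract refers to.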
The dynamics of Brazilian protozoology over the past century
Elias, M Carolina; Floeter-Winter, Lucile M; Mena-Chalco, Jesus P
2016-01-01
Brazilian scientists have been contributing to the protozoology field for more than 100 years with important discoveries of new species such as Trypanosoma cruzi and Leishmania spp. In this work, we used a Brazilian thesis database (Coordination for the Improvement of Higher Education Personnel) covering the period from 1987-2011 to identify researchers who contributed substantially to protozoology. We selected 248 advisors by filtering to obtain researchers who supervised at least 10 theses. Based on a computational analysis of the thesis databases, we found students who were supervised by these scientists. A computational procedure was developed to determine the advisors’ scientific ancestors using the Lattes Platform. These analyses provided a list of 1,997 researchers who were inspected through Lattes CV examination and allowed the identification of the pioneers of Brazilian protozoology. Moreover, we investigated the areas in which researchers who earned PhDs in protozoology are now working. We found that 68.4% of them are still in protozoology, while 16.7% have migrated to other fields. We observed that support for protozoology by national or international agencies is clearly correlated with the increase of scientists in the field. Finally, we described the academic genealogy of Brazilian protozoology by formalising the “forest” of Brazilian scientists involved in the study of protozoa and their vectors over the past century. PMID:26814646
Data Prospecting Framework - a new approach to explore "big data" in Earth Science
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Rushing, J.; Lin, A.; Kuo, K.
2012-12-01
Due to advances in sensors, computation and storage, the cost and effort required to produce large datasets have been significantly reduced. As a result, we are seeing a proliferation of large-scale data sets being assembled in almost every science field, especially in geosciences. Opportunities to exploit the "big data" are enormous as new hypotheses can be generated by combining and analyzing large amounts of data. However, such a data-driven approach to science discovery assumes that scientists can find and isolate relevant subsets from vast amounts of available data. Current Earth Science data systems only provide data discovery through simple metadata and keyword-based searches and are not designed to support data exploration capabilities based on the actual content. Consequently, scientists often find themselves downloading large volumes of data, struggling with large amounts of storage and learning new analysis technologies that will help them separate the wheat from the chaff. New mechanisms of data exploration are needed to help scientists discover the relevant subsets. We present data prospecting, a new content-based data analysis paradigm to support data-intensive science. Data prospecting allows the researchers to explore big data in determining and isolating data subsets for further analysis. This is akin to geo-prospecting in which mineral sites of interest are determined over the landscape through screening methods. The resulting "data prospects" only provide an interaction with and feel for the data through first-look analytics; the researchers would still have to download the relevant datasets and analyze them deeply using their favorite analytical tools to determine if the datasets will yield new hypotheses. Data prospecting combines two traditional categories of data analysis, data exploration and data mining within the discovery step.
Data exploration utilizes manual/interactive methods for data analysis such as standard statistical analysis and visualization, usually on small datasets. On the other hand, data mining utilizes automated algorithms to extract useful information. Humans guide these automated algorithms and specify algorithm parameters (training samples, clustering size, etc.). Data Prospecting combines these two approaches using high performance computing and the new techniques for efficient distributed file access.
Human Exploration Ethnography of the Haughton-Mars Project, 1998-1999
NASA Technical Reports Server (NTRS)
Clancey, William J.; Swanson, Keith (Technical Monitor)
1999-01-01
During the past two field seasons, July 1998 and 1999, we have conducted research about the field practices of scientists and engineers at Haughton Crater on Devon Island in the Canadian Arctic, with the objective of determining how people will live and work on Mars. This broad investigation of field life and work practice, part of the Haughton-Mars Project led by Pascal Lee, spans social and cognitive anthropology, psychology, and computer science. Our approach involves systematic observation and description of activities, places, and concepts, constituting an ethnography of field science at Haughton. Our focus is on human behaviors: what people do, where, when, with whom, and why. By locating behavior in time and place, in contrast with a purely functional or "task oriented" description of work, we find patterns constituting the choreography of interaction between people, their habitat, and their tools. As such, we view the exploration process in terms of a total system comprising a social organization, facilities, terrain/climate, personal identities, artifacts, and computer tools. Because we are computer scientists seeking to develop new kinds of tools for living and working on Mars, we focus on the existing representational tools (such as documents and measuring devices), learning and improvisation (such as use of the internet or informal assistance), and prototype computational systems brought to the field. Our research is based on partnership, by which field scientists and engineers actively contribute to our findings, just as we participate in their work and life.
Dynamic Collaboration Infrastructure for Hydrologic Science
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.
2016-12-01
Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast array of data and computing infrastructure without needing to correspondingly learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure."
In this presentation we discuss the results of this proof-of-concept prototype which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.
NASA Astrophysics Data System (ADS)
Smuga-Otto, M. J.; Garcia, R. K.; Knuteson, R. O.; Martin, G. D.; Flynn, B. M.; Hackel, D.
2006-12-01
The University of Wisconsin-Madison Space Science and Engineering Center (UW-SSEC) is developing tools to help scientists realize the potential of high spectral resolution instruments for atmospheric science. Upcoming satellite spectrometers like the Cross-track Infrared Sounder (CrIS), experimental instruments like the Geosynchronous Imaging Fourier Transform Spectrometer (GIFTS) and proposed instruments like the Hyperspectral Environmental Suite (HES) within the GOES-R project will present a challenge in the form of the overwhelmingly large amounts of continuously generated data. Current and near-future workstations will have neither the storage space nor computational capacity to cope with raw spectral data spanning more than a few minutes of observations from these instruments. Schemes exist for processing raw data from hyperspectral instruments currently in testing, that involve distributed computation across clusters. Data, which for an instrument like GIFTS can amount to over 1.5 Terabytes per day, is carefully managed on Storage Area Networks (SANs), with attention paid to proper maintenance of associated metadata. The UW-SSEC is preparing a demonstration integrating these back-end capabilities as part of a larger visualization framework, to assist scientists in developing new products from high spectral data, sourcing data volumes they could not otherwise manage. This demonstration focuses on managing storage so that only the data specifically needed for the desired product are pulled from the SAN, and on running computationally expensive intermediate processing on a back-end cluster, with the final product being sent to a visualization system on the scientist's workstation. Where possible, existing software and solutions are used to reduce cost of development. 
The heart of the computing component is the GIFTS Information Processing System (GIPS), developed at the UW-SSEC to allow distribution of processing tasks such as conversion of raw GIFTS interferograms into calibrated radiance spectra, and retrieving temperature and water vapor content atmospheric profiles from these spectra. The hope is that by demonstrating the capabilities afforded by a composite system like the one described here, scientists can be convinced to contribute further algorithms in support of this model of computing and visualization.
NASA Astrophysics Data System (ADS)
Vincent, E. M.; Matlock, T.; Westerling, A. L.
2015-12-01
While most scientists recognize climate change as a major societal and environmental issue, social and political will to tackle the problem is still lacking. One of the biggest obstacles is inaccurate reporting or even outright misinformation in climate change coverage that results in confusion among the general public on the issue. In today's era of instant access to information, what we read online usually falls outside our field of expertise and it is a real challenge to evaluate what is credible. The emerging technology of web annotation could be a game changer as it allows knowledgeable individuals to attach notes to any piece of text of a webpage and to share them with readers who will be able to see the annotations in context, like comments on a PDF. Here we present the Climate Feedback initiative that is bringing together a community of climate scientists who collectively evaluate the scientific accuracy of influential climate change media coverage. Scientists annotate articles sentence by sentence and assess whether they are consistent with scientific knowledge, allowing readers to see where and why the coverage is (or is not) based on science. Scientists also summarize the essence of their critical commentary in the form of a simple article-level overall credibility rating that quickly informs readers about the credibility of the entire piece. Web annotation allows readers to 'hear' directly from the experts and to sense the consensus in a personal way as one can literally see how many scientists agree with a given statement.
It also allows a broad population of scientists to interact with the media, notably early career scientists. In this talk, we will present results on the impacts annotations have on readers (regarding their evaluation of the trustworthiness of the information they read) and on journalists (regarding their reception of scientists' comments). Several dozen scientists have contributed to this effort to date and the system offers potential to scale up as it relies on a crowdsourced process where each scientist only makes small contributions that get aggregated together. The project aims to build a network of scientists with varied expertise and to organize their efforts at a global scale to efficiently peer-review major news coverage on climate.
USDA-ARS's Scientific Manuscript database
Next Generation Sequencing is transforming the way scientists collect and measure an organism’s genetic background and gene dynamics, while bioinformatics and super-computing are merging to facilitate parallel sample computation and interpretation at unprecedented speeds. Analyzing the complete gene...
ERIC Educational Resources Information Center
Dillenbourg, Pierre, Ed.
Intended to illustrate the benefits of collaboration between scientists from psychology and computer science, namely machine learning, this book contains the following chapters, most of which are co-authored by scholars from both sides: (1) "Introduction: What Do You Mean by 'Collaborative Learning'?" (Pierre Dillenbourg); (2)…
On October 25 and 26, 1984, the U.S. EPA sponsored a workshop to consider the potential applications of the techniques of computational biological chemistry to problems in environmental health. Eleven extramural scientists from the various related disciplines and a similar number...
Debugging Geographers: Teaching Programming to Non-Computer Scientists
ERIC Educational Resources Information Center
Muller, Catherine L.; Kidd, Chris
2014-01-01
The steep learning curve associated with computer programming can be a daunting prospect, particularly for those not well aligned with this way of logical thinking. However, programming is a skill that is becoming increasingly important. Geography graduates entering careers in atmospheric science are one example of a particularly diverse group who…
Using Computers for Research into Social Relations.
ERIC Educational Resources Information Center
Holden, George W.
1988-01-01
Discusses computer-presented social situations (CPSS), i.e., microcomputer-based simulations developed to provide a new methodological tool for social scientists interested in the study of social relations. Two CPSSs are described: DaySim, used to help identify types of parenting; and DateSim, used to study interpersonal attraction. (21…
Brains--Computers--Machines: Neural Engineering in Science Classrooms
ERIC Educational Resources Information Center
Chudler, Eric H.; Bergsman, Kristen Clapper
2016-01-01
Neural engineering is an emerging field of high relevance to students, teachers, and the general public. This feature presents online resources that educators and scientists can use to introduce students to neural engineering and to integrate core ideas from the life sciences, physical sciences, social sciences, computer science, and engineering…
Computer Science Professionals and Greek Library Science
ERIC Educational Resources Information Center
Dendrinos, Markos N.
2008-01-01
This paper attempts to present the current state of computer science penetration into librarianship in terms of both workplace and education issues. The shift from material libraries into digital libraries is mirrored in the corresponding shift from librarians into information scientists. New library data and metadata, as well as new automated…
Describing the What and Why of Students' Difficulties in Boolean Logic
ERIC Educational Resources Information Center
Herman, Geoffrey L.; Loui, Michael C.; Kaczmarczyk, Lisa; Zilles, Craig
2012-01-01
The ability to reason with formal logic is a foundational skill for computer scientists and computer engineers that scaffolds the abilities to design, debug, and optimize. By interviewing students about their understanding of propositional logic and their ability to translate from English specifications to Boolean expressions, we characterized…
NASA Astrophysics Data System (ADS)
Weltzin, J. F.; Rosemartin, A.; Crimmins, T. M.; Posthumus, E.
2015-12-01
The USA National Phenology Network (USA-NPN; www.usanpn.org) serves science and society by promoting a broad understanding of plant and animal phenology and the relationships among phenological patterns and all aspects of environmental change. Data maintained by USA-NPN is being used for applications related to science, conservation and resource management. The majority of the data have been provided by "citizen scientists" participating in a national-scale, multi-taxa phenology observation program, Nature's Notebook. Since 2008, more than 5,500 active participants registered with Nature's Notebook have contributed over 5.5 million observation records for plants and animals. This presentation will demonstrate several types of questions that can be addressed by engaging citizen scientists in a standardized national monitoring system focused on field observations of biodiversity. Because the proof is often in the pudding, we will feature a diversity of recently published studies, but will also highlight several new and ongoing local- to continental-scale projects. Projects include continental bioclimatic indices, regional assessments of historical and potential future trends in phenology, sub-regional assessments of temperate deciduous forest response to recent variability in spring-time heat accumulation, state- and management unit- level foci on spatio-temporal variation in organismal activity at both the population and community level, and local monitoring for invasive species detection across platforms from ground to satellite. Additional data-mining and exploration by interested researchers and/or resource managers will likely further demonstrate the value of these data. The bottom line is that "citizen science" represents a viable approach to collect data across spatiotemporal scales often unattainable to research scientists under typical resource constraints.
IEEE International Symposium on Biomedical Imaging.
2017-01-01
The IEEE International Symposium on Biomedical Imaging (ISBI) is a scientific conference dedicated to mathematical, algorithmic, and computational aspects of biological and biomedical imaging, across all scales of observation. It fosters knowledge transfer among different imaging communities and contributes to an integrative approach to biomedical imaging. ISBI is a joint initiative from the IEEE Signal Processing Society (SPS) and the IEEE Engineering in Medicine and Biology Society (EMBS). The 2018 meeting will include tutorials, and a scientific program composed of plenary talks, invited special sessions, challenges, as well as oral and poster presentations of peer-reviewed papers. High-quality papers are requested containing original contributions to the topics of interest including image formation and reconstruction, computational and statistical image processing and analysis, dynamic imaging, visualization, image quality assessment, and physical, biological, and statistical modeling. Accepted 4-page regular papers will be published in the symposium proceedings published by IEEE and included in IEEE Xplore. To encourage attendance by a broader audience of imaging scientists and offer additional presentation opportunities, ISBI 2018 will continue to have a second track featuring posters selected from 1-page abstract submissions without subsequent archival publication.
Science in the cloud (SIC): A use case in MRI connectomics
Gorgolewski, Krzysztof J.; Kleissas, Dean; Roncal, William Gray; Litt, Brian; Wandell, Brian; Poldrack, Russel A.; Wiener, Martin; Vogelstein, R. Jacob; Burns, Randal
2017-01-01
Modern technologies are enabling scientists to collect extraordinary amounts of complex and sophisticated data across a huge range of scales like never before. With this onslaught of data, we can allow the focal point to shift from data collection to data analysis. Unfortunately, lack of standardized sharing mechanisms and practices often makes reproducing or extending scientific results very difficult. With the creation of data organization structures and tools that drastically improve code portability, we now have the opportunity to design such a framework for communicating extensible scientific discoveries. Our proposed solution leverages these existing technologies and standards, and provides an accessible and extensible model for reproducible research, called ‘science in the cloud’ (SIC). Exploiting scientific containers, cloud computing, and cloud data services, we show the capability to compute in the cloud and run a web service that enables intimate interaction with the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results that will enable us to collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended. PMID:28327935
Science in the cloud (SIC): A use case in MRI connectomics.
Kiar, Gregory; Gorgolewski, Krzysztof J; Kleissas, Dean; Roncal, William Gray; Litt, Brian; Wandell, Brian; Poldrack, Russel A; Wiener, Martin; Vogelstein, R Jacob; Burns, Randal; Vogelstein, Joshua T
2017-05-01
Modern technologies are enabling scientists to collect extraordinary amounts of complex and sophisticated data across a huge range of scales like never before. With this onslaught of data, we can allow the focal point to shift from data collection to data analysis. Unfortunately, lack of standardized sharing mechanisms and practices often makes reproducing or extending scientific results very difficult. With the creation of data organization structures and tools that drastically improve code portability, we now have the opportunity to design such a framework for communicating extensible scientific discoveries. Our proposed solution leverages these existing technologies and standards, and provides an accessible and extensible model for reproducible research, called 'science in the cloud' (SIC). Exploiting scientific containers, cloud computing, and cloud data services, we show the capability to compute in the cloud and run a web service that enables intimate interaction with the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results that will enable us to collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended. © The Author 2017. Published by Oxford University Press.
A multiphysics and multiscale software environment for modeling astrophysical systems
NASA Astrophysics Data System (ADS)
Portegies Zwart, Simon; McMillan, Steve; Harfst, Stefan; Groen, Derek; Fujii, Michiko; Nualláin, Breanndán Ó.; Glebbeek, Evert; Heggie, Douglas; Lombardi, James; Hut, Piet; Angelou, Vangelis; Banerjee, Sambaran; Belkus, Houria; Fragos, Tassos; Fregeau, John; Gaburov, Evghenii; Izzard, Rob; Jurić, Mario; Justham, Stephen; Sottoriva, Andrea; Teuben, Peter; van Bever, Joris; Yaron, Ofer; Zemp, Marcel
2009-05-01
We present MUSE, a software framework for combining existing computational tools for different astrophysical domains into a single multiphysics, multiscale application. MUSE facilitates the coupling of existing codes written in different languages by providing inter-language tools and by specifying an interface between each module and the framework that represents a balance between generality and computational efficiency. This approach allows scientists to use combinations of codes to solve highly coupled problems without the need to write new codes for other domains or significantly alter their existing codes. MUSE currently incorporates the domains of stellar dynamics, stellar evolution and stellar hydrodynamics for studying generalized stellar systems. We have now reached a "Noah's Ark" milestone, with (at least) two available numerical solvers for each domain. MUSE can treat multiscale and multiphysics systems in which the time- and size-scales are well separated, like simulating the evolution of planetary systems, small stellar associations, dense stellar clusters, galaxies and galactic nuclei. In this paper we describe three examples calculated using MUSE: the merger of two galaxies, the merger of two evolving stars, and a hybrid N-body simulation. In addition, we demonstrate an implementation of MUSE on a distributed computer which may also include special-purpose hardware, such as GRAPEs or GPUs, to accelerate computations. The current MUSE code base is publicly available as open source at http://muse.li.
Development of the Tensoral Computer Language
NASA Technical Reports Server (NTRS)
Ferziger, Joel; Dresselhaus, Eliot
1996-01-01
The research scientist or engineer wishing to perform large scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified, optimized, and scheduled by the compiler, eventually resulting in efficient machine code to implement them.
Novel 3-D Computer Model Can Help Predict Pathogens’ Roles in Cancer | Poster
To understand how bacterial and viral infections contribute to human cancers, four NCI at Frederick scientists turned not to the lab bench, but to a computer. The team has created the world’s first—and currently, only—3-D computational approach for studying interactions between pathogen proteins and human proteins based on a molecular adaptation known as interface mimicry.
Social and Personal Factors in Semantic Infusion Projects
NASA Astrophysics Data System (ADS)
West, P.; Fox, P. A.; McGuinness, D. L.
2009-12-01
As part of our semantic data framework activities across multiple, diverse disciplines we required the involvement of domain scientists, computer scientists, software engineers, data managers, and often, social scientists. This involvement from a cross-section of disciplines turns out to be a social exercise as much as it is a technical and methodical activity. Each member of the team is used to different modes of working, expectations, vocabularies, levels of participation, and incentive and reward systems. We will examine how both roles and personal responsibilities play in the development of semantic infusion projects, and how an iterative development cycle can contribute to the successful completion of such a project.
NASA Astrophysics Data System (ADS)
Strayer, Michael
2009-07-01
Welcome to San Diego and the 2009 SciDAC conference. Over the next four days, I would like to present an assessment of the SciDAC program. We will look at where we've been, how we got to where we are and where we are going in the future. Our vision is to be first in computational science, to be best in class in modeling and simulation. When Ray Orbach asked me what I would do, in my job interview for the SciDAC Director position, I said we would achieve that vision. And with our collective dedicated efforts, we have managed to achieve this vision. In the last year, we have now the most powerful supercomputer for open science, Jaguar, the Cray XT system at the Oak Ridge Leadership Computing Facility (OLCF). We also have NERSC, probably the best-in-the-world program for productivity in science that the Office of Science so depends on. And the Argonne Leadership Computing Facility offers architectural diversity with its IBM Blue Gene/P system as a counterbalance to Oak Ridge. There is also ESnet, which is often understated—the 40 gigabit per second dual backbone ring that connects all the labs and many DOE sites. In the President's Recovery Act funding, there is exciting news that ESnet is going to build out to a 100 gigabit per second network using new optical technologies. This is very exciting news for simulations and large-scale scientific facilities. But as one noted SciDAC luminary said, it's not all about the computers—it's also about the science—and we are also achieving our vision in this area. Together with having the fastest supercomputer for science, at the SC08 conference, SciDAC researchers won two ACM Gordon Bell Prizes for the outstanding performance of their applications. The DCA++ code, which solves some very interesting problems in materials, achieved a sustained performance of 1.3 petaflops, an astounding result and a mark I suspect will last for some time. 
The LS3DF application for studying nanomaterials also required the development of a new and novel algorithm to produce results up to 400 times faster than a similar application, and was recognized with a prize for algorithm innovation—a remarkable achievement. Day one of our conference will include examples of petascale science enabled at the OLCF. Although Jaguar has not been officially commissioned, it has gone through its acceptance tests, and during its shakedown phase there have been pioneer applications used for the acceptance tests, and they are running at scale. These include applications in the areas of astrophysics, biology, chemistry, combustion, fusion, geosciences, materials science, nuclear energy and nuclear physics. We also have a whole compendium of science we do at our facilities; these have been documented and reviewed at our last SciDAC conference. Many of these were highlighted in our Breakthroughs Report. One session at this week's conference will feature a cross-section of these breakthroughs. In the area of scalable electromagnetic simulations, the Auxiliary-space Maxwell Solver (AMS) uses specialized finite element discretizations and multigrid-based techniques, which decompose the original problem into easier-to-solve subproblems. Congratulations to the mathematicians on this. Another application on the list of breakthroughs was the authentication of PETSc, which provides scalable solvers used in many DOE applications and has solved problems with over 3 billion unknowns and scaled to over 16,000 processors on DOE leadership-class computers. This is becoming a very versatile and useful toolkit to achieve performance at scale. With the announcement of SIAM's first class of Fellows, we are remarkably well represented. Of the group of 191, more than 40 of these Fellows are in the 'DOE space.' We are so delighted that SIAM has recognized them for their many achievements. 
In the coming months, we will illustrate our leadership in applied math and computer science by looking at our contributions in the areas of programming models, development and performance tools, math libraries, system software, collaboration, and visualization and data analytics. This is a large and diverse list of libraries. We have asked for two panels, one chaired by David Keyes and composed of many of the nation's leading mathematicians, to produce a report on the most significant accomplishments in applied mathematics over the last eight years, taking us back to the start of the SciDAC program. In addition, we have a similar panel in computer science to be chaired by Kathy Yelick. They are going to identify the computer science accomplishments of the past eight years. These accomplishments are difficult to get a handle on, and I'm looking forward to this report. We will also have a follow-on to our report on breakthroughs in computational science and this will also go back eight years, looking at the many accomplishments under the SciDAC and INCITE programs. This will be chaired by Tony Mezzacappa. So, where are we going in the SciDAC program? It might help to take a look at computational science and how it got started. I go back to Ken Wilson, who made the model and has written on computational science and computational science education. His model was thus: The computational scientist plays the role of the experimentalist, and the math and CS researchers play the role of theorists, and the computers themselves are the experimental apparatus. And that in simulation science, we are carrying out numerical experiments as to the nature of physical and biological sciences. Peter Lax, in the same time frame, developed a report on large-scale computing in science and engineering. 
Peter remarked, 'Perhaps the most important applications of scientific computing come not in the solution of old problems, but in the discovery of new phenomena through numerical experimentation.' And in the early years, I think the person who provided the most guidance, the most innovation and the most vision for where the future might lie was Ed Oliver. Ed Oliver died last year. Ed did a number of things in science. He had this personality where he knew exactly what to do, but he preferred to stay out of the limelight so that others could enjoy the fruits of his vision. We in the SciDAC program and ASCR Facilities are still enjoying the benefits of his vision. We will miss him. Twenty years after Ken Wilson, Ray Orbach laid out the fundamental premise for SciDAC in an interview that appeared in SciDAC Review: 'SciDAC is unique in the world. There isn't any other program like it anywhere else, and it has the remarkable ability to do science by bringing together physical scientists, mathematicians, applied mathematicians, and computer scientists who recognize that computation is not something you do at the end, but rather it needs to be built into the solution of the very problem that one is addressing.' As you look at the Lax report from 1982, it talks about how 'Future significant improvements may have to come from architectures embodying parallel processing elements, perhaps several thousands of processors.' And it continues, 'Research in languages, algorithms and numerical analysis will be crucial in learning to exploit these new architectures fully.' In the early '90s, Sterling, Messina and Smith developed a workshop report on petascale computing and concluded, 'A petaflops computer system will be feasible in two decades, or less, and rely in part on the continual advancement of the semiconductor industry both in speed enhancement and cost reduction through improved fabrication processes.'
So they were not wrong, and today we are embarking on a forward look that is at a different scale, the exascale, going to 10^18 flops. In 2007, Stevens, Simon and Zacharia chaired a series of town hall meetings looking at exascale computing, and in their report wrote, 'Exascale computer systems are expected to be technologically feasible within the next 15 years, or perhaps sooner. These systems will push the envelope in a number of important technologies: processor architecture, scale of multicore integration, power management and packaging.' The concept of computing on the Jaguar computer involves hundreds of thousands of cores, as do the IBM systems that are currently out there. So the scale of computing with systems with billions of processors is staggering to me, and I don't know how the software and math folks feel about it. We have now embarked on a road toward extreme scale computing. We have created a series of town hall meetings and we are now in the process of holding workshops that address what I call within the DOE speak 'the mission need,' or what is the scientific justification for computing at that scale. We are going to have a total of 13 workshops. The workshops on climate, high energy physics, nuclear physics, fusion, and nuclear energy have been held. The report from the workshop on climate is actually out and available, and the other reports are being completed. The upcoming workshops are on biology, materials, and chemistry; and workshops that engage science for nuclear security are a partnership between NNSA and ASCR. There are additional workshops on applied math, computer science, and architecture that are needed for computing at the exascale. These extreme scale workshops will provide the foundation in our office, the Office of Science, the NNSA and DOE, and we will engage the National Science Foundation and the Department of Defense as partners. We envision a 10-year program for an exascale initiative.
It will be an integrated R&D program initially—you can think about five years for research and development—that would be in hardware, operating systems, file systems, networking and so on, as well as software for applications. Application software and the operating system and the hardware all need to be bundled in this period so that at the end the system will execute the science applications at scale. We also believe that this process will have to have considerable investment from the manufacturers and vendors to be successful. We have formed laboratory, university and industry working groups to start this process and formed a panel to look at where SciDAC needs to go to compute at the extreme scale, and we have formed an executive committee within the Office of Science and the NNSA to focus on these activities. We will have outreach to DoD in the next few months. We are anticipating a solicitation within the next two years in which we will compete this bundled R&D process. We don't know how we will incorporate SciDAC into extreme scale computing, but we do know there will be many challenges. And as we have shown over the years, we have the expertise and determination to surmount these challenges.
Collaborative Science Using Web Services and the SciFlo Grid Dataflow Engine
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Yunck, T.
2006-12-01
The General Earth Science Investigation Suite (GENESIS) project is a NASA-sponsored partnership between the Jet Propulsion Laboratory, academia, and NASA data centers to develop a new suite of Web Services tools to facilitate multi-sensor investigations in Earth System Science. The goal of GENESIS is to enable large-scale, multi-instrument atmospheric science using combined datasets from the AIRS, MODIS, MISR, and GPS sensors. Investigations include cross-comparison of spaceborne climate sensors, cloud spectral analysis, study of upper troposphere-stratosphere water transport, study of the aerosol indirect cloud effect, and global climate model validation. The challenges are to bring together very large datasets, reformat and understand the individual instrument retrievals, co-register or re-grid the retrieved physical parameters, perform computationally-intensive data fusion and data mining operations, and accumulate complex statistics over months to years of data. To meet these challenges, we have developed a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data access, subsetting, registration, mining, fusion, compression, and advanced statistical analysis. SciFlo leverages remote Web Services, called via Simple Object Access Protocol (SOAP) or REST (one-line) URLs, and the Grid Computing standards (WS-* & Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable Web Services and native executables into a distributed computing flow (tree of operators). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources.
In particular, SciFlo exploits the wealth of datasets accessible by OpenGIS Consortium (OGC) Web Mapping Servers & Web Coverage Servers (WMS/WCS), and by Open Data Access Protocol (OpenDAP) servers. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. Once an analysis has been specified for a chunk or day of data, it can be easily repeated with different control parameters or over months of data. Recently, the Earth Science Information Partners (ESIP) Federation sponsored a collaborative activity in which several ESIP members advertised their respective WMS/WCS and SOAP services, developed some collaborative science scenarios for atmospheric and aerosol science, and then choreographed services from multiple groups into demonstration workflows using the SciFlo engine and a Business Process Execution Language (BPEL) workflow engine. For several scenarios, the same collaborative workflow was executed in three ways: using hand-coded scripts, by executing a SciFlo document, and by executing a BPEL workflow document. We will discuss the lessons learned from this activity, the need for standardized interfaces (like WMS/WCS), the difficulty in agreeing on even simple XML formats and interfaces, and further collaborations that are being pursued.
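The "tree of operators" idea in the abstract above can be illustrated with a minimal sketch. This is not SciFlo's API or its XML dataflow format: the `Operator` class, the `fetch_airs` and `fetch_modis` stand-ins for remote data-access services, and the sample values are all hypothetical, chosen only to show how a flow is evaluated from its leaves upward.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Operator:
    """One node in a hypothetical dataflow tree (illustrative, not SciFlo's API)."""
    name: str
    func: Callable[..., Any]
    inputs: List["Operator"] = field(default_factory=list)

    def run(self) -> Any:
        # Evaluate upstream operators first, then apply this operator to their results.
        args = [child.run() for child in self.inputs]
        return self.func(*args)


# Hypothetical leaf operators standing in for remote data-access services.
def fetch_airs() -> List[float]:
    return [250.0, 251.5, 249.0]


def fetch_modis() -> List[float]:
    return [250.5, 251.0, 250.0]


def difference(a: List[float], b: List[float]) -> List[float]:
    # Element-wise difference of two already co-registered series.
    return [x - y for x, y in zip(a, b)]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Assemble the flow: mean(difference(airs, modis)).
flow = Operator("mean_bias", mean, [
    Operator("difference", difference, [
        Operator("airs", fetch_airs),
        Operator("modis", fetch_modis),
    ]),
])

bias = flow.run()
```

In a real engine the leaves would be remote service calls and the scheduler would decide where each node runs; the sketch keeps everything local so that only the evaluation order is on display.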
Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.
2011-01-01
Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772
A Scientist's Guide to Achieving Broader Impacts through K–12 STEM Collaboration
Komoroske, Lisa M.; Hameed, Sarah O.; Szoboszlai, Amber I.; Newsom, Amanda J.; Williams, Susan L.
2015-01-01
The National Science Foundation and other funding agencies are increasingly requiring broader impacts in grant applications to encourage US scientists to contribute to science education and society. Concurrently, national science education standards are using more inquiry-based learning (IBL) to increase students’ capacity for abstract, conceptual thinking applicable to real-world problems. Scientists are particularly well suited to engage in broader impacts via science inquiry outreach, because scientific research is inherently an inquiry-based process. We provide a practical guide to help scientists overcome obstacles that inhibit their engagement in K–12 IBL outreach and to attain the accrued benefits. Strategies to overcome these challenges include scaling outreach projects to the time available, building collaborations in which scientists’ research overlaps with curriculum, employing backward planning to target specific learning objectives, encouraging scientists to share their passion, as well as their expertise with students, and transforming institutional incentives to support scientists engaging in educational outreach. PMID:26955078
Assembling Large, Multi-Sensor Climate Datasets Using the SciFlo Grid Workflow System
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Xing, Z.; Fetzer, E.
2008-12-01
NASA's Earth Observing System (EOS) is the world's most ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the A-Train platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the cloud scenes from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time matchups between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, and assemble merged datasets for further scientific and statistical analysis. To meet these large-scale challenges, we are utilizing a Grid computing and dataflow framework, named SciFlo, in which we are deploying a set of versatile and reusable operators for data query, access, subsetting, co-registration, mining, fusion, and advanced statistical analysis. SciFlo is a semantically-enabled ("smart") Grid Workflow system that ties together a peer-to-peer network of computers into an efficient engine for distributed computation. The SciFlo workflow engine enables scientists to do multi-instrument Earth Science by assembling remotely-invokable Web Services (SOAP or HTTP GET URLs), native executables, command-line scripts, and Python codes into a distributed computing flow.
A scientist visually authors the graph of operations in the VizFlow GUI, or uses a text editor to modify the simple XML workflow documents. The SciFlo client & server engines optimize the execution of such distributed workflows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The engine transparently moves data to the operators, and moves operators to the data (on the dozen trusted SciFlo nodes). SciFlo also deploys a variety of Data Grid services to: query datasets in space and time, locate & retrieve on-line data granules, provide on-the-fly variable and spatial subsetting, and perform pairwise instrument matchups for A-Train datasets. These services are combined into efficient workflows to assemble the desired large-scale, merged climate datasets. SciFlo is currently being applied in several large climate studies: comparisons of aerosol optical depth between MODIS, MISR, AERONET ground network, and U. Michigan's IMPACT aerosol transport model; characterization of long-term biases in microwave and infrared instruments (AIRS, MLS) by comparisons to GPS temperature retrievals accurate to 0.1 K; and construction of a decade-long, multi-sensor water vapor climatology stratified by classified cloud scene by bringing together datasets from AIRS/AMSU, AMSR-E, MLS, MODIS, and CloudSat (NASA MEASUREs grant, Fetzer PI). The presentation will discuss the SciFlo technologies, their application in these distributed workflows, and the many challenges encountered in assembling and analyzing these massive datasets.
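The pairwise space/time matchup step mentioned in the abstract above can be sketched as follows. The abstract does not describe how SciFlo implements its matchup service, so everything here is an assumption made for illustration: the `Obs` record, the `matchups` function, the tolerance parameters, and the sample observations are hypothetical. The sketch uses a brute-force comparison with a simple latitude/longitude box, and it ignores longitude wrap-around and true great-circle distance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Obs:
    """A single hypothetical observation: time in seconds, position in degrees."""
    time_s: float
    lat: float
    lon: float
    value: float


def matchups(a: List[Obs], b: List[Obs],
             max_dt_s: float, max_deg: float) -> List[Tuple[Obs, Obs]]:
    """Return all (a, b) pairs that fall within the time and position tolerances."""
    pairs = []
    for oa in a:
        for ob in b:
            close_in_time = abs(oa.time_s - ob.time_s) <= max_dt_s
            close_in_space = (abs(oa.lat - ob.lat) <= max_deg
                              and abs(oa.lon - ob.lon) <= max_deg)
            if close_in_time and close_in_space:
                pairs.append((oa, ob))
    return pairs


# Two made-up observation lists standing in for two instruments.
inst_a = [Obs(0, 10.0, 20.0, 250.0), Obs(600, 40.0, -100.0, 260.0)]
inst_b = [Obs(120, 10.2, 20.1, 251.0), Obs(5000, 40.0, -100.0, 259.0)]

# Accept pairs within 15 minutes and half a degree of each other.
pairs = matchups(inst_a, inst_b, max_dt_s=900, max_deg=0.5)
```

The nested loop costs one comparison per pair of observations, which is fine for a sketch but not for years of swath data; a production service would presumably index observations in space and time first.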
[Activities of Research Institute for Advanced Computer Science]
NASA Technical Reports Server (NTRS)
Gross, Anthony R. (Technical Monitor); Leiner, Barry M.
2001-01-01
The Research Institute for Advanced Computer Science (RIACS) carries out basic research and technology development in computer science, in support of the National Aeronautics and Space Administration's missions. RIACS is located at the NASA Ames Research Center, Moffett Field, California. RIACS research focuses on the three cornerstones of IT research necessary to meet the future challenges of NASA missions: 1. Automated Reasoning for Autonomous Systems: Techniques are being developed enabling spacecraft that will be self-guiding and self-correcting to the extent that they will require little or no human intervention. Such craft will be equipped to independently solve problems as they arise, and fulfill their missions with minimum direction from Earth. 2. Human-Centered Computing: Many NASA missions require synergy between humans and computers, with sophisticated computational aids amplifying human cognitive and perceptual abilities. 3. High Performance Computing and Networking: Advances in the performance of computing and networking continue to have major impact on a variety of NASA endeavors, ranging from modeling and simulation to analysis of large scientific datasets to collaborative engineering, planning and execution. In addition, RIACS collaborates with NASA scientists to apply IT research to a variety of NASA application domains. RIACS also engages in other activities, such as workshops, seminars, visiting scientist programs and student summer programs, designed to encourage and facilitate collaboration between the university and NASA IT research communities.
NASA Astrophysics Data System (ADS)
Uher, Jana
2014-12-01
The growing interest in "personality" from scientists of ever more diverse fields demands conceptual integrations, and reveals fundamental challenges. For what is "personality" given that "it" is explored in humans and nonhuman species, that people encode "it" in their everyday language, scientists seek "it" in the brain and study "it" primarily with rating scales?
Lobo, Daniel; Levin, Michael
2015-01-01
Transformative applications in biomedicine require the discovery of complex regulatory networks that explain the development and regeneration of anatomical structures, and reveal what external signals will trigger desired changes of large-scale pattern. Despite recent advances in bioinformatics, extracting mechanistic pathway models from experimental morphological data is a key open challenge that has resisted automation. The fundamental difficulty of manually predicting emergent behavior of even simple networks has limited the models invented by human scientists to pathway diagrams that show necessary subunit interactions but do not reveal the dynamics that are sufficient for complex, self-regulating pattern to emerge. To finally bridge the gap between high-resolution genetic data and the ability to understand and control patterning, it is critical to develop computational tools to efficiently extract regulatory pathways from the resultant experimental shape phenotypes. For example, planarian regeneration has been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model has yet been found by human scientists that explains more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. We present a method to infer the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. We demonstrated our approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, our system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration.
This method provides an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form. PMID:26042810
Enabling Extreme Scale Earth Science Applications at the Oak Ridge Leadership Computing Facility
NASA Astrophysics Data System (ADS)
Anantharaj, V. G.; Mozdzynski, G.; Hamrud, M.; Deconinck, W.; Smith, L.; Hack, J.
2014-12-01
The Oak Ridge Leadership Computing Facility (OLCF), established at the Oak Ridge National Laboratory (ORNL) under the auspices of the U.S. Department of Energy (DOE), welcomes investigators from universities, government agencies, national laboratories and industry who are prepared to perform breakthrough research across a broad domain of scientific disciplines, including earth and space sciences. Titan, the OLCF flagship system, is currently listed as #2 in the Top500 list of supercomputers in the world, and the largest available for open science. The computational resources are allocated primarily via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, sponsored by the U.S. DOE Office of Science. In 2014, over 2.25 billion core hours on Titan were awarded via INCITE projects, including 14% of the allocation toward earth sciences. The INCITE competition is also open to research scientists based outside the USA. In fact, international research projects account for 12% of the INCITE awards in 2014. The INCITE scientific review panel also includes 20% participation from international experts. Recent accomplishments in earth sciences at OLCF include the world's first continuous simulation of 21,000 years of earth's climate history (2009); and an unprecedented simulation of a magnitude 8 earthquake over 125 sq. miles. One of the ongoing international projects involves scaling the ECMWF Integrated Forecasting System (IFS) model to over 200K cores of Titan. ECMWF is a partner in the EU funded Collaborative Research into Exascale Systemware, Tools and Applications (CRESTA) project. The significance of the research carried out within this project is the demonstration of techniques required to scale current generation Petascale capable simulation codes towards the performance levels required for running on future Exascale systems.
One of the techniques pursued by ECMWF is to use Fortran2008 coarrays to overlap computations and communications and to reduce the total volume of data communicated. Use of Titan has enabled ECMWF to plan future scalability developments and resource requirements. We will also discuss the best practices developed over the years in navigating logistical, legal and regulatory hurdles involved in supporting the facility's diverse user community.
Neo-deterministic definition of earthquake hazard scenarios: a multiscale application to India
NASA Astrophysics Data System (ADS)
Peresan, Antonella; Magrin, Andrea; Parvez, Imtiyaz A.; Rastogi, Bal K.; Vaccari, Franco; Cozzini, Stefano; Bisignano, Davide; Romanelli, Fabio; Panza, Giuliano F.; Ashish, Mr; Mir, Ramees R.
2014-05-01
The development of effective mitigation strategies requires scientifically consistent estimates of seismic ground motion; recent analysis, however, showed that the performances of the classical probabilistic approach to seismic hazard assessment (PSHA) are very unsatisfactory in anticipating ground shaking from future large earthquakes. Moreover, due to their basic heuristic limitations, the standard PSHA estimates are by far unsuitable when dealing with the protection of critical structures (e.g. nuclear power plants) and cultural heritage, where it is necessary to consider extremely long time intervals. Nonetheless, the persistence in resorting to PSHA is often explained by the need to deal with uncertainties related with ground shaking and earthquakes recurrence. We show that current computational resources and physical knowledge of the seismic waves generation and propagation processes, along with the improving quantity and quality of geophysical data, allow nowadays for viable numerical and analytical alternatives to the use of PSHA. The advanced approach considered in this study, namely the NDSHA (neo-deterministic seismic hazard assessment), is based on the physically sound definition of a wide set of credible scenario events and accounts for uncertainties and earthquakes recurrence in a substantially different way. The expected ground shaking due to a wide set of potential earthquakes is defined by means of full waveforms modelling, based on the possibility to efficiently compute synthetic seismograms in complex laterally heterogeneous anelastic media. In this way a set of scenarios of ground motion can be defined, either at national and local scale, the latter considering the 2D and 3D heterogeneities of the medium travelled by the seismic waves. The efficiency of the NDSHA computational codes allows for the fast generation of hazard maps at the regional scale even on a modern laptop computer. 
At the scenario scale, quick parametric studies can be easily performed to understand the influence of the model characteristics on the computed ground shaking scenarios. For massive parametric tests, or for the repeated generation of large scale hazard maps, the methodology can take advantage of more advanced computational platforms, ranging from GRID computing infrastructures to HPC dedicated clusters up to Cloud computing. In such a way, scientists can deal efficiently with the variety and complexity of the potential earthquake sources, and perform parametric studies to characterize the related uncertainties. NDSHA provides realistic time series of expected ground motion readily applicable for seismic engineering analysis and other mitigation actions. The methodology has been successfully applied to strategic buildings, lifelines and cultural heritage sites, and for the purpose of seismic microzoning in several urban areas worldwide. A web application is currently being developed that facilitates the access to the NDSHA methodology and the related outputs by end-users, who are interested in reliable territorial planning and in the design and construction of buildings and infrastructures in seismic areas. At the same time, the web application is also shaping up as an advanced educational tool to explore interactively how seismic waves are generated at the source, propagate inside structural models, and build up ground shaking scenarios. We illustrate the preliminary results obtained from a multiscale application of NDSHA approach to the territory of India, zooming from large scale hazard maps of ground shaking at bedrock, to the definition of local scale earthquake scenarios for selected sites in the Gujarat state (NW India). The study aims to provide the community (e.g. authorities and engineers) with advanced information for earthquake risk mitigation, which is particularly relevant to Gujarat in view of the rapid development and urbanization of the region.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hollingsworth, Jeff
2014-07-31
The purpose of this project was to develop tools and techniques to improve the ability of computational scientists to investigate and correct problems (bugs) in their programs. Specifically, the University of Maryland component of this project focused on the problems associated with the finite number of bits available in a computer to represent numeric values. In large scale scientific computation, numbers are frequently added to and multiplied with each other billions of times. Thus even small errors due to the representation of numbers can accumulate into big errors. However, using too many bits to represent a number results in additional computation, memory, and energy costs. Thus it is critical to find the right size for numbers. This project focused on several aspects of this general problem. First, we developed a tool to look for cancellations, the catastrophic loss of precision in numbers due to the addition of two numbers whose actual values are close to each other, but whose representation in a computer is identical or nearly so. Second, we developed a suite of tools to allow programmers to identify exactly how much precision is required for each operation in their program. These tools allow programmers both to verify that enough precision is available and, more importantly, to find cases where extra precision could be eliminated to allow the program to use less memory, computer time, or energy. These tools use advanced binary modification techniques to allow the analysis of actual optimized code. The system, called Craft, has been applied to a number of benchmarks and real applications.
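The two effects named in this abstract, accumulated rounding error and cancellation, can be illustrated with a minimal sketch. It is not the Craft tool and does not use binary modification; the helper names `to_float32`, `sum_float32`, and `diff_float32` are hypothetical and exist only for this illustration. The sketch emulates 32-bit arithmetic by rounding every intermediate value through the standard `struct` module, and shows cancellation with a subtraction of nearly equal values (equivalently, an addition of values of opposite sign).

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate values while rounding every partial sum to 32 bits."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def diff_float32(a: float, b: float) -> float:
    """Subtract b from a with inputs and result rounded to 32 bits."""
    return to_float32(to_float32(a) - to_float32(b))


# Accumulation: add 0.1 one hundred thousand times (exact answer: 10000).
values = [0.1] * 100_000
err32 = abs(sum_float32(values) - 10_000.0)
err64 = abs(math.fsum(values) - 10_000.0)
print(f"32-bit accumulation error: {err32:.6f}")
print(f"64-bit (fsum) error:       {err64:.6e}")

# Cancellation: 1 + 1e-8 is indistinguishable from 1 in 32 bits,
# so the difference collapses to zero; 64 bits still resolve it.
print(diff_float32(1.0 + 1e-8, 1.0))
print((1.0 + 1e-8) - 1.0)
```

The comparison suggests the trade-off the abstract describes: the narrower format is cheaper but loses information in exactly the operations that are repeated most often, so the useful question is which operations actually need the wider format.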
Decision tree and ensemble learning algorithms with their applications in bioinformatics.
Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping
2011-01-01
Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
Supercomputing Sheds Light on the Dark Universe
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Heitmann, Katrin
2012-11-15
At Argonne National Laboratory, scientists are using supercomputers to shed light on one of the great mysteries in science today, the Dark Universe. With Mira, a petascale supercomputer at the Argonne Leadership Computing Facility, a team led by physicists Salman Habib and Katrin Heitmann will run the largest, most complex simulation of the universe ever attempted. By contrasting the results from Mira with state-of-the-art telescope surveys, the scientists hope to gain new insights into the distribution of matter in the universe, advancing future investigations of dark energy and dark matter into a new realm. The team's research was named a finalist for the 2012 Gordon Bell Prize, an award recognizing outstanding achievement in high-performance computing.
NASA Astrophysics Data System (ADS)
Gordov, E. P.; Lykosov, V. N.; Genina, E. Yu; Gordova, Yu E.
2017-11-01
The paper describes CITES, a regular series of events consisting of a young scientists' school and an international conference, as a tool for training and professional growth. The events address the most pressing issues in the application of information-computational technologies in environmental sciences and in the training of young scientists, narrowing the gap between university graduates' skills and current challenges. The viability of the approach to organizing CITES is shown by the fact that a single event organized in 2001 turned into a series; quite a few young participants have successfully defended their PhD theses, and a number of researchers have become Doctors of Science during these years. Young researchers from Russia and other countries show undiminished interest in these events.
Incentives to Encourage Scientific Web Contribution (Invited)
NASA Astrophysics Data System (ADS)
Antunes, A. K.
2010-12-01
We suggest improvements to citation standards and creation of remuneration opportunities to encourage career scientist contributions to Web2.0 and social media science channels. At present, agencies want to accomplish better outreach and engagement with no funding, while scientists sacrifice their personal time to contribute to web and social media sites. Securing active participation by scientists requires career recognition of the value scientists provide to web knowledge bases and to the general public. One primary mechanism to encourage participation is citation standards, which let a contributor improve their reputation in a quantifiable way. But such standards must be recognized by their scientific and workplace communities. Using case studies such as the acceptance of web in the workplace and the growth of open access journals, we examine what agencies and individuals can do as well as the time scales needed to secure increased active contribution by scientists. We also discuss ways to jumpstart this process.
The New Ecological Paradigm Revisited: Anchoring the NEP Scale in Environmental Ethics
ERIC Educational Resources Information Center
Lundmark, Carina
2007-01-01
The New Environmental or Ecological Paradigm (NEP) is widely acknowledged as a reliable multiple-item scale to capture environmental attitudes or beliefs. It has been used in statistical analyses for almost 30 years, primarily by psychologists, but also by political scientists, sociologists and geographers. The scale's theoretical foundation is,…
Multiuser Collaboration with Networked Mobile Devices
NASA Technical Reports Server (NTRS)
Tso, Kam S.; Tai, Ann T.; Deng, Yong M.; Becks, Paul G.
2006-01-01
In this paper we describe a multiuser collaboration infrastructure that enables multiple mission scientists to remotely and collaboratively interact with visualization and planning software, using wireless networked personal digital assistants (PDAs) and other mobile devices. During ground operations of planetary rover and lander missions, scientists need to meet daily to review downlinked data and plan science activities. For example, scientists use the Science Activity Planner (SAP) in the Mars Exploration Rover (MER) mission to visualize downlinked data and plan rover activities during the science meetings [1]. Computer displays are projected onto large screens in the meeting room to enable the scientists to view and discuss downlinked images and data displayed by SAP and other software applications. However, only one person can interact with the software applications because input to the computer is limited to a single mouse and keyboard. As a result, the scientists have to verbally express their intentions, such as selecting a target at a particular location on the Mars terrain image, to that person in order to interact with the applications. This constrains communication and limits the returns of science planning. Furthermore, ground operations for Mars missions are fundamentally constrained by the short turnaround time for science and engineering teams to process and analyze data, plan the next uplink, generate command sequences, and transmit the uplink to the vehicle [2]. Therefore, improving ground operations is crucial to the success of Mars missions. The multiuser collaboration infrastructure enables users to control software applications remotely and collaboratively using mobile devices.
The infrastructure includes (1) human-computer interaction techniques to provide natural, fast, and accurate inputs, (2) a communications protocol to ensure reliable and efficient coordination of the input devices and host computers, (3) an application-independent middleware that maintains the states, sessions, and interactions of individual users of the software applications, (4) an application programming interface to enable tight integration of applications and the middleware. The infrastructure is able to support any software applications running under the Windows or Unix platforms. The resulting technologies not only are applicable to NASA mission operations, but also useful in other situations such as design reviews, brainstorming sessions, and business meetings, as they can benefit from having the participants concurrently interact with the software applications (e.g., presentation applications and CAD design tools) to illustrate their ideas and provide inputs.
Realism and Perspectivism: a Reevaluation of Rival Theories of Spatial Vision.
NASA Astrophysics Data System (ADS)
Thro, E. Broydrick
1990-01-01
My study reevaluates two theories of human space perception, a trigonometric surveying theory I call perspectivism and a "scene recognition" theory I call realism. Realists believe that retinal image geometry can supply no unambiguous information about an object's size and distance--and that, as a result, viewers can locate objects in space only by making discretionary interpretations based on familiar experience of object types. Perspectivists, in contrast, think viewers can disambiguate object sizes/distances on the basis of retinal image information alone. More specifically, they believe the eye responds to perspective image geometry with an automatic trigonometric calculation that not only fixes the directions and shapes, but also roughly fixes the sizes and distances of scene elements in space. Today this surveyor theory has been largely superseded by the realist approach, because most vision scientists believe retinal image geometry is ambiguous about the scale of space. However, I show that there is a considerable body of neglected evidence, both past and present, tending to call this scale ambiguity claim into question. I maintain that this evidence against scale ambiguity could hardly be more important, if one considers its subversive implications for the scene recognition theory that is not only today's reigning approach to spatial vision, but also the foundation for computer scientists' efforts to create space-perceiving robots. If viewers were deemed to be capable of automatic surveying calculations, the discretionary scene recognition theory would lose its main justification. Clearly, it would be difficult for realists to maintain that we viewers rely on scene recognition for space perception in spite of our ability to survey.
And in reality, as I show, the surveyor theory does a much better job of describing the everyday space we viewers actually see--a space featuring stable, unambiguous relationships among scene elements, and a single horizon and vanishing point for (meter-scale) receding objects. In addition, I argue, the surveyor theory raises fewer philosophical difficulties, because it is more in harmony with our everyday concepts of material objects, human agency and the self.
Bañares, Miguel A; Haase, Andrea; Tran, Lang; Lobaskin, Vladimir; Oberdörster, Günter; Rallo, Robert; Leszczynski, Jerzy; Hoet, Peter; Korenstein, Rafi; Hardy, Barry; Puzyn, Tomasz
2017-09-01
A first European Conference on Computational Nanotoxicology, CompNanoTox, was held in November 2015 in Benahavís, Spain with the objectives to disseminate and integrate results from the European modeling and database projects (NanoPUZZLES, ModENPTox, PreNanoTox, MembraneNanoPart, MODERN, eNanoMapper and EU COST TD1204 MODENA) as well as to create synergies within the European NanoSafety Cluster. This conference was supported by the COST Action TD1204 MODENA on developing computational methods for toxicological risk assessment of engineered nanoparticles and provided a unique opportunity for cross fertilization among complementary disciplines. The efforts to develop and validate computational models crucially depend on high quality experimental data and relevant assays which will be the basis to identify relevant descriptors. The ambitious overarching goal of this conference was to promote predictive nanotoxicology, which can only be achieved by a close collaboration between the computational scientists (e.g. database experts, modeling experts for structure, (eco) toxicological effects, performance and interaction of nanomaterials) and experimentalists from different areas (in particular toxicologists, biologists, chemists and material scientists, among others). The main outcome and new perspectives of this conference are summarized here.
Tools and techniques for computational reproducibility.
Piccolo, Stephen R; Frampton, Michael B
2016-07-11
When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. The deterministic nature of most computer programs means that the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced because of complexities in how software is packaged, installed, and executed, and because of limitations associated with how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges; here we describe seven such strategies. With a broad scientific audience in mind, we describe the strengths and limitations of each approach, as well as the circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.
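The remark about determinism can be made concrete with a small sketch. It is only an illustration and is not claimed to be one of the seven strategies the abstract refers to; the functions `run_analysis` and `fingerprint` and the sample data are hypothetical. The idea is that an analysis whose only source of randomness is an explicitly seeded generator yields the same output for the same data, and that a hash of the serialized result gives a compact value others can compare against.

```python
import hashlib
import json
import random


def run_analysis(data, seed):
    """Toy bootstrap analysis; all randomness comes from the given seed."""
    rng = random.Random(seed)  # private generator, no global state
    sample = [rng.choice(data) for _ in range(len(data))]
    return {"bootstrap_mean": sum(sample) / len(sample), "n": len(sample)}


def fingerprint(result):
    """Hash a JSON serialization with sorted keys for a stable byte form."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.0, 3.5, 5.0, 7.25, 11.0]  # made-up measurements
first = fingerprint(run_analysis(data, seed=42))
second = fingerprint(run_analysis(data, seed=42))
print(first == second)  # same data and seed give the same fingerprint
```

A matching fingerprint only shows that the computation itself repeated; it says nothing about differences in software versions or installation, which the abstract identifies as the harder part of the problem.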
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bañares, Miguel A.; Haase, Andrea; Tran, Lang
A first European Conference on Computational Nanotoxicology, CompNanoTox, was held in November 2015 in Benahavís, Spain with the objectives to disseminate and integrate results from the European modeling and database projects (NanoPUZZLES, ModENPTox, PreNanoTox, MembraneNanoPart, MODERN, eNanoMapper and EU COST TD1204 MODENA) as well as to create synergies within the European NanoSafety Cluster. This conference was supported by the COST Action TD1204 MODENA on developing computational methods for toxicological risk assessment of engineered nanoparticles and provided a unique opportunity for cross-fertilization among complementary disciplines. The efforts to develop and validate computational models crucially depend on high quality experimental data and relevant assays which will be the basis to identify relevant descriptors. The ambitious overarching goal of this conference was to promote predictive nanotoxicology, which can only be achieved by a close collaboration between the computational scientists (e.g. database experts, modeling experts for structure, (eco) toxicological effects, performance and interaction of nanomaterials) and experimentalists from different areas (in particular toxicologists, biologists, chemists and material scientists, among others). The main outcome and new perspectives of this conference are summarized here.
Parallel-In-Time For Moving Meshes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falgout, R. D.; Manteuffel, T. A.; Southworth, B.
2016-02-04
With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Dodge, Somayeh; Bohrer, Gil; Weinzierl, Rolf P.; Davidson, Sarah C.; Kays, Roland; Douglas, David C.; Cruz, Sebastian; Han, J.; Brandes, David; Wikelski, Martin
2013-01-01
The movement of animals is strongly influenced by external factors in their surrounding environment such as weather, habitat types, and human land use. With advances in positioning and sensor technologies, it is now possible to capture animal locations at high spatial and temporal granularities. Likewise, scientists have an increasing access to large volumes of environmental data. Environmental data are heterogeneous in source and format, and are usually obtained at different spatiotemporal scales than movement data. Indeed, there remain scientific and technical challenges in developing linkages between the growing collections of animal movement data and the large repositories of heterogeneous remote sensing observations, as well as in the developments of new statistical and computational methods for the analysis of movement in its environmental context. These challenges include retrieval, indexing, efficient storage, data integration, and analytical techniques.
Gro2mat: a package to efficiently read gromacs output in MATLAB.
Dien, Hung; Deane, Charlotte M; Knapp, Bernhard
2014-07-30
Molecular dynamics (MD) simulations are a state-of-the-art computational method used to investigate molecular interactions at atomic scale. Interaction processes out of experimental reach can be monitored using MD software, such as Gromacs. Here, we present the gro2mat package that allows fast and easy access to Gromacs output files from Matlab. Gro2mat enables direct parsing of the most common Gromacs output formats including the binary xtc-format. No openly available Matlab parser currently exists for this format. The xtc reader is orders of magnitude faster than other available pdb/ascii workarounds. Gro2mat is especially useful for scientists with an interest in quick prototyping of new mathematical and statistical approaches for Gromacs trajectory analyses. © 2014 Wiley Periodicals, Inc.
Vistas in applied mathematics: Numerical analysis, atmospheric sciences, immunology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balakrishnan, A.V.; Dorodnitsyn, A.A.; Lions, J.L.
1986-01-01
Advances in the theory and application of numerical modeling techniques are discussed in papers contributed, primarily by Soviet scientists, on the occasion of the 60th birthday of Gurii I. Marchuk. Topics examined include splitting techniques for computations of industrial flows, the mathematical foundations of the k-epsilon turbulence model, splitting methods for the solution of the incompressible Navier-Stokes equations, the approximation of inhomogeneous hyperbolic boundary-value problems, multigrid methods, and the finite-element approximation of minimal surfaces. Consideration is given to dynamic modeling of moist atmospheres, satellite observations of the earth radiation budget and the problem of energy-active ocean regions, a numerical model of the biosphere for use with GCMs, and large-scale modeling of ocean circulation. Also included are several papers on modeling problems in immunology.
Stereotyping in Relation to the Gender Gap in Participation in Computing.
ERIC Educational Resources Information Center
Siann, Gerda; And Others
1988-01-01
A questionnaire completed by 928 postsecondary students asked subjects to rate one of two computer scientists on 16 personal attributes. Aside from gender of the ratee, questionnaires were identical. Results indicate that on eight attributes the female was rated significantly more positively than the male. Implications are discussed. (Author/CH)
Constructing Contracts: Making Discrete Mathematics Relevant to Beginning Programmers
ERIC Educational Resources Information Center
Gegg-Harrison, Timothy S.
2005-01-01
Although computer scientists understand the importance of discrete mathematics to the foundations of their field, computer science (CS) students do not always see the relevance. Thus, it is important to find a way to show students its relevance. The concept of program correctness is generally taught as an activity independent of the programming…
Communication for Scientists and Engineers: A "Computer Model" in the Basic Course.
ERIC Educational Resources Information Center
Haynes, W. Lance
Successful speech should rest not on prepared notes and outlines but on genuine oral discourse based on "data" fed into the "software" in the computer which already exists within each person. Writing cannot speak for itself, nor can it continually adjust itself to accommodate diverse response. Moreover, no matter how skillfully…
Identification of Factors That Affect Software Complexity.
ERIC Educational Resources Information Center
Kaiser, Javaid
A survey of computer scientists was conducted to identify factors that affect software complexity. A total of 160 items were selected from the literature to include in a questionnaire sent to 425 individuals who were employees of computer-related businesses in Lawrence and Kansas City. The items were grouped into nine categories called system…
Synthetic Biology: Knowledge Accessed by Everyone (Open Sources)
ERIC Educational Resources Information Center
Sánchez Reyes, Patricia Margarita
2016-01-01
Using the principles of biology, along with engineering and with the help of computers, scientists manage to copy DNA sequences from nature and use them to create new organisms. DNA is created through engineering and computer science, managing to create life inside a laboratory. We cannot dismiss the role that synthetic biology could lead in…
The Multiple Pendulum Problem via Maple[R]
ERIC Educational Resources Information Center
Salisbury, K. L.; Knight, D. G.
2002-01-01
The way in which computer algebra systems, such as Maple, have made the study of physical problems of some considerable complexity accessible to mathematicians and scientists with modest computational skills is illustrated by solving the multiple pendulum problem. A solution is obtained for four pendulums with no restriction on the size of the…
Computers and the Future of Skill Demand. Educational Research and Innovation Series
ERIC Educational Resources Information Center
Elliott, Stuart W.
2017-01-01
Computer scientists are working on reproducing all human skills using artificial intelligence, machine learning and robotics. Unsurprisingly then, many people worry that these advances will dramatically change work skills in the years ahead and perhaps leave many workers unemployable. This report develops a new approach to understanding these…
Alford, Rebecca F.; Dolan, Erin L.
2017-01-01
Computational biology is an interdisciplinary field, and many computational biology research projects involve distributed teams of scientists. To accomplish their work, these teams must overcome both disciplinary and geographic barriers. Introducing new training paradigms is one way to facilitate research progress in computational biology. Here, we describe a new undergraduate program in biomolecular structure prediction and design in which students conduct research at labs located at geographically-distributed institutions while remaining connected through an online community. This 10-week summer program begins with one week of training on computational biology methods development, transitions to eight weeks of research, and culminates in one week at the Rosetta annual conference. To date, two cohorts of students have participated, tackling research topics including vaccine design, enzyme design, protein-based materials, glycoprotein modeling, crowd-sourced science, RNA processing, hydrogen bond networks, and amyloid formation. Students in the program report outcomes comparable to students who participate in similar in-person programs. These outcomes include the development of a sense of community and increases in their scientific self-efficacy, scientific identity, and science values, all predictors of continuing in a science research career. Furthermore, the program attracted students from diverse backgrounds, which demonstrates the potential of this approach to broaden the participation of young scientists from backgrounds traditionally underrepresented in computational biology. PMID:29216185
Alford, Rebecca F; Leaver-Fay, Andrew; Gonzales, Lynda; Dolan, Erin L; Gray, Jeffrey J
2017-12-01
Computational biology is an interdisciplinary field, and many computational biology research projects involve distributed teams of scientists. To accomplish their work, these teams must overcome both disciplinary and geographic barriers. Introducing new training paradigms is one way to facilitate research progress in computational biology. Here, we describe a new undergraduate program in biomolecular structure prediction and design in which students conduct research at labs located at geographically-distributed institutions while remaining connected through an online community. This 10-week summer program begins with one week of training on computational biology methods development, transitions to eight weeks of research, and culminates in one week at the Rosetta annual conference. To date, two cohorts of students have participated, tackling research topics including vaccine design, enzyme design, protein-based materials, glycoprotein modeling, crowd-sourced science, RNA processing, hydrogen bond networks, and amyloid formation. Students in the program report outcomes comparable to students who participate in similar in-person programs. These outcomes include the development of a sense of community and increases in their scientific self-efficacy, scientific identity, and science values, all predictors of continuing in a science research career. Furthermore, the program attracted students from diverse backgrounds, which demonstrates the potential of this approach to broaden the participation of young scientists from backgrounds traditionally underrepresented in computational biology.
Computer Model Predicts the Movement of Dust
NASA Technical Reports Server (NTRS)
2002-01-01
A new computer model of the atmosphere can now actually pinpoint where global dust events come from, and can project where they're going. The model may help scientists better evaluate the impact of dust on human health, climate, ocean carbon cycles, ecosystems, and atmospheric chemistry. Also, by seeing where dust originates and where it blows, people with respiratory problems can get advance warning of approaching dust clouds. 'The model is physically more realistic than previous ones,' said Mian Chin, a co-author of the study and an Earth and atmospheric scientist at Georgia Tech and the Goddard Space Flight Center (GSFC) in Greenbelt, Md. 'It is able to reproduce the short term day-to-day variations and long term inter-annual variations of dust concentrations and distributions that are measured from field experiments and observed from satellites.' The above images show both aerosols measured from space (left) and the movement of aerosols predicted by computer model for the same date (right). For more information, read New Computer Model Tracks and Predicts Paths Of Earth's Dust. Images courtesy Paul Giroux, Georgia Tech/NASA Goddard Space Flight Center.
Automating CapCom: Pragmatic Operations and Technology Research for Human Exploration of Mars
NASA Technical Reports Server (NTRS)
Clancey, William J.
2003-01-01
During the Apollo program, NASA and the scientific community used terrestrial analog sites for understanding planetary features and for training astronauts to be scientists. More recently, computer scientists and human factors specialists have followed geologists and biologists into the field, learning how science is actually done on expeditions in extreme environments. Research stations have been constructed by the Mars Society in the Arctic and American southwest, providing facilities for hundreds of researchers to investigate how small crews might live and work on Mars. Combining these interests (science, operations, and technology) in Mars analog field expeditions provides tremendous synergy and authenticity to speculations about Mars missions. By relating historical analyses of Apollo and field science, engineers are creating experimental prototypes that provide significant new capabilities, such as a computer system that automates some of the functions of Apollo's CapCom. Thus, analog studies have created a community of practice, a new collaboration between scientists and engineers, so that technology begins with real human needs and works incrementally towards the challenges of the human exploration of Mars.
Singh, Dadabhai T; Trehan, Rahul; Schmidt, Bertil; Bretschneider, Timo
2008-01-01
Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any new emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, in order to utilize the full functionality of available analysis packages for large-scale phyloinformatics studies, a team of computer scientists, biostatisticians and virologists is needed, a requirement which cannot be fulfilled in many cases. Furthermore, the time complexities of many of the algorithms involved lead to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area. In this paper the graphically oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase on different levels of complexity provides valuable insights into this virus's tendency for geographically based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution. The current study demonstrates the efficiency and utility of workflow systems providing a biologist-friendly approach to complex biological dataset analysis using high performance computing.
In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. Secondly, the analysis of the utilized H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than temporal or host range based clustering.
Final Report for ALCC Allocation: Predictive Simulation of Complex Flow in Wind Farms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barone, Matthew F.; Ananthan, Shreyas; Churchfield, Matt
This report documents work performed using ALCC computing resources granted under a proposal submitted in February 2016, with the resource allocation period spanning the period July 2016 through June 2017. The award allocation was 10.7 million processor-hours at the National Energy Research Scientific Computing Center. The simulations performed were in support of two projects: the Atmosphere to Electrons (A2e) project, supported by the DOE EERE office; and the Exascale Computing Project (ECP), supported by the DOE Office of Science. The project team for both efforts consists of staff scientists and postdocs from Sandia National Laboratories and the National Renewable Energy Laboratory. At the heart of these projects is the open-source computational-fluid-dynamics (CFD) code, Nalu. Nalu solves the low-Mach-number Navier-Stokes equations using an unstructured-grid discretization. Nalu leverages the open-source Trilinos solver library and the Sierra Toolkit (STK) for parallelization and I/O. This report documents baseline computational performance of the Nalu code on problems of direct relevance to the wind plant physics application - namely, Large Eddy Simulation (LES) of an atmospheric boundary layer (ABL) flow and wall-modeled LES of a flow past a static wind turbine rotor blade. Parallel performance of Nalu and its constituent solver routines residing in the Trilinos library has been assessed previously under various campaigns. However, both Nalu and Trilinos have been, and remain, in active development and resources have not been available previously to rigorously track code performance over time. With the initiation of the ECP, it is important to establish and document baseline code performance on the problems of interest. This will allow the project team to identify and target any deficiencies in performance, as well as highlight any performance bottlenecks as we exercise the code on a greater variety of platforms and at larger scales.
The current study is rather modest in scale, examining performance on problem sizes of O(100 million) elements and core counts up to 8k cores. This will be expanded as more computational resources become available to the projects.
NASA Astrophysics Data System (ADS)
Massmann, J.; Nagel, T.; Bilke, L.; Böttcher, N.; Heusermann, S.; Fischer, T.; Kumar, V.; Schäfers, A.; Shao, H.; Vogel, P.; Wang, W.; Watanabe, N.; Ziefle, G.; Kolditz, O.
2016-12-01
As part of the German site selection process for a high-level nuclear waste repository, different repository concepts in the geological candidate formations rock salt, clay stone and crystalline rock are being discussed. An open assessment of these concepts using numerical simulations requires physical models capturing the individual particularities of each rock type and associated geotechnical barrier concept to a comparable level of sophistication. In a joint work group of the Helmholtz Centre for Environmental Research (UFZ) and the German Federal Institute for Geosciences and Natural Resources (BGR), scientists of the UFZ are developing and implementing multiphysical process models while BGR scientists apply them to large scale analyses. The advances in simulation methods for waste repositories are incorporated into the open-source code OpenGeoSys. Here, recent application-driven progress in this context is highlighted. A robust implementation of visco-plasticity with temperature-dependent properties into a framework for the thermo-mechanical analysis of rock salt will be shown. The model enables the simulation of heat transport along with its consequences on the elastic response as well as on primary and secondary creep or the occurrence of dilatancy in the repository near field. Transverse isotropy, non-isothermal hydraulic processes and their coupling to mechanical stresses are taken into account for the analysis of repositories in clay stone. These processes are also considered in the near field analyses of engineered barrier systems, including the swelling/shrinkage of the bentonite material. The temperature-dependent saturation evolution around the heat-emitting waste container is described by different multiphase flow formulations. For all mentioned applications, we illustrate the workflow from model development and implementation, over verification and validation, to repository-scale application simulations using methods of high performance computing.
Parameter Sweep and Optimization of Loosely Coupled Simulations Using the DAKOTA Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elwasif, Wael R; Bernholdt, David E; Pannala, Sreekanth
2012-01-01
The increasing availability of large scale computing capabilities has accelerated the development of high-fidelity coupled simulations. Such simulations typically involve the integration of models that implement various aspects of the complex phenomena under investigation. Coupled simulations are playing an integral role in fields such as climate modeling, earth systems modeling, rocket simulations, computational chemistry, fusion research, and many other computational fields. Model coupling provides scientists with systematic ways to virtually explore the physical, mathematical, and computational aspects of the problem. Such exploration is rarely done using a single execution of a simulation, but rather by aggregating the results from many simulation runs that, together, serve to bring to light novel knowledge about the system under investigation. Furthermore, it is often the case (particularly in engineering disciplines) that the study of the underlying system takes the form of an optimization regime, where the control parameter space is explored to optimize an objective function that captures system realizability, cost, performance, or a combination thereof. Novel and flexible frameworks that facilitate the integration of the disparate models into a holistic simulation are used to perform this research, while making efficient use of the available computational resources. In this paper, we describe the integration of the DAKOTA optimization and parameter sweep toolkit with the Integrated Plasma Simulator (IPS), a component-based framework for loosely coupled simulations. The integration allows DAKOTA to exploit the internal task and resource management of the IPS to dynamically instantiate simulation instances within a single IPS instance, allowing for greater control over the trade-off between efficiency of resource utilization and time to completion.
We present a case study showing the use of the combined DAKOTA-IPS system to aid in the design of a lithium ion battery (LIB) cell, by studying a coupled system involving the electrochemistry and ion transport at the lower length scales and thermal energy transport at the device scales. The DAKOTA-IPS system provides a flexible tool for use in optimization and parameter sweep studies involving loosely coupled simulations that is suitable for use in situations where changes to the constituent components in the coupled simulation are impractical due to intellectual property or code heritage issues.
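The idea of a parameter sweep over a black-box simulation, as described in the abstract above, can be illustrated with a minimal Python sketch. It does not use the DAKOTA or IPS interfaces, which the abstract does not detail. The function `toy_simulation`, its parameters `thickness` and `rate`, and the grid values are all hypothetical stand-ins chosen only for illustration.

```python
from itertools import product


def toy_simulation(thickness, rate):
    """Hypothetical stand-in for one run of a loosely coupled simulation.

    Returns a made-up cost whose minimum lies at thickness=2.0, rate=0.5.
    """
    return (thickness - 2.0) ** 2 + (rate - 0.5) ** 2


def parameter_sweep(simulate, grid):
    """Evaluate `simulate` on every combination of the values in `grid`.

    `grid` maps a parameter name to the list of values to try.
    Returns a list of (parameters, objective value) pairs.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, simulate(**params)))
    return results


def best_point(results):
    """Pick the sweep point with the smallest objective value."""
    return min(results, key=lambda item: item[1])


# Assumed grid: three values per parameter, so nine independent runs.
grid = {"thickness": [1.0, 2.0, 3.0], "rate": [0.25, 0.5, 0.75]}
results = parameter_sweep(toy_simulation, grid)
best_params, best_cost = best_point(results)
print(best_params, best_cost)
```

Because each grid point is evaluated independently, the runs could be dispatched concurrently; deciding how many to run at once is the kind of trade-off between resource utilization and time to completion that the abstract mentions.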
A systematic identification and analysis of scientists on Twitter.
Ke, Qing; Ahn, Yong-Yeol; Sugimoto, Cassidy R
2017-01-01
Metrics derived from Twitter and other social media, often referred to as altmetrics, are increasingly used to estimate the broader social impacts of scholarship. Such efforts, however, may produce highly misleading results, as the entities that participate in conversations about science on these platforms are largely unknown. For instance, if altmetric activities are generated mainly by scientists, does it really capture broader social impacts of science? Here we present a systematic approach to identifying and analyzing scientists on Twitter. Our method can identify scientists across many disciplines, without relying on external bibliographic data, and be easily adapted to identify other stakeholder groups in science. We investigate the demographics, sharing behaviors, and interconnectivity of the identified scientists. We find that Twitter has been employed by scholars across the disciplinary spectrum, with an over-representation of social and computer and information scientists; under-representation of mathematical, physical, and life scientists; and a better representation of women compared to scholarly publishing. Analysis of the sharing of URLs reveals a distinct imprint of scholarly sites, yet only a small fraction of shared URLs are science-related. We find an assortative mixing with respect to disciplines in the networks between scientists, suggesting the maintenance of disciplinary walls in social media. Our work contributes to the literature both methodologically and conceptually: we provide new methods for disambiguating and identifying particular actors on social media and describing the behaviors of scientists, thus providing foundational information for the construction and use of indicators on the basis of social media metrics.
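Assortative mixing by discipline, as mentioned in the abstract above, is commonly quantified with Newman's assortativity coefficient for a categorical attribute. The Python sketch below computes that coefficient for an undirected network. It is an illustration of the standard measure, not necessarily the procedure used by the authors, and the node names and discipline labels are invented for the example.

```python
from collections import defaultdict


def discipline_assortativity(edges, discipline):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges:      iterable of (u, v) pairs of an undirected network
    discipline: dict mapping each node to its discipline label

    Returns r = (trace(e) - sum_i a_i^2) / (1 - sum_i a_i^2), where e[i][j]
    is the fraction of edge ends joining label i to label j and a_i is the
    fraction of edge ends attached to label i. The value is undefined
    (division by zero) when every node carries the same label.
    """
    counts = defaultdict(float)
    for u, v in edges:
        du, dv = discipline[u], discipline[v]
        # Count each undirected edge in both directions so e is symmetric.
        counts[(du, dv)] += 1
        counts[(dv, du)] += 1

    total = sum(counts.values())
    ends = defaultdict(float)  # a_i for each label i
    trace = 0.0
    for (i, j), count in counts.items():
        fraction = count / total
        ends[i] += fraction
        if i == j:
            trace += fraction

    expected = sum(a * a for a in ends.values())
    return (trace - expected) / (1 - expected)


# Hypothetical four-node network with two disciplines.
labels = {"a": "physics", "b": "physics", "c": "biology", "d": "biology"}
within = [("a", "b"), ("c", "d")]   # ties stay inside disciplines
across = [("a", "c"), ("b", "d")]   # ties only cross disciplines
r_within = discipline_assortativity(within, labels)
r_across = discipline_assortativity(across, labels)
print(r_within, r_across)
```

A positive coefficient means ties fall within disciplines more often than expected by chance, which is the pattern the abstract describes as disciplinary walls; a negative value would mean ties preferentially cross disciplines.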