The philosophy of benchmark testing a standards-based picture archiving and communications system.
Richardson, N E; Thomas, J A; Lyche, D K; Romlein, J; Norton, G S; Dolecek, Q E
1999-05-01
The Department of Defense issued its requirements for a Digital Imaging Network-Picture Archiving and Communications System (DIN-PACS) in a Request for Proposals (RFP) to industry in January 1997, with subsequent contracts being awarded in November 1997 to the Agfa Division of Bayer and IBM Global Government Industry. The Government's technical evaluation process consisted of evaluating a written technical proposal as well as conducting a benchmark test of each proposed system at the vendor's test facility. The purpose of benchmark testing was to evaluate the performance of the fully integrated system in a simulated operational environment. The benchmark test procedures and test equipment were developed through a joint effort between the Government, academic institutions, and private consultants. Herein the authors discuss the resources required and the methods used to benchmark test a standards-based PACS.
Benchmark Testing of a New 56Fe Evaluation for Criticality Safety Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leal, Luiz C; Ivanov, E.
2015-01-01
The SAMMY code was used to evaluate resonance parameters of the 56Fe cross section in the resolved resonance energy range of 0–2 MeV using transmission data and capture, elastic, inelastic, and double-differential elastic cross sections. SAMMY fits R-matrix resonance parameters using the generalized least-squares technique (Bayes’ theory). The evaluation yielded a set of resonance parameters that reproduced the experimental data very well, along with a resonance parameter covariance matrix for data uncertainty calculations. Benchmark calculations were then performed to assess the performance of the evaluation.
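The Bayes (generalized least-squares) step mentioned above can be illustrated with a small linearized update. This is a generic textbook form written as a sketch, not SAMMY's implementation. The function name `bayes_update`, its arguments, and the toy problem are hypothetical. Because an R-matrix model is nonlinear in the resonance parameters, a real fit would repeat this step iteratively.

```python
import numpy as np


def bayes_update(p, M, G, data, theory, V):
    """One linearized generalized least-squares (Bayes) update (hypothetical helper).

    p      -- prior parameter vector
    M      -- prior parameter covariance matrix
    G      -- sensitivity matrix d(theory)/d(p), shape (n_data, n_params)
    data   -- measured values
    theory -- model values evaluated at p
    V      -- data covariance matrix
    """
    V_inv = np.linalg.inv(V)
    # Posterior covariance: (M^-1 + G^T V^-1 G)^-1
    M_post = np.linalg.inv(np.linalg.inv(M) + G.T @ V_inv @ G)
    # Posterior parameters: p + M_post G^T V^-1 (data - theory)
    p_post = p + M_post @ G.T @ V_inv @ (data - theory)
    return p_post, M_post


# Hypothetical linear toy problem: data = G @ [2, 3], with a very broad prior.
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
data = G @ np.array([2.0, 3.0])
p0 = np.zeros(2)
p_post, M_post = bayes_update(p0, 1e6 * np.eye(2), G, data, G @ p0, np.eye(3))
print(p_post)  # close to [2, 3]
```

With a broad prior the update reduces to ordinary weighted least squares. A tighter prior `M` pulls the result toward `p`, which is how prior knowledge and new measurements are combined. `M_post` plays the role of the resonance parameter covariance matrix described in the abstract.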
Benchmarking expert system tools
NASA Technical Reports Server (NTRS)
Riley, Gary
1988-01-01
As part of its evaluation of new technologies, the Artificial Intelligence Section of the Mission Planning and Analysis Div. at NASA-Johnson has made timing tests of several expert system building tools. Among the production systems tested were the Automated Reasoning Tool (ART), several versions of OPS5, and CLIPS (C Language Integrated Production System), an expert system builder developed by the AI Section. Also included in the tests were a Zetalisp version of the benchmark and four versions of the benchmark written in the Knowledge Engineering Environment (KEE), an object-oriented, frame-based expert system tool. The benchmarks used in the tests are examined.
Evaluation of control strategies using an oxidation ditch benchmark.
Abusam, A; Keesman, K J; Spanjers, H; van Straten, G; Meinema, K
2002-01-01
This paper presents validation and implementation results of a benchmark developed for a specific full-scale oxidation ditch wastewater treatment plant (WWTP). A benchmark is a standard simulation procedure that can be used as a tool for evaluating the various control strategies proposed for wastewater treatment plants; it is built from a plant model and a set of performance criteria. Testing this benchmark, by comparing its predictions with real measurements of electrical energy consumption and the amount of disposed sludge at the oxidation ditch WWTP, showed that it can reasonably be used to evaluate the performance of this plant. The validated benchmark was then used to evaluate several basic and advanced control strategies. Notable results include the following: (i) the influent flow splitting ratio between the first and the fourth aerated compartments of the ditch has no significant effect on the total nitrogen (TN) concentrations in the effluent, and (ii) for the evaluation of long-term control strategies, future benchmarks need to be able to assess settler performance.
Simulation of Benchmark Cases with the Terminal Area Simulation System (TASS)
NASA Technical Reports Server (NTRS)
Ahmad, Nashat N.; Proctor, Fred H.
2011-01-01
The hydrodynamic core of the Terminal Area Simulation System (TASS) is evaluated against different benchmark cases. In the absence of closed-form solutions for the equations governing atmospheric flows, the models are usually evaluated against idealized test cases. Over the years, various authors have suggested a suite of these idealized cases that have become standards for testing and evaluating the dynamics and thermodynamics of atmospheric flow models. In this paper, simulations of three such cases are described. In addition, the TASS model is evaluated against a test case that uses an exact solution of the Navier-Stokes equations. The TASS results are compared against previously reported simulations of these benchmark cases in the literature. It is demonstrated that the TASS model is highly accurate, stable and robust.
ICSBEP Benchmarks For Nuclear Data Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Briggs, J. Blair
2005-05-24
The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was initiated in 1992 by the United States Department of Energy. The ICSBEP became an official activity of the Organization for Economic Cooperation and Development (OECD) Nuclear Energy Agency (NEA) in 1995. Representatives from the United States, United Kingdom, France, Japan, the Russian Federation, Hungary, Republic of Korea, Slovenia, Serbia and Montenegro (formerly Yugoslavia), Kazakhstan, Spain, Israel, Brazil, Poland, and the Czech Republic are now participating. South Africa, India, China, and Germany are considering participation. The purpose of the ICSBEP is to identify, evaluate, verify, and formally document a comprehensive and internationally peer-reviewed set of criticality safety benchmark data. The work of the ICSBEP is published as an OECD handbook entitled "International Handbook of Evaluated Criticality Safety Benchmark Experiments." The 2004 Edition of the Handbook contains benchmark specifications for 3331 critical or subcritical configurations that are intended for use in validation efforts and for testing basic nuclear data. New to the 2004 Edition of the Handbook is a draft criticality alarm / shielding type benchmark that should be finalized in 2005 along with two other similar benchmarks. The Handbook is being used extensively for nuclear data testing and is expected to be a valuable resource for code and data validation and improvement efforts for decades to come. Specific benchmarks that are useful for testing structural materials such as iron, chromium, nickel, and manganese; beryllium; lead; thorium; and 238U are highlighted.
OWL2 benchmarking for the evaluation of knowledge based systems.
Khan, Sher Afgun; Qadir, Muhammad Abdul; Abbas, Muhammad Azeem; Afzal, Muhammad Tanvir
2017-01-01
OWL2 semantics are becoming increasingly popular for real-world domain applications such as gene engineering and health MIS. The present work identifies a research gap: negligible attention has been paid to the performance evaluation of Knowledge Base Systems (KBS) using OWL2 semantics. To fill this gap, an OWL2 benchmark for the evaluation of KBS is proposed. The proposed benchmark addresses the foundational blocks of an ontology benchmark, i.e. data schema, workload and performance metrics. The proposed benchmark is tested on memory-based, file-based, relational database and graph-based KBS for performance and scalability measures. The results show that the proposed benchmark is able to evaluate the behaviour of different state-of-the-art KBS on OWL2 semantics. On the basis of the results, end users (i.e. domain experts) would be able to select a KBS appropriate for their domain.
Bess, John D.; Fujimoto, Nozomu
2014-10-09
Benchmark models were developed to evaluate six cold-critical and two warm-critical, zero-power measurements of the HTTR. Additional measurements of a fully-loaded subcritical configuration, core excess reactivity, shutdown margins, six isothermal temperature coefficients, and axial reaction-rate distributions were also evaluated as acceptable benchmark experiments. Insufficient information is publicly available to develop finely detailed models of the HTTR, as much of the design information is still proprietary. However, the uncertainties in the benchmark models are judged to be of sufficient magnitude to encompass any biases and bias uncertainties incurred through the simplification process used to develop the benchmark models. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the impurity content of the various graphite blocks that comprise the HTTR. Monte Carlo calculations of keff are between approximately 0.9 % and 2.7 % greater than the benchmark values. Reevaluation of the HTTR models as additional information becomes available could improve the quality of this benchmark and possibly reduce the computational biases. High-quality characterization of graphite impurities would significantly improve the quality of the HTTR benchmark assessment. Simulations of the other reactor physics measurements are in good agreement with the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.
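The comparison of calculated and benchmark keff values is usually expressed as a relative bias. The sketch below assumes that the benchmark and calculational uncertainties are independent 1-sigma values. The function names and the example numbers are hypothetical and are not HTTR results.

```python
import math


def bias_percent(k_calc: float, k_bench: float) -> float:
    """Relative bias of a calculated keff against the benchmark value, in percent."""
    return 100.0 * (k_calc - k_bench) / k_bench


def within_uncertainty(k_calc, k_bench, sigma_bench, sigma_calc, coverage=1.0):
    """True if |C - E| is within `coverage` times the combined 1-sigma uncertainty.

    Assumes the two uncertainties are independent (added in quadrature).
    """
    combined = math.sqrt(sigma_bench ** 2 + sigma_calc ** 2)
    return abs(k_calc - k_bench) <= coverage * combined


# Illustrative values only: a calculation 2.7 % above a benchmark keff of 1.0000.
print(round(bias_percent(1.0270, 1.0000), 3))  # 2.7
print(within_uncertainty(1.0270, 1.0000, sigma_bench=0.0050, sigma_calc=0.0005))  # False
```

Whether a bias of a given size is significant depends on the benchmark uncertainty, which for the HTTR is dominated by the graphite impurity content.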
Contributions to Integral Nuclear Data in ICSBEP and IRPhEP since ND 2013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Briggs, J. Blair; Gulliford, Jim
2016-09-01
The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) was last discussed directly with the international nuclear data community at ND2013. Since ND2013, the amount of integral benchmark data available for nuclear data testing has continued to increase. The status of the international benchmark efforts and the latest contributions to integral nuclear data for testing are discussed. Selected benchmark configurations that have been added to the ICSBEP and IRPhEP Handbooks since ND2013 are highlighted. The 2015 edition of the ICSBEP Handbook now contains 567 evaluations with benchmark specifications for 4,874 critical, near-critical, or subcritical configurations, 31 criticality alarm placement/shielding configurations with multiple dose points apiece, and 207 configurations that have been categorized as fundamental physics measurements relevant to criticality safety applications. The 2015 edition of the IRPhEP Handbook contains data from 143 different experimental series that were performed at 50 different nuclear facilities. Currently 139 of the 143 evaluations are published as approved benchmarks, with the remaining four evaluations published in draft format only. Measurements found in the IRPhEP Handbook include criticality, buckling and extrapolation length, spectral characteristics, reactivity effects, reactivity coefficients, kinetics, reaction-rate distributions, power distributions, isotopic compositions, and/or other miscellaneous types of measurements for various types of reactor systems. Annual technical review meetings for both projects were held in April 2016; additional approved benchmark evaluations will be included in the 2016 editions of these handbooks.
Alfa, Michelle J; Fatima, Iram; Olson, Nancy
2013-03-01
The study objective was to verify that the adenosine triphosphate (ATP) benchmark of <200 relative light units (RLUs) was achievable in a busy endoscopy clinic that followed the manufacturer's manual cleaning instructions. All channels from patient-used colonoscopes (20) and duodenoscopes (20) in a tertiary care hospital endoscopy clinic were sampled after manual cleaning and tested for residual ATP. The ATP test benchmark for adequate manual cleaning was set at <200 RLUs. The benchmark for protein was <6.4 μg/cm², and the benchmark for bioburden was <4 log10 colony-forming units/cm². Our data demonstrated that 96% (115/120) of channels from the 20 colonoscopes and 20 duodenoscopes evaluated met the ATP benchmark of <200 RLUs. The 5 channels that exceeded 200 RLUs were all elevator guide-wire channels. All 120 channels of the manually cleaned endoscopes tested had protein and bioburden levels that were compliant with accepted manual-cleaning benchmarks for suction-biopsy, air-water, and auxiliary water channels. Our data confirmed that, by following the endoscope manufacturer's manual cleaning recommendations, 96% of channels in gastrointestinal endoscopes would have <200 RLUs for the ATP test kit evaluated and would meet the accepted clean benchmarks for protein and bioburden. Copyright © 2013 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Mosby, Inc. All rights reserved.
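The study's pass/fail logic reduces to a few threshold checks. This sketch hard-codes the three benchmarks stated in the abstract. The function names and the individual readings are hypothetical. Only the 115-of-120 count comes from the study: 115/120 ≈ 95.8 %, reported as 96 %.

```python
ATP_LIMIT_RLU = 200           # ATP benchmark: < 200 RLUs
PROTEIN_LIMIT_UG_CM2 = 6.4    # protein benchmark: < 6.4 ug/cm^2
BIOBURDEN_LIMIT_LOG10 = 4.0   # bioburden benchmark: < 4 log10 CFU/cm^2


def channel_meets_benchmarks(atp_rlu, protein_ug_cm2, bioburden_log10_cfu_cm2):
    """True if one channel sample meets all three manual-cleaning benchmarks (hypothetical helper)."""
    return (atp_rlu < ATP_LIMIT_RLU
            and protein_ug_cm2 < PROTEIN_LIMIT_UG_CM2
            and bioburden_log10_cfu_cm2 < BIOBURDEN_LIMIT_LOG10)


def atp_pass_rate(rlu_readings):
    """Number and percentage of ATP readings below the RLU benchmark."""
    passed = sum(1 for rlu in rlu_readings if rlu < ATP_LIMIT_RLU)
    return passed, 100.0 * passed / len(rlu_readings)


# Hypothetical readings; only the counts (115 pass, 5 fail) match the study.
readings = [150] * 115 + [250] * 5
passed, pct = atp_pass_rate(readings)
print(passed, round(pct, 1))  # 115 95.8
```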
Using Benchmarking To Strengthen the Assessment of Persistence.
McLachlan, Michael S; Zou, Hongyan; Gouin, Todd
2017-01-03
Chemical persistence is a key property for assessing chemical risk and chemical hazard. Current methods for evaluating persistence are based on laboratory tests. The relationship between laboratory-based estimates and persistence in the environment is often unclear, and in such cases the current methods for evaluating persistence can be questioned. Chemical benchmarking opens new possibilities to measure persistence in the field. In this paper we explore how the benchmarking approach can be applied in both the laboratory and the field to deepen our understanding of chemical persistence in the environment and create a firmer scientific basis for laboratory-to-field extrapolation of persistence test results.
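One way to use a benchmark chemical is to track the ratio of the target chemical to a co-occurring benchmark chemical whose behaviour is known. Suppose both chemicals undergo the same transport and dilution and both are lost by first-order processes. Then the logarithm of their concentration ratio falls linearly with time, at a rate equal to the difference between their degradation rate constants. The sketch below assumes exactly those conditions. It illustrates the general idea and is not the authors' procedure. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def relative_rate_constant(times, c_target, c_bench):
    """Estimate k_target - k_bench from the decline of ln(C_target / C_bench).

    Assumption: first-order loss and identical transport/dilution for both
    chemicals, so the shared dilution term cancels in the ratio.
    """
    log_ratio = np.log(np.asarray(c_target, dtype=float) / np.asarray(c_bench, dtype=float))
    slope, _intercept = np.polyfit(times, log_ratio, 1)
    return -slope


# Synthetic example: shared dilution exp(-0.3 t), k_bench = 0.05, k_target = 0.20 per day.
t = np.arange(0.0, 11.0)
dilution = np.exp(-0.3 * t)
c_bench = 100.0 * np.exp(-0.05 * t) * dilution
c_target = 100.0 * np.exp(-0.20 * t) * dilution
dk = relative_rate_constant(t, c_target, c_bench)
print(round(dk, 3))          # 0.15
print(round(0.05 + dk, 3))   # 0.2, i.e. k_target recovered from a known k_bench
```

Taking the ratio cancels dilution and transport, which are hard to quantify in the field. Field data can therefore yield a persistence estimate relative to the benchmark chemical.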
Quality Assurance Testing of Version 1.3 of U.S. EPA Benchmark Dose Software (Presentation)
EPA benchmark dose software (BMDS) is used to evaluate chemical dose-response data in support of Agency risk assessments, and must therefore be dependable. Quality assurance testing methods developed for BMDS were designed to assess model dependability with respect to curve-fitt...
Analysis of a benchmark suite to evaluate mixed numeric and symbolic processing
NASA Technical Reports Server (NTRS)
Ragharan, Bharathi; Galant, David
1992-01-01
The suite of programs that formed the benchmark for a proposed advanced computer is described and analyzed. The features of the processor and its operating system that are tested by the benchmark are discussed. The computer codes and the supporting data for the analysis are given as appendices.
A benchmarking method to measure dietary absorption efficiency of chemicals by fish.
Xiao, Ruiyang; Adolfsson-Erici, Margaretha; Åkerman, Gun; McLachlan, Michael S; MacLeod, Matthew
2013-12-01
Understanding the dietary absorption efficiency of chemicals in the gastrointestinal tract of fish is important from both a scientific and a regulatory point of view. However, reported fish absorption efficiencies for well-studied chemicals are highly variable. In the present study, the authors developed and exploited an internal chemical benchmarking method that has the potential to reduce uncertainty and variability and, thus, to improve the precision of measurements of fish absorption efficiency. The authors applied the benchmarking method to measure the gross absorption efficiency for 15 chemicals with a wide range of physicochemical properties and structures. They selected 2,2',5,6'-tetrachlorobiphenyl (PCB53) and decabromodiphenyl ethane as absorbable and nonabsorbable benchmarks, respectively. Quantities of chemicals determined in fish were benchmarked to the fraction of PCB53 recovered in fish, and quantities of chemicals determined in feces were benchmarked to the fraction of decabromodiphenyl ethane recovered in feces. The performance of the benchmarking procedure was evaluated based on the recovery of the test chemicals and precision of absorption efficiency from repeated tests. Benchmarking did not improve the precision of the measurements; after benchmarking, however, the median recovery for 15 chemicals was 106%, and variability of recoveries was reduced compared with before benchmarking, suggesting that benchmarking could account for incomplete extraction of chemical in fish and incomplete collection of feces from different tests. © 2013 SETAC.
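The benchmarking correction described above can be sketched as two divisions. This is a simplified illustration under stated assumptions. The function names are hypothetical. Absorption efficiency is taken here as the benchmark-corrected amount in fish divided by the dose; the paper's exact definition may differ. All numbers are invented for the example.

```python
def benchmark_correct(fish_amount, feces_amount, pcb53_frac_in_fish, dbdpe_frac_in_feces):
    """Scale measured amounts by the recovered fraction of each benchmark chemical.

    PCB53 (absorbable) corrects the fish compartment; decabromodiphenyl ethane
    (nonabsorbable) corrects the feces compartment.
    """
    return fish_amount / pcb53_frac_in_fish, feces_amount / dbdpe_frac_in_feces


def summarize(dose, fish_amount, feces_amount, pcb53_frac_in_fish, dbdpe_frac_in_feces):
    """Return (recovery, absorption efficiency) after benchmarking (hypothetical helper)."""
    fish_corr, feces_corr = benchmark_correct(
        fish_amount, feces_amount, pcb53_frac_in_fish, dbdpe_frac_in_feces)
    recovery = (fish_corr + feces_corr) / dose   # mass balance after benchmarking
    absorption_eff = fish_corr / dose            # assumed definition, see lead-in
    return recovery, absorption_eff


# Hypothetical test: 100 ng fed, 40 ng in fish (80 % PCB53 recovery),
# 45 ng in feces (90 % decabromodiphenyl ethane recovery).
recovery, ae = summarize(100.0, 40.0, 45.0, 0.80, 0.90)
print(round(recovery, 2), round(ae, 2))  # 1.0 0.5
```

The correction assumes that losses from incomplete extraction or incomplete feces collection affect the test chemical and its benchmark by the same fraction. The abstract gives that as the reason benchmarking brought recoveries closer to 100 %.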
Signorelli, Heather; Straseski, Joely A; Genzen, Jonathan R; Walker, Brandon S; Jackson, Brian R; Schmidt, Robert L
2015-01-01
Appropriate test utilization is usually evaluated by adherence to published guidelines. In many cases, medical guidelines are not available. Benchmarking has been proposed as a method to identify practice variations that may represent inappropriate testing. This study investigated the use of benchmarking to identify sites with inappropriate utilization of testing for a particular analyte. We used a Web-based survey to compare 2 measures of vitamin D utilization: overall testing intensity (ratio of total vitamin D orders to blood-count orders) and relative testing intensity (ratio of 1,25(OH)2D to 25(OH)D test orders). A total of 81 facilities contributed data. The average overall testing intensity index was 0.165, or approximately 1 vitamin D test for every 6 blood-count tests. The average relative testing intensity index was 0.055, or one 1,25(OH)2D test for every 18 of the 25(OH)D tests. Both indexes varied considerably. Benchmarking can be used as a screening tool to identify outliers that may be associated with inappropriate test utilization. Copyright© by the American Society for Clinical Pathology (ASCP).
Analysis of 100Mb/s Ethernet for the Whitney Commodity Computing Testbed
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Pedretti, Kevin T.; Kutler, Paul (Technical Monitor)
1997-01-01
We evaluated the performance of a Fast Ethernet network configured with a single large switch, with a single hub, and as a 4x4 2D torus in a testbed cluster of "commodity" Pentium Pro PCs. We also evaluated a mixed network composed of Ethernet hubs and switches. An MPI collective communication benchmark and the NAS Parallel Benchmarks version 2.2 (NPB2) show that the torus network performs best for all sizes we were able to test (up to 16 nodes). For larger networks the Ethernet switch outperforms the hub, though its performance is far below peak. The hub/switch combination tests indicate that the NAS Parallel Benchmarks are relatively insensitive to hub densities of less than 7 nodes per hub.
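The torus topology keeps paths short in a small cluster. In a 4x4 2D torus every node has four neighbours, with links wrapping around at the edges, and no two of the 16 nodes are more than four hops apart. The sketch below computes neighbours and hop counts. It assumes row-major node numbering and minimal-path routing. Both are illustrative choices, because the abstract does not describe the testbed's addressing or routing.

```python
def torus_neighbors(node, rows=4, cols=4):
    """Neighbours of `node` in a rows x cols 2D torus (assumed row-major numbering)."""
    r, c = divmod(node, cols)
    return sorted({
        ((r - 1) % rows) * cols + c,   # up (wraps around)
        ((r + 1) % rows) * cols + c,   # down
        r * cols + (c - 1) % cols,     # left
        r * cols + (c + 1) % cols,     # right
    })


def torus_hops(a, b, rows=4, cols=4):
    """Minimal hop count between nodes a and b, using wrap-around links."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    dr, dc = abs(ra - rb), abs(ca - cb)
    return min(dr, rows - dr) + min(dc, cols - dc)


n = 16
print(torus_neighbors(0))                                         # [1, 3, 4, 12]
print(max(torus_hops(a, b) for a in range(n) for b in range(n)))  # 4
```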
INTEGRAL BENCHMARK DATA FOR NUCLEAR DATA TESTING THROUGH THE ICSBEP AND THE NEWLY ORGANIZED IRPHEP
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. Blair Briggs; Lori Scott; Yolanda Rugama
The status of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) was last reported to the nuclear data community at the International Conference on Nuclear Data for Science and Technology, ND-2004, in Santa Fe, New Mexico. Since that time the number and type of integral benchmarks have increased significantly. Included in the ICSBEP Handbook are criticality-alarm / shielding and fundamental physics benchmarks in addition to the traditional critical / subcritical benchmark data. Since ND 2004, a reactor physics counterpart to the ICSBEP, the International Reactor Physics Experiment Evaluation Project (IRPhEP), was initiated. The IRPhEP is patterned after the ICSBEP, but focuses on other integral measurements, such as buckling, spectral characteristics, reactivity effects, reactivity coefficients, kinetics measurements, reaction-rate and power distributions, nuclide compositions, and other miscellaneous-type measurements in addition to the critical configuration. The status of these two projects is discussed and selected benchmarks are highlighted in this paper.
RESULTS OF QA/QC TESTING OF EPA BENCHMARK DOSE SOFTWARE VERSION 1.2
EPA is developing benchmark dose software (BMDS) to support cancer and non-cancer dose-response assessments. Following the recent public review of BMDS version 1.1b, EPA developed a Hill model for evaluating continuous data, and improved the user interface and Multistage, Polyno...
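For quantal data, a benchmark dose (BMD) is the dose at which a fitted model reaches a chosen benchmark response (BMR). As a minimal sketch, take the first-degree Multistage model, also called the quantal-linear model: P(d) = g + (1 - g)(1 - exp(-b d)). Its extra risk is 1 - exp(-b d), so the BMD has the closed form -ln(1 - BMR)/b. The parameter values below are hypothetical. BMDS itself also fits the model to data and computes a lower confidence limit (BMDL), which this sketch does not attempt.

```python
import math


def quantal_linear_prob(dose, g, b):
    """P(d) = g + (1 - g) * (1 - exp(-b d)): first-degree Multistage model."""
    return g + (1.0 - g) * (1.0 - math.exp(-b * dose))


def extra_risk(dose, g, b):
    """Extra risk relative to background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = quantal_linear_prob(0.0, g, b)
    return (quantal_linear_prob(dose, g, b) - p0) / (1.0 - p0)


def bmd_quantal_linear(b, bmr=0.10):
    """Dose at which extra risk equals bmr for the quantal-linear model."""
    return -math.log(1.0 - bmr) / b


bmd = bmd_quantal_linear(b=0.01)                  # hypothetical slope, per unit dose
print(round(bmd, 2))                               # 10.54
print(round(extra_risk(bmd, g=0.05, b=0.01), 6))  # 0.1
```

The background rate g cancels out of the extra risk, so the BMD depends only on the slope b and the chosen BMR.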
StirMark Benchmark: audio watermarking attacks based on lossy compression
NASA Astrophysics Data System (ADS)
Steinebach, Martin; Lang, Andreas; Dittmann, Jana
2002-04-01
StirMark Benchmark is a well-known evaluation tool for watermarking robustness, and additional attacks are added to it continuously. To enable application-based evaluation, in this paper we address attacks against audio watermarks based on lossy audio compression algorithms, to be included in the test environment. We discuss the effect of different lossy compression algorithms, such as MPEG-2 Audio Layer 3, Ogg or VQF, on a selection of audio test data. Our focus is on changes to basic characteristics of the audio data, such as the spectrum or average power, and on the removal of embedded watermarks. Furthermore, we compare the results of different watermarking algorithms and show that lossy compression is still a challenge for most of them. There are two strategies for adding evaluation of robustness against lossy compression to StirMark Benchmark: (a) use of existing free compression algorithms, or (b) implementation of a generic lossy compression simulation. We discuss how such a simulation can be implemented based on the results of our tests. The simulation approach is less complex, as no real psychoacoustic model has to be applied. Our model can be used for audio watermarking evaluation in numerous application fields. As an example, we describe its importance for e-commerce applications with watermarking security.
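The second strategy, a generic lossy-compression simulation without a psychoacoustic model, can be approximated very crudely by discarding weak spectral components. The sketch below is a hypothetical illustration of that idea, not the authors' model. The -40 dB floor, the function names, and the test tones are all assumptions. A weak component, standing in for an embedded mark, disappears while the average power of the host signal is essentially unchanged.

```python
import numpy as np


def simulate_lossy(signal, floor_db=-40.0):
    """Crude lossy-compression stand-in (hypothetical, no psychoacoustic model):
    drop spectral components more than `floor_db` below the strongest one."""
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    threshold = magnitude.max() * 10.0 ** (floor_db / 20.0)
    spectrum[magnitude < threshold] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def average_power(signal):
    """Mean squared amplitude of the signal."""
    return float(np.mean(np.square(signal)))


fs = 8000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1000 * t)
marked = tone + 0.001 * np.sin(2 * np.pi * 3000 * t)   # weak component 60 dB down
attacked = simulate_lossy(marked)
print(np.allclose(attacked, tone, atol=1e-9))   # True: the weak component was removed
print(round(average_power(attacked), 3))        # 0.5
```

Both test tones fall exactly on FFT bins (1024 samples at 8 kHz), which keeps the example free of spectral leakage. Real audio would need windowing and frame-by-frame processing.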
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems that dominated the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks (TBB) library, based on the measured memory and CPU overheads of the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures, such as Cori Phase 1&2, Theta and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Benchmarking and Hardware-In-The-Loop Operation of a ...
Engine performance evaluation in support of the light-duty (LD) midterm evaluation (MTE). EPA used elements of its ALPHA model to apply hardware-in-the-loop (HIL) controls to the SKYACTIV engine test setup, to better understand how the engine would operate in a chassis test when combined with future leading-edge technologies: an advanced high-efficiency transmission, reduced mass, and reduced road load. The aim was to predict future vehicle performance with an Atkinson-cycle engine. As part of its technology assessment for the upcoming midterm evaluation of the 2017-2025 LD vehicle greenhouse gas (GHG) emissions regulation, EPA has been benchmarking engines and transmissions to generate inputs for use in its ALPHA model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Risner, J.M.; Wiarda, D.; Miller, T.M.
2011-07-01
The U.S. Nuclear Regulatory Commission's Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the evaluated nuclear data file (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI.3 data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII.0. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96 libraries. Verification and validation of the new libraries were accomplished using diagnostic checks in AMPX, 'unit tests' for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in RPV fluence calculations and meet the calculational uncertainty criterion in Regulatory Guide 1.190. (authors)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Risner, Joel M; Wiarda, Dorothea; Miller, Thomas Martin
2011-01-01
The U.S. Nuclear Regulatory Commission's Regulatory Guide 1.190 states that calculational methods used to estimate reactor pressure vessel (RPV) fluence should use the latest version of the Evaluated Nuclear Data File (ENDF). The VITAMIN-B6 fine-group library and BUGLE-96 broad-group library, which are widely used for RPV fluence calculations, were generated using ENDF/B-VI data, which was the most current data when Regulatory Guide 1.190 was issued. We have developed new fine-group (VITAMIN-B7) and broad-group (BUGLE-B7) libraries based on ENDF/B-VII. These new libraries, which were processed using the AMPX code system, maintain the same group structures as the VITAMIN-B6 and BUGLE-96 libraries. Verification and validation of the new libraries were accomplished using diagnostic checks in AMPX, unit tests for each element in VITAMIN-B7, and a diverse set of benchmark experiments including critical evaluations for fast and thermal systems, a set of experimental benchmarks that are used for SCALE regression tests, and three RPV fluence benchmarks. The benchmark evaluation results demonstrate that VITAMIN-B7 and BUGLE-B7 are appropriate for use in LWR shielding applications and meet the calculational uncertainty criterion in Regulatory Guide 1.190.
Benchmarking short sequence mapping tools
2013-01-01
Background The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time-consuming and demands fast and accurate alignment tools. However, currently proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked when comparing the performance of a newly developed tool to the state of the art. Therefore, there is a need for an objective evaluation method that covers all these aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison. Results We applied our benchmarking tests to 9 well-known mapping tools, namely Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST), using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests, while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be applied to others. Conclusion The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, no tool outperforms all of the others in all the tests. Therefore, end users should clearly specify their needs in order to choose the tool that provides the best results. PMID:23758764
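With synthetic reads the true origin of every read is known, which makes an objective comparison possible. The sketch below shows two metrics such a suite might report. It scores a placement as correct when it falls within a small position tolerance of the true origin. The function names, the 5-base tolerance and the toy reads are hypothetical and are not the paper's exact definitions.

```python
def mapping_metrics(truth, mapped, tolerance=5):
    """Precision and recall of read placements against simulated ground truth.

    truth  -- {read_id: (chrom, pos)} for every simulated read
    mapped -- {read_id: (chrom, pos)} for reads the tool reported as aligned
    A placement counts as correct if it is on the true chromosome and within
    `tolerance` bases of the true position (assumed scoring rule).
    """
    correct = sum(
        1 for rid, (chrom, pos) in mapped.items()
        if rid in truth and truth[rid][0] == chrom and abs(truth[rid][1] - pos) <= tolerance
    )
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


def throughput(n_reads, seconds):
    """Reads mapped per second of wall-clock time."""
    return n_reads / seconds


truth = {"r1": ("chr1", 100), "r2": ("chr1", 200), "r3": ("chr2", 50), "r4": ("chr2", 75)}
mapped = {"r1": ("chr1", 102), "r2": ("chr1", 500), "r3": ("chr2", 50)}  # r4 left unmapped
print(mapping_metrics(truth, mapped))  # (0.6666666666666666, 0.5)
```

Reporting precision and recall separately distinguishes a tool that maps few reads accurately from one that maps many reads loosely. This is one of the accuracy-versus-speed trade-offs the abstract describes.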
Benchmarking for maximum value.
Baldwin, Ed
2009-03-01
Speaking at the most recent Healthcare Estates conference, Ed Baldwin, of international built asset consultancy EC Harris LLP, examined the role of benchmarking and market-testing, two of the key methods used to evaluate the quality and cost-effectiveness of hard and soft FM services provided under PFI healthcare schemes and to ensure that these services offer maximum value for money.
Karim, Rashed; Bhagirath, Pranav; Claus, Piet; James Housden, R; Chen, Zhong; Karimaghaloo, Zahra; Sohn, Hyon-Mok; Lara Rodríguez, Laura; Vera, Sergio; Albà, Xènia; Hennemuth, Anja; Peitgen, Heinz-Otto; Arbel, Tal; Gonzàlez Ballester, Miguel A; Frangi, Alejandro F; Götte, Marco; Razavi, Reza; Schaeffter, Tobias; Rhode, Kawal
2016-05-01
Studies have demonstrated the feasibility of late gadolinium enhancement (LGE) cardiovascular magnetic resonance (CMR) imaging for guiding the management of patients with sequelae of myocardial infarction, such as ventricular tachycardia and heart failure. Clinical implementation of these developments requires reproducible and reliable segmentation of the infarcted regions. It is challenging to compare new algorithms for infarct segmentation in the left ventricle (LV) with existing algorithms, and benchmarking datasets with evaluation strategies are much needed to facilitate comparison. This manuscript presents a benchmarking evaluation framework for future algorithms that segment infarct from LGE CMR of the LV. The image database consists of 30 LGE CMR images of both humans and pigs that were acquired from two separate imaging centres. A consensus ground truth was obtained for all data using maximum likelihood estimation. Six widely used fixed-thresholding methods and five recently developed algorithms are tested on the benchmarking framework. Results demonstrate that the algorithms have better overlap with the consensus ground truth than most of the n-SD fixed-thresholding methods, with the exception of the Full-Width-at-Half-Maximum (FWHM) fixed-thresholding method. Some of the pitfalls of fixed-thresholding methods are demonstrated in this work. The benchmarking evaluation framework, which is a contribution of this work, can be used to test and benchmark future algorithms that detect and quantify infarct in LGE CMR images of the LV. The datasets, ground truth and evaluation code have been made publicly available through the website: https://www.cardiacatlas.org/web/guest/challenges. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
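The two kinds of fixed-thresholding baseline and an overlap score can be sketched in a few lines. The n-SD method labels myocardial pixels brighter than the mean of remote (healthy) myocardium plus n standard deviations. The FWHM method labels pixels at or above half the maximum intensity. Agreement is scored here with the Dice coefficient. These are common forms, and the specifics are our assumptions. FWHM usually takes its maximum from a user-selected enhanced region rather than the whole myocardium. The framework's published evaluation code may use different overlap metrics.

```python
import numpy as np


def nsd_threshold(image, myo_mask, remote_mask, n=2.0):
    """Infarct = myocardium brighter than mean + n*SD of remote myocardium."""
    remote = image[remote_mask]
    return myo_mask & (image > remote.mean() + n * remote.std())


def fwhm_threshold(image, myo_mask):
    """Infarct = myocardium at or above half the maximum myocardial intensity.

    Simplification (assumption): the maximum is taken over the whole myocardial mask.
    """
    return myo_mask & (image >= 0.5 * image[myo_mask].max())


def dice(a, b):
    """Dice overlap 2|A and B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0


# Hypothetical 3x3 "image" in which every pixel is myocardium.
img = np.array([[8.0, 12.0, 10.0],
                [10.0, 40.0, 100.0],
                [10.0, 10.0, 90.0]])
myo = np.ones_like(img, dtype=bool)
remote = img <= 12.0
seg_nsd = nsd_threshold(img, myo, remote)   # threshold about 12.3 -> 3 pixels
seg_fwhm = fwhm_threshold(img, myo)         # threshold 50 -> 2 pixels
print(seg_nsd.sum(), seg_fwhm.sum(), dice(seg_nsd, seg_fwhm))  # 3 2 0.8
```

The toy case shows why fixed thresholds can disagree. The moderately enhanced pixel (40) passes the n-SD rule but not the FWHM rule.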
Benchmarking high performance computing architectures with CMS’ skeleton framework
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-11-23
In 2012 CMS evaluated which underlying concurrency technology would be best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems that dominated the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks (TBB) library, based on the measured memory and CPU overheads of the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures, such as Cori Phase 1&2, Theta and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Benchmarking high performance computing architectures with CMS’ skeleton framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
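The core measurement behind such a comparison is the per-task cost a concurrency layer adds on top of the work itself. The sketch below estimates that cost by running trivial tasks through Python's `concurrent.futures` thread pool and comparing the result with plain calls. It is only an analogue. The CMS suite is C++ and compared libraries such as TBB. Python's global interpreter lock also makes thread pools unrepresentative of CPU scaling. `empty_task` is a hypothetical stand-in for a framework module.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def empty_task():
    """Stand-in for a trivially small framework module (hypothetical)."""
    return None

def serial_cost(n):
    """Average wall time per task when tasks are simply called in a loop."""
    t0 = time.perf_counter()
    for _ in range(n):
        empty_task()
    return (time.perf_counter() - t0) / n

def pooled_cost(n, workers=4):
    """Average wall time per task when each task is submitted to a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        t0 = time.perf_counter()
        futures = [pool.submit(empty_task) for _ in range(n)]
        for f in futures:
            f.result()
        elapsed = time.perf_counter() - t0
    return elapsed / n

n = 20_000
overhead = pooled_cost(n) - serial_cost(n)
print(f"approximate scheduling overhead per task: {overhead * 1e6:.2f} us")
```

Because the tasks do no work, the difference isolates the scheduling cost. A memory comparison could be added along the same lines with `tracemalloc`.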
NASA Technical Reports Server (NTRS)
Rivera, Jose A., Jr.; Dansberry, Bryan E.; Farmer, Moses G.; Eckstrom, Clinton V.; Seidel, David A.; Bennett, Robert M.
1991-01-01
The Structural Dynamics Division at NASA Langley has started a wind tunnel activity referred to as the Benchmark Models Program. The objective is to acquire test data that will be useful for developing and evaluating aeroelastic-type Computational Fluid Dynamics codes currently in use or under development. This paper describes the progress achieved in testing the first model in the Benchmark Models Program. Experimental flutter boundaries are presented for a rigid semispan model (NACA 0012 airfoil section) mounted on a flexible mount system. Steady and unsteady pressure measurements taken at the flutter condition are also presented. The pressure data were acquired over the entire model chord at the 60 percent span station.
Radiation Detection Computational Benchmark Scenarios
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shaver, Mark W.; Casella, Andrew M.; Wittman, Richard S.
2013-09-24
Modeling forms an important component of radiation detection development, allowing for testing of new detector designs, evaluation of existing equipment against a wide variety of potential threat sources, and assessment of the operational performance of radiation detection systems. This can, however, result in large and complex scenarios that are time-consuming to model. A variety of approaches to radiation transport modeling exist, with complementary strengths and weaknesses for different problems. This variety of approaches, and the development of promising new tools (such as ORNL’s ADVANTG) that combine the benefits of multiple approaches, illustrate the need for a means of evaluating or comparing different techniques for radiation detection problems. This report presents a set of 9 benchmark problems for comparing different types of radiation transport calculations, identifying appropriate tools for classes of problems, and testing and guiding the development of new methods. The benchmarks were drawn primarily from existing or previous calculations. Scenarios were preferred if they include experimental data or otherwise have results with a high level of confidence, are non-sensitive, and represent problem sets of interest to NA-22. From a technical perspective, the benchmarks were chosen to span a range of difficulty, to include gamma transport, neutron transport, or both, and to represent different important physical processes and a range of sensitivity to angular or energy fidelity. Following benchmark identification, existing information about geometry, measurements, and previous calculations was assembled. Monte Carlo results (MCNP decks) were reviewed or created and re-run in order to obtain accurate computational times and to verify agreement with experimental data, when present. Benchmark information was then conveyed to ORNL in order to guide testing and development of hybrid calculations.
The results of those ADVANTG calculations were then sent to PNNL for compilation. This report describes the details of the selected benchmarks and results from various transport codes.
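The step of checking Monte Carlo results against a trusted reference can be illustrated with the simplest possible transport problem. This sketch assumes a normally incident beam on a homogeneous slab and tallies only uncollided transmission. It compares that tally with the analytic Beer-Lambert value exp(-mu*x). The attenuation coefficient and thickness are hypothetical, and the problem is not one of the report's 9 benchmarks.

```python
import math
import random

def mc_uncollided_transmission(mu, thickness, histories, seed=1):
    """Estimate the uncollided fraction of a normally incident beam through a slab.

    Each history samples a distance to first collision from an exponential
    distribution with rate mu (the total attenuation coefficient); the particle
    counts as transmitted if that distance exceeds the slab thickness.
    Returns (estimate, one-sigma statistical uncertainty).
    """
    rng = random.Random(seed)
    passed = sum(1 for _ in range(histories) if rng.expovariate(mu) > thickness)
    p = passed / histories
    return p, math.sqrt(p * (1.0 - p) / histories)

mu, x = 0.5, 2.0                      # hypothetical: 0.5 per cm, 2 cm slab
estimate, sigma = mc_uncollided_transmission(mu, x, 100_000)
analytic = math.exp(-mu * x)          # Beer-Lambert reference value
print(f"MC {estimate:.4f} +/- {sigma:.4f}, analytic {analytic:.4f}")
print("agrees within 3 sigma:", abs(estimate - analytic) <= 3 * sigma)
```

Real benchmark comparisons add scattering, energy dependence, and detector response. The comparison logic stays the same: report the estimate with its statistical uncertainty and test it against the reference.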
Space network scheduling benchmark: A proof-of-concept process for technology transfer
NASA Technical Reports Server (NTRS)
Moe, Karen; Happell, Nadine; Hayden, B. J.; Barclay, Cathy
1993-01-01
This paper describes a detailed proof-of-concept activity to evaluate flexible scheduling technology as implemented in the Request Oriented Scheduling Engine (ROSE) and applied to Space Network (SN) scheduling. The criteria developed for an operational evaluation of a reusable scheduling system are addressed, including a methodology to prove that the proposed system performs at least as well as the current system in function and performance. The improvement offered by the new technology must be demonstrated and weighed against the cost of making changes. Finally, there is a need to show significant improvement in SN operational procedures. Successful completion of a proof-of-concept would eventually lead to an operational concept and implementation transition plan, which is outside the scope of this paper. However, a high-fidelity benchmark using actual SN scheduling requests has been designed to test the ROSE scheduling tool. The benchmark evaluation methodology, scheduling data, and preliminary results are described.
Aircraft Engine Gas Path Diagnostic Methods: Public Benchmarking Results
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Borguet, Sebastien; Leonard, Olivier; Zhang, Xiaodong (Frank)
2013-01-01
Recent technology reviews have identified the need for objective assessments of aircraft engine health management (EHM) technologies. To help address this issue, a gas path diagnostic benchmark problem has been created and made publicly available. This software tool, referred to as the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES), has been constructed based on feedback provided by the aircraft EHM community. It provides a standard benchmark problem enabling users to develop, evaluate and compare diagnostic methods. This paper will present an overview of ProDiMES along with a description of four gas path diagnostic methods developed and applied to the problem. These methods, which include analytical and empirical diagnostic techniques, will be described and associated blind-test-case metric results will be presented and compared. Lessons learned along with recommendations for improving the public benchmarking processes will also be presented and discussed.
Benchmarking infrastructure for mutation text mining
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Background: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards: RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data, and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction focus mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. Conclusion: We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
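In this infrastructure, the metrics are computed by SPARQL queries over RDF annotations. As a plain-Python analogue that skips RDF entirely and is not the authors' queries, the same precision, recall, and F1 arithmetic can be written as exact-match set comparison. Annotations here are hypothetical (document id, normalized mutation) pairs.

```python
# Each annotation is a (document id, normalized mutation) pair; values are invented.
gold = {("doc1", "p.V600E"), ("doc1", "p.K27M"), ("doc2", "c.35G>A")}
predicted = {("doc1", "p.V600E"), ("doc2", "c.35G>A"), ("doc2", "p.R132H")}

def prf(gold, predicted):
    """Precision, recall and F1 from exact-match set comparison."""
    tp = len(gold & predicted)          # annotations found by both
    fp = len(predicted - gold)          # system output absent from the gold standard
    fn = len(gold - predicted)          # gold annotations the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(prf(gold, predicted))
```

A SPARQL version would express the same intersection and difference counts as graph pattern matches with `COUNT` aggregates. The appeal of that approach is that no programming is needed per system.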
Benchmark and Framework for Encouraging Research on Multi-Threaded Testing Tools
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Stoller, Scott D.; Ur, Shmuel
2003-01-01
A problem that has been gaining prominence in testing is that of looking for intermittent bugs. Multi-threaded code is becoming very common, mostly on the server side. As there is no silver-bullet solution, research focuses on a variety of partial solutions. In this paper (invited by PADTAD 2003) we outline a proposed project to facilitate research. The project goals are as follows. The first goal is to create a benchmark that can be used to evaluate different solutions. Apart from programs with documented bugs, the benchmark will include other artifacts, such as traces, that are useful for evaluating some of the technologies. The second goal is to create a set of tools with open APIs that can be used to check ideas without building a large system. For example, an instrumentor will be available that could be used to test temporal noise-making heuristics. The third goal is to create a focus for research in this area, around which a community of people who try to solve similar problems with different techniques could congregate.
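A temporal noise-making heuristic perturbs thread timing at chosen program points so that rare interleavings become more likely. The minimal sketch below assumes a simple heuristic: with probability `p`, yield the CPU between the read and the write of an unsynchronized counter. `noise`, `Counter`, and the probability are illustrative inventions, not the project's instrumentor or API. The locked variant always reaches the expected total. The unlocked variant may lose updates, and injected noise makes this more likely to show up.

```python
import random
import threading
import time

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

def noise(rng, p):
    """Hypothetical noise heuristic: with probability p, yield the CPU."""
    if rng.random() < p:
        time.sleep(0)

def unsafe_increment(counter, n, p, seed):
    rng = random.Random(seed)
    for _ in range(n):
        tmp = counter.value          # read
        noise(rng, p)                # injected delay between read and write
        counter.value = tmp + 1      # write; may overwrite another thread's update

def safe_increment(counter, n, p, seed):
    rng = random.Random(seed)
    for _ in range(n):
        with counter.lock:           # read-modify-write is atomic under the lock
            tmp = counter.value
            noise(rng, p)
            counter.value = tmp + 1

def run(worker, threads=4, n=1000, p=0.5):
    """Start `threads` workers, each doing n increments; return the final count."""
    c = Counter()
    ts = [threading.Thread(target=worker, args=(c, n, p, i)) for i in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return c.value

print("locked:", run(safe_increment), "unlocked:", run(unsafe_increment), "expected:", 4 * 1000)
```

Separating the noise policy from the code under test is what would let different heuristics be swapped in and compared on the same benchmark programs.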
Benchmarking Diagnostic Algorithms on an Electrical Power System Testbed
NASA Technical Reports Server (NTRS)
Kurtoglu, Tolga; Narasimhan, Sriram; Poll, Scott; Garcia, David; Wright, Stephanie
2009-01-01
Diagnostic algorithms (DAs) are key to enabling automated health management. These algorithms are designed to detect and isolate anomalies of either a component or the whole system based on observations received from sensors. In recent years a wide range of algorithms, both model-based and data-driven, have been developed to increase autonomy and improve system reliability and affordability. However, the lack of support for performing systematic benchmarking of these algorithms continues to create barriers to the effective development and deployment of diagnostic technologies. In this paper, we present our efforts to benchmark a set of DAs on a common platform using a framework that was developed to evaluate and compare various performance metrics for diagnostic technologies. The diagnosed system is an electrical power system, namely the Advanced Diagnostics and Prognostics Testbed (ADAPT), developed and located at the NASA Ames Research Center. The paper presents the fundamentals of the benchmarking framework, the ADAPT system, a description of faults and data sets, the metrics used for evaluation, and an in-depth analysis of benchmarking results obtained from testing ten diagnostic algorithms on the ADAPT electrical power system testbed.
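Benchmarks of this kind typically score each algorithm over a mix of nominal and fault-injected runs. The sketch below computes three common metrics: false alarm rate, missed detection rate, and isolation accuracy. These definitions are assumptions for illustration and may differ from the metrics actually used in the ADAPT framework. `Run`, `score`, and the component names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Run:
    injected: Optional[str]   # fault injected into the scenario (None = nominal run)
    diagnosed: Optional[str]  # fault reported by the algorithm (None = no fault reported)

def _rate(k, n):
    return k / n if n else 0.0

def score(runs: List[Run]) -> dict:
    nominal = [r for r in runs if r.injected is None]
    faulty = [r for r in runs if r.injected is not None]
    detected = [r for r in faulty if r.diagnosed is not None]
    return {
        # nominal runs in which the algorithm nevertheless reported a fault
        "false_alarm_rate": _rate(sum(r.diagnosed is not None for r in nominal), len(nominal)),
        # faulty runs in which no fault was reported
        "missed_detection_rate": _rate(sum(r.diagnosed is None for r in faulty), len(faulty)),
        # among detected faults, fraction isolated to the correct component
        "isolation_accuracy": _rate(sum(r.diagnosed == r.injected for r in detected), len(detected)),
    }

runs = [Run(None, None), Run(None, "relay1"), Run("relay1", "relay1"),
        Run("sensor3", "relay1"), Run("breaker2", None)]
print(score(runs))
```

Scoring every algorithm with the same function on the same scenario set is what makes results comparable across algorithms. Time-based metrics such as detection latency would need timestamps added to `Run`.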
NASA Technical Reports Server (NTRS)
Krause, David L.; Brewer, Ethan J.; Pawlik, Ralph
2013-01-01
This report provides test methodology details and qualitative results for the first structural benchmark creep test of an Advanced Stirling Convertor (ASC) heater head of ASC-E2 design heritage. The test article was recovered from a flight-like Microcast MarM-247 heater head specimen previously used in helium permeability testing. The test article was utilized for benchmark creep test rig preparation, wall thickness and diametral laser scan hardware metrological developments, and induction heater custom coil experiments. In addition, a benchmark creep test was performed and was terminated after one week, when through-thickness cracks propagated at thermocouple weld locations. Following this, the test article was used to develop a unique temperature measurement methodology using contact thermocouples, thereby enabling future benchmark testing to be performed without conventional welded thermocouples, which have proven problematic for this alloy. This report includes an overview of heater head structural benchmark creep testing, the origin of this particular test article, test configuration developments accomplished using the test article, creep predictions for its benchmark creep test, qualitative structural benchmark creep test results, and a short summary.
Performance Monitoring of Distributed Data Processing Systems
NASA Technical Reports Server (NTRS)
Ojha, Anand K.
2000-01-01
Test and checkout systems are essential components in ensuring the safety and reliability of aircraft and related systems for space missions. A variety of systems, developed over several years, are in use at NASA/KSC. Many of these systems are configured as distributed data processing systems, with the functionality spread over several multiprocessor nodes interconnected through networks. To be cost-effective, a system should use the least amount of resources and perform a given testing task in the least amount of time. There are two aspects of performance evaluation: monitoring and benchmarking. Monitoring is valuable to system administrators in operating and maintaining systems, while benchmarking is important in designing and upgrading computer-based systems. These two aspects of performance evaluation are the foci of this project. This paper first discusses various issues related to software, hardware, and hybrid performance monitoring as applicable to distributed systems, and specifically to the TCMS (Test Control and Monitoring System). Next, several probing instructions are compared to show that the hybrid monitoring technique developed by NIST (National Institute of Standards and Technology) is the least intrusive and takes only one-fourth of the time taken by software monitoring probes. The rest of the paper discusses issues related to benchmarking a distributed system and, finally, provides a prescription for developing a micro-benchmark for the TCMS.
NDEC: A NEA platform for nuclear data testing, verification and benchmarking
NASA Astrophysics Data System (ADS)
Díez, C. J.; Michel-Sendis, F.; Cabellos, O.; Bossant, M.; Soppera, N.
2017-09-01
The selection, testing, verification and benchmarking of evaluated nuclear data consist, in practice, of putting an evaluated file through a number of checking steps in which different computational codes verify that the file and the data it contains comply with different requirements. These requirements range from format compliance to good performance in application cases, while physical constraints and agreement with experimental data are also verified. At NEA, the NDEC (Nuclear Data Evaluation Cycle) platform aims to provide, through a user-friendly interface, a thorough diagnosis of the quality of a submitted evaluated nuclear data file. Such a diagnosis is based on the results of different computational codes and routines that carry out the verifications, tests and checks mentioned above. NDEC also seeks synergies with other existing NEA tools and databases, such as JANIS, DICE or NDaST, incorporating them into its working scheme. This paper presents NDEC, its current development status and its usage in the JEFF nuclear data project.
ERIC Educational Resources Information Center
Harrington, Shanika
2017-01-01
The purpose of this research study was to evaluate the impact of the district's use of the Fountas and Pinnell Benchmark Assessment System on 3rd grade students' reading achievement as measured by the SC READY ELA test. Educators are increasingly using assessment data in determining students' knowledge and progress. Brady, 2011 stated that…
Development and application of freshwater sediment-toxicity benchmarks for currently used pesticides
Nowell, Lisa H.; Norman, Julia E.; Ingersoll, Christopher G.; Moran, Patrick W.
2016-01-01
Sediment-toxicity benchmarks are needed to interpret the biological significance of currently used pesticides detected in whole sediments. Two types of freshwater sediment benchmarks for pesticides were developed using spiked-sediment bioassay (SSB) data from the literature. These benchmarks can be used to interpret sediment-toxicity data or to assess the potential toxicity of pesticides in whole sediment. The Likely Effect Benchmark (LEB) defines a pesticide concentration in whole sediment above which there is a high probability of adverse effects on benthic invertebrates, and the Threshold Effect Benchmark (TEB) defines a concentration below which adverse effects are unlikely. For compounds without available SSBs, benchmarks were estimated using equilibrium partitioning (EqP). When a sediment sample contains a pesticide mixture, benchmark quotients can be summed for all detected pesticides to produce an indicator of potential toxicity for that mixture. Benchmarks were developed for 48 pesticide compounds using SSB data and 81 compounds using the EqP approach. In an example application, data for pesticides measured in sediment from 197 streams across the United States were evaluated using these benchmarks, and compared to measured toxicity from whole-sediment toxicity tests conducted with the amphipod Hyalella azteca (28-d exposures) and the midge Chironomus dilutus (10-d exposures). Amphipod survival, weight, and biomass were significantly and inversely related to summed benchmark quotients, whereas midge survival, weight, and biomass showed no relationship to benchmarks. Samples with LEB exceedances were rare (n = 3), but all were toxic to amphipods (i.e., significantly different from control). Significant toxicity to amphipods was observed for 72% of samples exceeding one or more TEBs, compared to 18% of samples below all TEBs. 
Factors affecting toxicity below TEBs may include the presence of contaminants other than pesticides, physical/chemical characteristics of sediment, and uncertainty in TEB values. Additional evaluations of benchmarks in relation to sediment chemistry and toxicity are ongoing.
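The mixture indicator described above is simple arithmetic. Each detected pesticide's concentration is divided by its benchmark, and the quotients are summed over the sample. This sketch computes summed TEB and LEB quotients and flags any single-compound exceedance (a quotient above 1). The pesticide names, concentrations, and benchmark values are invented, not taken from the study. The only assumption is that concentrations and benchmarks share the same units and basis.

```python
def quotients(concs, benchmarks):
    """Concentration / benchmark for each detected pesticide that has a benchmark."""
    return {p: c / benchmarks[p] for p, c in concs.items() if p in benchmarks}

def assess(concs, teb, leb):
    """Summed benchmark quotients plus single-compound exceedance flags."""
    teb_q = quotients(concs, teb)
    leb_q = quotients(concs, leb)
    return {
        "sum_teb_quotient": sum(teb_q.values()),
        "sum_leb_quotient": sum(leb_q.values()),
        "any_teb_exceeded": any(q > 1.0 for q in teb_q.values()),
        "any_leb_exceeded": any(q > 1.0 for q in leb_q.values()),
    }

# Invented values in a common unit (e.g. ug/kg dry weight); not from the study.
sample = {"pesticide_A": 4.0, "pesticide_B": 1.0}
teb = {"pesticide_A": 2.0, "pesticide_B": 4.0}
leb = {"pesticide_A": 10.0, "pesticide_B": 20.0}
print(assess(sample, teb, leb))
```

Summing quotients assumes the compounds' effects are roughly additive. This is one reason the abstract treats the sum as an indicator of potential toxicity rather than a prediction.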
Validating Cellular Automata Lava Flow Emplacement Algorithms with Standard Benchmarks
NASA Astrophysics Data System (ADS)
Richardson, J. A.; Connor, L.; Charbonnier, S. J.; Connor, C.; Gallant, E.
2015-12-01
A major existing need in assessing lava flow simulators is a common set of validation benchmark tests. We propose three levels of benchmarks that test model output against increasingly complex standards. First, simulated lava flows should be morphologically identical given changes in parameter space that should be inconsequential, such as slope direction. Second, lava flows simulated in simple parameter spaces can be tested against analytical solutions or empirical relationships seen in Bingham fluids. For instance, a lava flow simulated on a flat surface should produce a circular outline. Third, lava flows simulated over real-world topography can be compared to recent real-world lava flows, such as those at Tolbachik, Russia, and Fogo, Cape Verde. Success or failure of emplacement algorithms in these validation benchmarks can be determined using a Bayesian approach, which directly tests the ability of an emplacement algorithm to correctly forecast lava inundation. Here we focus on two posterior metrics, P(A|B) and P(¬A|¬B), which describe the positive and negative predictive value of flow algorithms. This is an improvement on less direct statistics such as model sensitivity and the Jaccard fitness coefficient. We have performed these validation benchmarks on a new, modular lava flow emplacement simulator that we have developed. This simulator, which we call MOLASSES, follows a Cellular Automata (CA) method. The code is developed in several interchangeable modules, which enables quick modification of the algorithm that distributes lava from cells to their neighbors. By assessing several different distribution schemes with the benchmark tests, we have improved the performance of MOLASSES in matching early stages of the 2012-13 Tolbachik flow, Kamchatka, Russia, to 80%. We can also evaluate model performance given uncertain input parameters using a Monte Carlo setup. This illuminates sensitivity to model uncertainty.
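The two posterior metrics can be computed directly from co-registered inundation grids. This sketch assumes that A means "cell inundated by the real flow" and B means "cell inundated by the simulation". Under that reading, P(A|B) is the positive predictive value and P(¬A|¬B) is the negative predictive value. The Jaccard coefficient is included for comparison. The grids are tiny invented examples, flattened to one boolean per cell.

```python
def predictive_values(observed, modeled):
    """observed/modeled: equal-length sequences of booleans, one per grid cell.

    A = cell inundated by the real flow, B = cell inundated by the simulation.
    Returns P(A|B), P(not A | not B) and the Jaccard coefficient |A∩B| / |A∪B|.
    """
    pairs = list(zip(observed, modeled))
    both = sum(1 for a, b in pairs if a and b)
    modeled_yes = sum(1 for _, b in pairs if b)
    modeled_no = len(pairs) - modeled_yes
    neither = sum(1 for a, b in pairs if not a and not b)
    union = sum(1 for a, b in pairs if a or b)
    ppv = both / modeled_yes if modeled_yes else float("nan")
    npv = neither / modeled_no if modeled_no else float("nan")
    jaccard = both / union if union else float("nan")
    return ppv, npv, jaccard

observed = [True, True, True, False, False, False, False, False]
modeled  = [True, True, False, True, False, False, False, False]
print(predictive_values(observed, modeled))
```

Unlike the Jaccard coefficient, the two predictive values answer the forecasting question directly. If the model says a cell will (or will not) be inundated, they give how often that is true.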
Medical school benchmarking - from tools to programmes.
Wilkinson, Tim J; Hudson, Judith N; Mccoll, Geoffrey J; Hu, Wendy C Y; Jolly, Brian C; Schuwirth, Lambert W T
2015-02-01
Benchmarking among medical schools is essential, but may result in unwanted effects. We aimed to apply a conceptual framework to selected benchmarking activities of medical schools. We present an analogy between the effects of assessment on student learning and the effects of benchmarking on medical school educational activities. A framework by which benchmarking can be evaluated was developed and applied to key current benchmarking activities in Australia and New Zealand. The analogy generated a conceptual framework that tested five questions to be considered in relation to benchmarking: What is the purpose? What are the attributes of value? What are the best tools to assess the attributes of value? What happens to the results? And what is the likely "institutional impact" of the results? If the activities were compared against a blueprint of desirable medical graduate outcomes, notable omissions would emerge. Medical schools should benchmark their performance on a range of educational activities to ensure quality improvement and to assure stakeholders that standards are being met. Although benchmarking potentially has positive benefits, it could also result in perverse incentives with unforeseen and detrimental effects on learning if it is undertaken using only a few selected assessment tools.
Benchmarking and performance analysis of the CM-2 [SIMD computer]
NASA Technical Reports Server (NTRS)
Myers, David W.; Adams, George B., II
1988-01-01
A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as to gain insight into what performance criteria are needed when evaluating parallel processing machines.
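The idea of taking multiple measurements and treating them statistically can be sketched in a modern setting. The following example times a kernel repeatedly after a short warm-up and summarizes the samples with both classical statistics (mean, standard deviation) and robust ones (minimum, median, median absolute deviation). Robust statistics are less distorted by occasional outliers. It uses Python's `time.perf_counter` rather than the CM-2's `cm:time`, and the warm-up and repeat counts are arbitrary assumptions.

```python
import statistics
import time

def summarize(samples):
    """Robust and classical summary of repeated timings of the same operation (seconds)."""
    med = statistics.median(samples)
    return {
        "n": len(samples),
        "min": min(samples),
        "median": med,
        "mad": statistics.median(abs(s - med) for s in samples),  # median absolute deviation
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }

def measure(fn, repeats=30, warmup=3):
    """Run fn a few times untimed, then time `repeats` runs individually."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return summarize(samples)

print(measure(lambda: sum(range(100_000))))
```

When a single sample is far out of line, the mean and standard deviation shift noticeably while the median and MAD barely move. For that reason, robust statistics are often reported for noisy timers.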
Analyzing the BBOB results by means of benchmarking concepts.
Mersmann, O; Preuss, M; Trautmann, H; Bischl, B; Weihs, C
2015-01-01
We present methods to answer two basic questions that arise when benchmarking optimization algorithms. The first is: which algorithm is the "best" one? The second is: which algorithm should I use for my real-world problem? The two are connected, and neither is easy to answer. We present a theoretical framework for designing and analyzing the raw data of such benchmark experiments. This represents a first step in answering the aforementioned questions. The 2009 and 2010 BBOB benchmark results are analyzed by means of this framework, and we derive insight regarding the answers to the two questions. Furthermore, we discuss how to properly aggregate rankings from algorithm evaluations on individual problems into a consensus, its theoretical background, and which common pitfalls should be avoided. Finally, we address the grouping of test problems into sets with similar optimizer rankings and investigate whether these groupings are reflected by previously proposed test problem characteristics, finding that this is not always the case.
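Aggregating per-problem rankings into a consensus can be done in several ways. The Borda count is one simple, widely known method, shown here as a sketch. It is not necessarily the method the authors recommend, and their discussion of pitfalls applies to it. The sketch assumes complete rankings of the same algorithms on every problem and breaks ties alphabetically. The algorithm names are placeholders.

```python
from collections import defaultdict

def borda_consensus(rankings):
    """rankings: list of lists, each ordering the same algorithms best-first on one problem.

    An algorithm in position pos (0-based) of an m-item ranking earns m - 1 - pos points;
    the consensus orders algorithms by total points (ties broken by name).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for pos, alg in enumerate(ranking):
            scores[alg] += m - 1 - pos
    return sorted(scores, key=lambda a: (-scores[a], a))

per_problem = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
print(borda_consensus(per_problem))
```

Borda uses only positions, not performance gaps, so a narrow win counts as much as a large one. This is the kind of modelling choice that the consensus-ranking discussion in the abstract is about.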
De Hertogh, Benoît; De Meulder, Bertrand; Berger, Fabrice; Pierre, Michael; Bareke, Eric; Gaigneaux, Anthoula; Depiereux, Eric
2010-01-11
Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically relevant data to evaluate the performance of statistical methods. Our novel method ranks the probesets from a dataset composed of publicly available biological microarray data and extracts subset matrices with precise information/noise ratios. Our method can be used to determine how well different methods estimate variance for a given number of replicates. The mean-variance and mean-fold change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results from previously published benchmarks. We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.
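The methods compared here differ mainly in how they stabilize variance estimates when replicates are few. As a generic illustration, this sketch shows a simple SAM-style regularized statistic in which a constant `s0` is added to the pooled standard error. It is not the specific Regularized, Window, or Shrinkage t test evaluated in the paper, and the expression values and `s0` are invented. With only two replicates per group, a gene with a tiny sample variance gets a very large ordinary t, and the added constant damps it.

```python
import math
from statistics import mean, variance

def regularized_t(x, y, s0):
    """Pooled two-sample t statistic with a constant s0 added to the standard error.

    s0 = 0 gives the ordinary pooled t; s0 > 0 shrinks statistics whose
    standard error is small only by chance (a SAM-style "fudge factor").
    """
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / (se + s0)

# Two replicates per group for one hypothetical gene (log-scale expression).
x, y = [5.0, 5.1], [4.0, 4.1]
print("ordinary t:", round(regularized_t(x, y, 0.0), 2))
print("regularized t (s0=0.1):", round(regularized_t(x, y, 0.1), 2))
```

In practice `s0` is usually estimated from the data, for example from the distribution of gene-wise standard errors, rather than fixed by hand as here.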
Polarization Control with Piezoelectric and LiNbO3 Transducers
NASA Astrophysics Data System (ADS)
Bradley, E.; Miles, E.; Loginov, B.; Vu, N.
Several polarization control transducers have appeared on the market, and automated, endless polarization control systems using these transducers are now becoming available. Unfortunately, it is not entirely clear which benchmark performance tests a polarization control system must pass, and the polarization disturbances a system must handle are open to some debate. We present quantitative measurements of realistic polarization disturbances and two benchmark tests we have successfully used to evaluate the performance of an automated, endless polarization control system. We use these tests to compare the performance of a system using piezoelectric transducers to that of a system using LiNbO3 transducers.
Present Status and Extensions of the Monte Carlo Performance Benchmark
NASA Astrophysics Data System (ADS)
Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.
2014-06-01
The NEA Monte Carlo Performance benchmark started in 2011 with the aim of monitoring, over the years, the ability to perform a full-size Monte Carlo reactor core calculation with detailed power production for each fuel pin, including its axial distribution. This paper gives an overview of the results contributed so far. It shows that reaching a statistical accuracy of 1% for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters built from common types of computer nodes. However, on true supercomputers the speedup of parallel calculations keeps increasing up to large numbers of processor cores. More experience with calculations on true supercomputers using large numbers of processors is needed to predict whether the requested calculations can be done in a short time. The specifications of the reactor geometry for this benchmark are well suited for further investigations of full-core Monte Carlo calculations, and there is a need to test issues other than computational performance. Proposals are therefore presented for extending the benchmark to a suite of benchmark problems: for evaluating fission source convergence in a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities, and for studying the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
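The history count quoted above can be rescaled to other accuracy targets. The Monte Carlo statistical error typically falls as 1/sqrt(N), so N_target = N_ref * (err_ref / err_target)^2. This sketch applies that rule using the abstract's anchor of roughly 1e11 histories for 1%. It also defines parallel efficiency as speedup divided by core count, the usual measure of the scaling limits mentioned. Both the 1/sqrt(N) assumption and the timing numbers in the efficiency example are illustrative. Real per-zone errors also depend on tally and correlation effects.

```python
def histories_for_target(n_ref, err_ref, err_target):
    """Histories needed for a target relative error, assuming error ~ 1/sqrt(N)."""
    return n_ref * (err_ref / err_target) ** 2

def parallel_efficiency(t_one_core, t_parallel, cores):
    """Speedup divided by core count; 1.0 means perfect scaling."""
    return t_one_core / (cores * t_parallel)

# Anchor from the abstract: about 1e11 histories for 1% in most small fuel zones.
print(f"{histories_for_target(1e11, 0.01, 0.005):.2e}")   # 4.00e+11 for 0.5%
print(f"{histories_for_target(1e11, 0.01, 0.02):.2e}")    # 2.50e+10 for 2%
# Hypothetical timings: 100 h on one core, 2 h on 64 cores.
print(parallel_efficiency(100.0, 2.0, 64))
```

Halving the target error quadruples the work. This is why accuracy requirements for pin-by-pin power drive the demand for very large core counts.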
Aluminum Data Measurements and Evaluation for Criticality Safety Applications
NASA Astrophysics Data System (ADS)
Leal, L. C.; Guber, K. H.; Spencer, R. R.; Derrien, H.; Wright, R. Q.
2002-12-01
The Defense Nuclear Facilities Safety Board (DNFSB) Recommendation 93-2 motivated the US Department of Energy (DOE) to develop a comprehensive criticality safety program to maintain and to predict the criticality of systems throughout the DOE complex. To implement the response to DNFSB Recommendation 93-2, a Nuclear Criticality Safety Program (NCSP) was created, including the following tasks: Critical Experiments, Criticality Benchmarks, Training, Analytical Methods, and Nuclear Data. The Nuclear Data portion of the NCSP consists of a variety of differential measurements performed at the Oak Ridge Electron Linear Accelerator (ORELA) at Oak Ridge National Laboratory (ORNL); data analysis and evaluation using the generalized least-squares fitting code SAMMY in the resolved, unresolved, and high-energy ranges; and the development and benchmark testing of complete evaluations for a nuclide for inclusion in the Evaluated Nuclear Data File (ENDF/B). This paper outlines the work performed at ORNL to measure, evaluate, and test the nuclear data for aluminum for applications in criticality safety problems.
Freeman, Karoline; Tsertsvadze, Alexander; Taylor-Phillips, Sian; McCarthy, Noel; Mistry, Hema; Manuel, Rohini; Mason, James
2017-01-01
Multiplex gastrointestinal pathogen panel (GPP) tests simultaneously identify bacterial, viral and parasitic pathogens from the stool samples of patients with suspected infectious gastroenteritis presenting in hospital or the community. We undertook a systematic review to compare the accuracy of GPP tests with standard microbiology techniques. Searches in Medline, Embase, Web of Science and the Cochrane library were undertaken from inception to January 2016. Eligible studies compared GPP tests with standard microbiology techniques in patients with suspected gastroenteritis. Quality assessment of included studies used a tailored QUADAS-2. In the absence of a reference standard, we analysed test performance taking GPP tests and standard microbiology techniques in turn as the benchmark test, using random effects meta-analysis of proportions. No study provided an adequate reference standard with which to compare the test accuracy of GPP and conventional tests. Ten studies informed a meta-analysis of positive and negative agreement. Positive agreement across all pathogens was 0.93 (95% CI 0.90 to 0.96) when conventional methods were the benchmark and 0.68 (95% CI 0.58 to 0.77) when GPP provided the benchmark. Negative agreement was high in both instances due to the high proportion of negative cases. GPP testing produced a greater number of pathogen-positive findings than conventional testing. It is unclear whether these additional 'positives' are clinically important. GPP testing has the potential to simplify testing and accelerate reporting when compared to conventional microbiology methods. However, the impact of GPP testing upon the management, treatment and outcome of patients is poorly understood, and further studies are needed to evaluate the health economic impact of GPP testing compared with standard methods. The review protocol is registered with PROSPERO as CRD42016033320.
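Swapping which test serves as the benchmark changes the denominators, which explains why the two positive-agreement figures differ so much. For a single 2x2 table, the sketch below computes positive and negative agreement with either method as the reference. The counts are invented to illustrate the asymmetry and are not data from the review. The review itself pooled study-level proportions with a random-effects meta-analysis, which this sketch does not attempt.

```python
def agreement(a, b, c, d, benchmark):
    """Positive/negative agreement from a 2x2 table of GPP vs conventional results.

    a: both positive
    b: GPP positive, conventional negative
    c: GPP negative, conventional positive
    d: both negative
    benchmark: "conventional" or "gpp", i.e. which method is treated as the reference.
    """
    if benchmark == "conventional":
        return a / (a + c), d / (b + d)
    if benchmark == "gpp":
        return a / (a + b), d / (c + d)
    raise ValueError("benchmark must be 'conventional' or 'gpp'")

# Invented counts, chosen only to show the asymmetry (not data from the review).
a, b, c, d = 90, 40, 10, 860
for ref in ("conventional", "gpp"):
    pa, na = agreement(a, b, c, d, ref)
    print(f"{ref:>12} as benchmark: positive {pa:.2f}, negative {na:.2f}")
```

When GPP finds many extra positives (a large `b`), positive agreement drops sharply once GPP is the reference. Meanwhile, negative agreement stays high because `d` dominates both denominators.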
Achievement Testing in the No Child Left Behind Era: The Arkansas Benchmark
ERIC Educational Resources Information Center
Hall, John D.; Howerton, D. Lynn; Jones, Craig H.
2008-01-01
The No Child Left Behind Act and the accountability movement in public education caused many states to develop criterion-referenced academic achievement tests. Scores from these tests are often used to make high stakes decisions. Even so, these tests typically do not receive independent psychometric scrutiny. We evaluated the 2005 Arkansas…
Jimenez-Del-Toro, Oscar; Muller, Henning; Krenn, Markus; Gruenberg, Katharina; Taha, Abdel Aziz; Winterstein, Marianne; Eggel, Ivan; Foncubierta-Rodriguez, Antonio; Goksel, Orcun; Jakab, Andras; Kontokotsios, Georgios; Langs, Georg; Menze, Bjoern H; Salas Fernandez, Tomas; Schaer, Roger; Walleyo, Anna; Weber, Marc-Andre; Dicente Cid, Yashin; Gass, Tobias; Heinrich, Mattias; Jia, Fucang; Kahl, Fredrik; Kechichian, Razmig; Mai, Dominic; Spanier, Assaf B; Vincent, Graham; Wang, Chunliang; Wyeth, Daniel; Hanbury, Allan
2016-11-01
Variations in the shape and appearance of anatomical structures in medical images are often relevant radiological signs of disease. Automatic tools can help automate parts of the manual process of identifying such signs. A cloud-based evaluation framework is presented in this paper, including results of benchmarking current state-of-the-art medical imaging algorithms for anatomical structure segmentation and landmark detection: the VISCERAL Anatomy benchmarks. The algorithms are implemented in virtual machines in the cloud, where participants can access only the training data, and the benchmark administrators can run the algorithms privately to compare their performance objectively on an unseen common test set. Overall, 120 computed tomography and magnetic resonance patient volumes were manually annotated to create a standard Gold Corpus containing a total of 1295 structures and 1760 landmarks. Ten participants contributed automatic algorithms for the organ segmentation task, and three for the landmark localization task. Different algorithms obtained the best scores in the four available imaging modalities and for subsets of anatomical structures. The annotation framework, resulting data set, evaluation setup, results and performance analysis from the three VISCERAL Anatomy benchmarks are presented in this article. Both the VISCERAL data set and the Silver Corpus, generated by fusing the outputs of the participant algorithms on a larger set of non-manually-annotated medical images, are available to the research community.
BACT Simulation User Guide (Version 7.0)
NASA Technical Reports Server (NTRS)
Waszak, Martin R.
1997-01-01
This report documents the structure and operation of a simulation model of the Benchmark Active Control Technology (BACT) Wind-Tunnel Model. The BACT system was designed, built, and tested at NASA Langley Research Center as part of the Benchmark Models Program and was developed to perform wind-tunnel experiments to obtain benchmark quality data to validate computational fluid dynamics and computational aeroelasticity codes, to verify the accuracy of current aeroservoelasticity design and analysis tools, and to provide an active controls testbed for evaluating new and innovative control algorithms for flutter suppression and gust load alleviation. The BACT system has been especially valuable as a control system testbed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Briggs, J. Blair; Ivanova, Tatiana
2017-02-01
In the past several decades, numerous experiments have been performed worldwide to support reactor operations, measurements, design, and nuclear safety. Those experiments represent an extensive international investment in infrastructure, expertise, and cost, and constitute a significant resource of data supporting past, current, and future research activities. Those valuable assets represent the basis for recording, development, and validation of our nuclear methods and integral nuclear data [1]. The loss of these experimental data, which has occurred all too often in recent years, is tragic. The high cost of repeating many of these measurements can be prohibitive, if not impossible to surmount. Two international projects were developed, and are under the direction of the Organisation for Economic Co-operation and Development Nuclear Energy Agency (OECD NEA), to address the challenges of not just data preservation, but also evaluation of the data to determine its merit for modern and future use. The International Criticality Safety Benchmark Evaluation Project (ICSBEP) was established to identify and verify comprehensive critical benchmark data sets; evaluate the data, including quantification of biases and uncertainties; compile the data and calculations in a standardized format; and formally document the effort into a single source of verified benchmark data [2]. Similarly, the International Reactor Physics Experiment Evaluation Project (IRPhEP) was established to preserve integral reactor physics experimental data, including separate or special effects data for nuclear energy and technology applications [3]. Annually, contributors from around the world continue to collaborate in the evaluation and review of select benchmark experiments for preservation and dissemination.
The extensively peer-reviewed integral benchmark data can then be used by nuclear design and safety analysts to validate the analytical tools, methods, and data needed for next-generation reactor design, safety analysis requirements, and all other front- and back-end activities contributing to the overall nuclear fuel cycle where quality neutronics calculations are paramount.
Benchmark Testing of the Largest Titanium Aluminide Sheet Subelement Conducted
NASA Technical Reports Server (NTRS)
Bartolotta, Paul A.; Krause, David L.
2000-01-01
To evaluate wrought titanium aluminide (gamma TiAl) as a viable candidate material for the High-Speed Civil Transport (HSCT) exhaust nozzle, an international team led by the NASA Glenn Research Center at Lewis Field successfully fabricated and tested the largest gamma TiAl sheet structure ever manufactured. The gamma TiAl sheet structure, a 56-percent subscale divergent flap subelement, was fabricated for benchmark testing in three-point bending. Overall, the subelement was 84 cm (33 in.) long, 13 cm (5 in.) wide, and 8 cm (3 in.) deep. Incorporated into the subelement were features that might be used in the fabrication of a full-scale divergent flap. These features include the use of (1) gamma TiAl shear clips to join together sections of corrugations, (2) multiple gamma TiAl face sheets, (3) double hot-formed gamma TiAl corrugations, and (4) brazed joints. The structural integrity of the gamma TiAl sheet subelement was evaluated by conducting a room-temperature three-point static bend test.
NASA Astrophysics Data System (ADS)
Moriarty, Patrick; Sanz Rodrigo, Javier; Gancarski, Pawel; Churchfield, Matthew; Naughton, Jonathan W.; Hansen, Kurt S.; Machefaux, Ewan; Maguire, Eoghan; Castellani, Francesco; Terzi, Ludovico; Breton, Simon-Philippe; Ueda, Yuko
2014-06-01
Researchers within the International Energy Agency (IEA) Task 31: Wakebench have created a framework for the evaluation of wind farm flow models operating at the microscale level. The framework consists of a model evaluation protocol integrated with a web-based portal for model benchmarking (www.windbench.net). This paper provides an overview of the building-block validation approach applied to wind farm wake models, including best practices for the benchmarking and data processing procedures for validation datasets from wind farm SCADA and meteorological databases. A hierarchy of test cases has been proposed for wake model evaluation, from similarity theory of the axisymmetric wake and idealized infinite wind farm, to single-wake wind tunnel (UMN-EPFL) and field experiments (Sexbierum), to wind farm arrays in offshore (Horns Rev, Lillgrund) and complex terrain conditions (San Gregorio). A summary of results from the axisymmetric wake, Sexbierum, Horns Rev and Lillgrund benchmarks is used to discuss the state of the art of wake model validation and highlight the most relevant issues for future development.
Sayers, Adrian; Crowther, Michael J; Judge, Andrew; Whitehouse, Michael R; Blom, Ashley W
2017-08-28
The use of benchmarks to assess the performance of implants such as those used in arthroplasty surgery is a widespread practice. It provides surgeons, patients and regulatory authorities with reassurance that the implants used are safe and effective. However, it is not currently clear how implants should be statistically compared with a benchmark, or how many implants are needed, to assess whether an implant is superior, equivalent, non-inferior or inferior to the performance benchmark of interest. We aim to describe the methods and sample size required to conduct a one-sample non-inferiority study of a medical device for the purposes of benchmarking. This was a simulation study of a national register of medical devices. We simulated data, with and without a non-informative competing risk, to represent an arthroplasty population and describe three methods of analysis (z-test, 1-Kaplan-Meier and competing risks) commonly used in surgical research. We evaluated the performance of each method using power, bias, root-mean-square error, coverage and CI width. 1-Kaplan-Meier provides an unbiased estimate of implant net failure, which can be used to assess whether a surgical device is non-inferior to an external benchmark. Small non-inferiority margins require significantly more individuals to be at risk compared with current benchmarking standards. A non-inferiority testing paradigm provides a useful framework for determining whether an implant meets the required performance defined by an external benchmark. Contemporary benchmarking standards have limited power to detect non-inferiority, and substantially larger sample sizes, in excess of 3200 procedures, are required to achieve a power greater than 60%. When benchmarking implant performance, net failure estimated using 1-KM is clearly preferable to crude failure estimated by competing risk models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
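The 1-Kaplan-Meier estimate of net failure described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' code: it assumes revision is the event of interest, treats death (the competing risk) as censoring, uses Greenwood's formula for the standard error, and applies a hypothetical decision rule in which an implant is called non-inferior when the one-sided upper 95% bound on failure lies below the benchmark plus the margin. The function names and data are illustrative.

```python
import math


def km_failure(times, events, t_star):
    """1-Kaplan-Meier failure at t_star and its Greenwood standard error.

    events[i] is 1 for failure (revision) and 0 for censoring; deaths are
    treated as censoring, so the result estimates net failure.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, greenwood_sum = 1.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= t_star:
        t = data[i][0]
        failures = removed = 0
        # Group tied times: everyone with time t is at risk at t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            surv *= 1 - failures / n_at_risk
            if n_at_risk > failures:
                greenwood_sum += failures / (n_at_risk * (n_at_risk - failures))
        n_at_risk -= removed
    return 1 - surv, surv * math.sqrt(greenwood_sum)


def non_inferior(failure, se, benchmark, margin, z=1.645):
    """Hypothetical rule: one-sided upper 95% bound below benchmark + margin."""
    return failure + z * se < benchmark + margin


failure, se = km_failure([1, 2, 3, 4, 5], [1, 0, 1, 0, 0], t_star=5)
```

In the toy data, the censoring at time 2 leaves only three patients at risk when the second failure occurs, so the estimate (about 0.47) exceeds the naive proportion 2/5.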
NACA0012 benchmark model experimental flutter results with unsteady pressure distributions
NASA Technical Reports Server (NTRS)
Rivera, Jose A., Jr.; Dansberry, Bryan E.; Bennett, Robert M.; Durham, Michael H.; Silva, Walter A.
1992-01-01
The Structural Dynamics Division at NASA Langley Research Center has started a wind tunnel activity referred to as the Benchmark Models Program. The primary objective of this program is to acquire measured dynamic instability and corresponding pressure data that will be useful for developing and evaluating aeroelastic-type computational fluid dynamics codes currently in use or under development. The program is a multi-year activity that will involve testing of several different models to investigate various aeroelastic phenomena. This paper describes results obtained from a second wind tunnel test of the first model in the Benchmark Models Program. This first model consisted of a rigid semispan wing with a rectangular planform and a NACA 0012 airfoil shape, mounted on a flexible two-degree-of-freedom mount system. Experimental flutter boundaries and corresponding unsteady pressure distribution data acquired over two model chords located at the 60 and 95 percent span stations are presented.
Use of integral experiments in support to the validation of JEFF-3.2 nuclear data evaluation
NASA Astrophysics Data System (ADS)
Leclaire, Nicolas; Cochet, Bertrand; Jinaphanh, Alexis; Haeck, Wim
2017-09-01
For many years now, IRSN has been developing its own Monte Carlo continuous-energy capability, which allows various nuclear data libraries to be tested. To that end, a validation database of 1136 experiments was built from cases used for the validation of the APOLLO2-MORET 5 multigroup route of the CRISTAL V2.0 package. In this paper, the keff values obtained for more than 200 benchmarks using the JEFF-3.1.1 and JEFF-3.2 libraries are compared with the benchmark keff values, and the main discrepancies are analyzed with regard to the neutron spectrum. Special attention is paid to benchmarks whose results changed substantially between the two JEFF-3 versions.
The ISPRS Benchmark on Indoor Modelling
NASA Astrophysics Data System (ADS)
Khoshelham, K.; Díaz Vilariño, L.; Peter, M.; Kang, Z.; Acharya, D.
2017-09-01
Automated generation of 3D indoor models from point cloud data has been a topic of intensive research in recent years. While results on various datasets have been reported in the literature, a comparison of the performance of different methods has not been possible due to the lack of benchmark datasets and a common evaluation framework. The ISPRS benchmark on indoor modelling aims to address this issue by providing a public benchmark dataset and an evaluation framework for performance comparison of indoor modelling methods. In this paper, we present the benchmark dataset comprising several point clouds of indoor environments captured by different sensors. We also discuss the evaluation and comparison of indoor modelling methods based on manually created reference models and appropriate quality evaluation criteria. The benchmark dataset is available for download at: http://www2.isprs.org/commissions/comm4/wg5/benchmark-on-indoor-modelling.html.
Testing for sustainable preservatives
USDA-ARS?s Scientific Manuscript database
Rising antimicrobial resistance and health concerns about common antimicrobials warrant the development of new, safer antimicrobial agents. A rapid screening protocol was developed to assess the antimicrobial properties of natural and synthetic substances. Benchmark substances were evaluated against re...
McCance, Tanya; Wilson, Val; Kornman, Kelly
2016-07-01
The aim of the Paediatric International Nursing Study was to explore the utility of key performance indicators in developing person-centred practice across a range of services provided to sick children. The objective addressed in this paper was evaluating the use of these indicators to benchmark services internationally. This study builds on primary research, which produced indicators that were considered novel both in terms of their positive orientation and their use in generating data that privileges the patient voice. This study extends this research through wider testing on an international platform within paediatrics. The overall methodological approach, used to evaluate the implementation of the key performance indicators, was realistic evaluation combined with an integrated development and evaluation methodology. The study involved children's wards/hospitals in Australia (six sites across three states) and Europe (seven sites across four countries). Qualitative and quantitative methods were used during the implementation process; however, this paper reports only the quantitative data, which were collected through surveys, observations and documentary review. The findings demonstrate the quality of care being delivered to children and their families across different international sites. The benchmarking does, however, highlight some differences between paediatric and general hospitals, and between the different key performance indicators across all the sites. The findings support the use of the key performance indicators as a novel method to benchmark services internationally. Whilst the data collected across 20 paediatric sites suggest services are more similar than different, benchmarking illuminates variations that encourage a critical dialogue about what works and why. The transferability of the key performance indicators and measurement framework across different settings has significant implications for practice. 
The findings offer an approach to benchmarking and celebrating the successes within practice, while learning from partners across the globe in further developing person-centred cultures. © 2016 John Wiley & Sons Ltd.
Benchmarking specialty hospitals, a scoping review on theory and practice.
Wind, A; van Harten, W H
2017-04-04
Although benchmarking may improve hospital processes, research on this subject is limited. The aim of this study was to provide an overview of publications on benchmarking in specialty hospitals and a description of study characteristics. We searched PubMed and EMBASE for articles published in English in the last 10 years. Eligible articles described a project stating benchmarking as its objective and involving a specialty hospital or specific patient category, or dealt with the methodology or evaluation of benchmarking. Of 1,817 articles identified in total, 24 were included in the study. Articles were categorized into pathway benchmarking, institutional benchmarking, articles on benchmark methodology or evaluation, and benchmarking using a patient registry. There was a large degree of variability: (1) study designs were mostly descriptive and retrospective; (2) not all studies generated and showed data in sufficient detail; and (3) studies varied in whether a benchmarking model was merely described or whether quality improvement resulting from the benchmark was reported. Most of the studies that described a benchmark model described the use of benchmarking partners from the same industry category, sometimes from all over the world. Benchmarking seems to be more developed in eye hospitals, emergency departments and oncology specialty hospitals. Some studies showed promising improvement effects. However, the majority of the articles lacked a structured design and did not report on benchmark outcomes. In order to evaluate the effectiveness of benchmarking to improve quality in specialty hospitals, robust and structured designs are needed, including a follow-up to check whether the benchmark study has led to improvements.
Benchmark problems for numerical implementations of phase field models
Jokisaari, A. M.; Voorhees, P. W.; Guyer, J. E.; ...
2016-10-01
Here, we present the first set of benchmark problems for phase field models that are being developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST). While many scientific research areas use a limited set of well-established software, the growing phase field community continues to develop a wide variety of codes and lacks benchmark problems to consistently evaluate the numerical performance of new implementations. Phase field modeling has become significantly more popular as computational power has increased and is now becoming mainstream, driving the need for benchmark problems to validate and verify new implementations. We follow the example set by the micromagnetics community to develop an evolving set of benchmark problems that test the usability, computational resources, numerical capabilities and physical scope of phase field simulation codes. In this paper, we propose two benchmark problems that cover the physics of solute diffusion and growth and coarsening of a second phase via a simple spinodal decomposition model and a more complex Ostwald ripening model. We demonstrate the utility of benchmark problems by comparing the results of simulations performed with two different adaptive time stepping techniques, and we discuss the needs of future benchmark problems. The development of benchmark problems will enable the results of quantitative phase field models to be confidently incorporated into integrated computational materials science and engineering (ICME), an important goal of the Materials Genome Initiative.
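The spinodal decomposition problem rests on the Cahn-Hilliard equation. As a rough sketch only, the Python code below advances a 1D periodic Cahn-Hilliard model with an explicit finite-difference scheme and a double-well free energy f(c) = rho*(c - c_a)^2*(c_b - c)^2. The grid, time step, parameter values and function names are assumptions chosen for illustration and numerical stability, not the benchmark specification. The sketch shows one property a benchmark can check cheaply: total solute is conserved.

```python
import numpy as np


def laplacian_periodic(u: np.ndarray, dx: float) -> np.ndarray:
    """Second-order central-difference Laplacian on a periodic 1D grid."""
    return (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2


def cahn_hilliard_1d(c, dx=1.0, dt=0.005, steps=200,
                     kappa=2.0, mobility=5.0, rho=5.0, c_a=0.3, c_b=0.7):
    """Explicit Euler steps of dc/dt = M * lap(df/dc - kappa * lap(c)).

    Illustrative free energy: f(c) = rho * (c - c_a)**2 * (c_b - c)**2.
    Explicit schemes for this fourth-order equation need a small dt;
    dt = 0.005 is assumed to be stable for dx = 1 and these parameters.
    """
    c = np.array(c, dtype=float)  # work on a copy
    for _ in range(steps):
        dfdc = 2.0 * rho * (c - c_a) * (c_b - c) * (c_a + c_b - 2.0 * c)
        mu = dfdc - kappa * laplacian_periodic(c, dx)  # chemical potential
        c = c + dt * mobility * laplacian_periodic(mu, dx)
    return c


rng = np.random.default_rng(0)
c0 = 0.5 + 0.01 * rng.standard_normal(128)  # noisy uniform initial condition
c1 = cahn_hilliard_1d(c0)
mass_drift = abs(c1.mean() - c0.mean())  # should stay at round-off level
```

Because the periodic discrete Laplacian sums to zero over the grid, each update leaves the mean concentration unchanged; a production code would more likely use a semi-implicit or adaptive time stepper.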
GEN-IV Benchmarking of Triso Fuel Performance Models under accident conditions modeling input data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collin, Blaise Paul
This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: • The modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release. • The modeling of the AGR-1 and HFR-EU1bis safety testing experiments. • The comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, hereafter named NCC (Numerical Calculation Case), is derived from “Case 5” of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. “Case 5” of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to “effects of the numerical calculation method rather than the physical model” [IAEA 2012]. The NCC is therefore intended to check whether these numerical effects persist. The first two steps require the benchmark participants to carry out modeling following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. 
The participants should read this document thoroughly to make sure all the data needed for their calculations are provided in the document. Missing data will be added to a revision of the document if necessary. 09/2016: Tables 6 and 8 updated. AGR-2 input data added.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collin, Blaise P.
2014-09-01
This document presents the benchmark plan for the calculation of particle fuel performance on safety testing experiments that are representative of operational accidental transients. The benchmark is dedicated to the modeling of fission product release under accident conditions by fuel performance codes from around the world, and the subsequent comparison to post-irradiation experiment (PIE) data from the modeled heating tests. The accident condition benchmark is divided into three parts: the modeling of a simplified benchmark problem to assess potential numerical calculation issues at low fission product release; the modeling of the AGR-1 and HFR-EU1bis safety testing experiments; and the comparison of the AGR-1 and HFR-EU1bis modeling results with PIE data. The simplified benchmark case, hereafter named NCC (Numerical Calculation Case), is derived from “Case 5” of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on coated particle fuel technology [IAEA 2012]. It is included so participants can evaluate their codes at low fission product release. “Case 5” of the IAEA CRP-6 showed large code-to-code discrepancies in the release of fission products, which were attributed to “effects of the numerical calculation method rather than the physical model” [IAEA 2012]. The NCC is therefore intended to check whether these numerical effects persist. The first two steps require the benchmark participants to carry out modeling following the guidelines and recommendations provided by this document. The third step involves the collection of the modeling results by Idaho National Laboratory (INL) and the comparison of these results with the available PIE data. The objective of this document is to provide all necessary input data to model the benchmark cases, and to give some methodology guidelines and recommendations in order to make all results suitable for comparison with each other. 
The participants should read this document thoroughly to make sure all the data needed for their calculations are provided in the document. Missing data will be added to a revision of the document if necessary.
District Heating Systems Performance Analyses. Heat Energy Tariff
NASA Astrophysics Data System (ADS)
Ziemele, Jelena; Vigants, Girts; Vitolins, Valdis; Blumberga, Dagnija; Veidenbergs, Ivars
2014-12-01
The paper addresses an important element of the European energy sector: the evaluation of district heating (DH) system operations from the standpoint of increasing energy efficiency and increasing the use of renewable energy resources. This has been done by developing a new methodology for the evaluation of the heat tariff. The paper presents an algorithm of this methodology, which includes not only a database and systems of calculation equations, but also an integrated multi-criteria analysis module using MADM/MCDM (Multi-Attribute Decision Making / Multi-Criteria Decision Making) based on TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution). The results of the multi-criteria analysis are used to set the tariff benchmarks. The evaluation methodology has been tested on Latvian heat tariffs, and the results show that only half of the heating companies reach a benchmark value of 0.5 for the efficiency indicator of closeness to the ideal solution. This means that the proposed evaluation methodology would allow companies not only to determine how they perform with regard to the proposed benchmark, but also to identify whether they need to restructure in order to reach the level of a low-carbon business.
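The TOPSIS closeness to the ideal solution, which is used above to set the tariff benchmark, can be computed in a few steps: vector-normalise each criterion, apply weights, find the ideal and anti-ideal value for each criterion, and take the relative distance. The Python sketch below follows the textbook form of TOPSIS; the criteria, weights, company data and function name are hypothetical, and the paper's own indicator set and weighting may differ.

```python
import math


def topsis_closeness(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient (0..1) for each row of `matrix`.

    matrix  -- one row per alternative (e.g. heating company), one column per criterion
    weights -- one weight per criterion
    benefit -- True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # 1. Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    # 2. Ideal (best) and anti-ideal (worst) value for each criterion.
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # 3. Closeness: distance to the anti-ideal divided by the total distance.
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        total = d_ideal + d_anti
        scores.append(d_anti / total if total > 0 else 0.0)  # 0.0 only if all rows coincide
    return scores


# Hypothetical criteria: efficiency (benefit), renewable share (benefit), tariff (cost).
companies = [[0.9, 0.6, 40.0], [0.7, 0.2, 60.0], [0.85, 0.5, 45.0]]
scores = topsis_closeness(companies, [0.4, 0.3, 0.3], [True, True, False])
meets_benchmark = [s >= 0.5 for s in scores]
```

Here the first company is best on every criterion and scores 1, the second is worst on every criterion and scores 0, and the third sits a quarter of the way from the ideal and scores about 0.75, so it would clear a 0.5 benchmark.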
Benchmarks for Evaluation of Distributed Denial of Service (DDOS)
2008-01-01
publications: [1] E. Arikan, Attack Profiling for DDoS Benchmarks, MS Thesis, University of Delaware, August 2006. [2] J. Mirkovic, A. Hussain, B. Wilson...Sigmetrics 2007, June 2007 [5] J. Mirkovic, E. Arikan, S. Wei, S. Fahmy, R. Thomas, and P. Reiher, Benchmarks for DDoS Defense Evaluation, Proceedings of the...Security Experimentation, June 2006. [9] J. Mirkovic, E. Arikan, S. Wei, S. Fahmy, R. Thomas, P. Reiher, Benchmarks for DDoS Defense Evaluation
Benchmarking In-Flight Icing Detection Products for Future Upgrades
NASA Technical Reports Server (NTRS)
Politovich, M. K.; Minnis, P.; Johnson, D. B.; Wolff, C. A.; Chapman, M.; Heck, P. W.; Haggerty, J. A.
2004-01-01
This paper summarizes the results of a benchmarking exercise conducted as part of the NASA-supported Advanced Satellite Aviation-Weather Products (ASAP) Program. The goal of ASAP is to increase and optimize the use of satellite data sets within the existing FAA Aviation Weather Research Program (AWRP) Product Development Team (PDT) structure and to transfer advanced satellite expertise to the PDTs. Currently, ASAP fosters collaborative efforts between NASA Laboratories, the University of Wisconsin Cooperative Institute for Meteorological Satellite Studies (UW-CIMSS), the University of Alabama in Huntsville (UAH), and the AWRP PDTs. This collaboration involves the testing and evaluation of existing satellite algorithms developed or proposed by AWRP teams, the introduction of new techniques and data sets to the PDTs from the satellite community, and enhanced access to new satellite data sets available through CIMSS and NASA Langley Research Center for evaluation and testing.
The benchmark aeroelastic models program: Description and highlights of initial results
NASA Technical Reports Server (NTRS)
Bennett, Robert M.; Eckstrom, Clinton V.; Rivera, Jose A., Jr.; Dansberry, Bryan E.; Farmer, Moses G.; Durham, Michael H.
1991-01-01
An experimental effort in aeroelasticity, called the Benchmark Models Program, was implemented. The primary purpose of this program is to provide the necessary data to evaluate computational fluid dynamics codes for aeroelastic analysis. It also focuses on increasing the understanding of the physics of unsteady flows and providing data for empirical design. An overview of this program is given, and some results obtained in the initial tests are highlighted. The tests that were completed include measurement of unsteady pressures during flutter of a rigid wing with a NACA 0012 airfoil section and dynamic response measurements of a flexible rectangular wing with a thick circular arc airfoil undergoing shock boundary layer oscillations.
Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu
2015-07-27
Virtual screening methods are now commonly used in drug discovery. However, to ensure their reliability, they have to be carefully evaluated. These methods are often evaluated retrospectively, notably by studying the enrichment of benchmarking data sets. To this end, numerous benchmarking data sets have been developed over the years, and the resulting improvements have led to the availability of high-quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
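The retrospective enrichment study mentioned above is often summarised by an enrichment factor. A minimal Python sketch of the enrichment factor at a fraction x of the ranked database, EF_x = (actives in the top x / size of the top x) / (total actives / database size), assuming higher scores rank first; the function name and toy data are illustrative only, and other enrichment metrics (e.g. ROC-based ones) are also in common use.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor at `fraction` of the ranked list (higher score ranks first).

    scores    -- docking or similarity scores, one per compound
    is_active -- 1 (or True) for a known active, 0 (or False) for a decoy
    fraction  -- share of the ranked list to inspect, e.g. 0.01 for EF at 1%
    """
    n = len(scores)
    n_top = max(1, round(fraction * n))  # size of the inspected top of the list
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    total_active = sum(is_active)
    return (hits_top / n_top) / (total_active / n)


# Toy set: 10 compounds, the 2 actives receive the two best scores.
toy_scores = list(range(10, 0, -1))
toy_labels = [1, 1] + [0] * 8
ef_20 = enrichment_factor(toy_scores, toy_labels, 0.2)
```

A perfect ranking of this toy set gives EF = 5 at 20%, while inspecting the whole list always gives EF = 1, which is why enrichment is usually reported at small fractions.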
Selecting a Relational Database Management System for Library Automation Systems.
ERIC Educational Resources Information Center
Shekhel, Alex; O'Brien, Mike
1989-01-01
Describes the evaluation of four relational database management systems (RDBMSs) (Informix Turbo, Oracle 6.0 TPS, Unify 2000 and Relational Technology's Ingres 5.0) to determine which is best suited for library automation. The evaluation criteria used to develop a benchmark specifically designed to test RDBMSs for libraries are discussed. (CLB)
Towards unbiased benchmarking of evolutionary and hybrid algorithms for real-valued optimisation
NASA Astrophysics Data System (ADS)
MacNish, Cara
2007-12-01
Randomised population-based algorithms, such as evolutionary, genetic and swarm-based algorithms, and their hybrids with traditional search techniques, have proven successful and robust on many difficult real-valued optimisation problems. This success, along with the readily applicable nature of these techniques, has led to an explosion in the number of algorithms and variants proposed. In order for the field to advance, it is necessary to carry out effective comparative evaluations of these algorithms, and thereby better identify and understand those properties that lead to better performance. This paper discusses the difficulties of providing benchmarking of evolutionary and allied algorithms that is both meaningful and logistically viable. To be meaningful, the benchmarking test must give a fair comparison that is free, as far as possible, from biases that favour one style of algorithm over another. To be logistically viable, it must overcome the need for pairwise comparison between all the proposed algorithms. To address the first problem, we begin by attempting to identify the biases that are inherent in commonly used benchmarking functions. We then describe a suite of test problems, generated recursively as self-similar or fractal landscapes, designed to overcome these biases. For the second, we describe a server that uses web services to allow researchers to 'plug in' their algorithms, running on their local machines, to a central benchmarking repository.
Test One to Test Many: A Unified Approach to Quantum Benchmarks
NASA Astrophysics Data System (ADS)
Bai, Ge; Chiribella, Giulio
2018-04-01
Quantum benchmarks are routinely used to validate the experimental demonstration of quantum information protocols. Many relevant protocols, however, involve an infinite set of input states, of which only a finite subset can be used to test the quality of the implementation. This is a problem, because the benchmark for the finitely many states used in the test can be higher than the original benchmark calculated for infinitely many states. This situation arises in the teleportation and storage of coherent states, for which the benchmark of 50% fidelity is commonly used in experiments, although finite sets of coherent states normally lead to higher benchmarks. Here, we show that the average fidelity over all coherent states can be indirectly probed with a single setup, requiring only two-mode squeezing, a 50-50 beam splitter, and homodyne detection. Our setup enables a rigorous experimental validation of quantum teleportation, storage, amplification, attenuation, and purification of noisy coherent states. More generally, we prove that every quantum benchmark can be tested by preparing a single entangled state and measuring a single observable.
Benchmark Evaluation of HTR-PROTEUS Pebble Bed Experimental Program
Bess, John D.; Montierth, Leland; Köberl, Oliver; ...
2014-10-09
Benchmark models were developed to evaluate 11 critical core configurations of the HTR-PROTEUS pebble bed experimental program. Various additional reactor physics measurements were performed as part of this program; so far, only 37 absorber rod worth measurements, for Cores 4, 9, and 10, have been evaluated as acceptable benchmark experiments. Dominant uncertainties in the experimental keff for all core configurations come from uncertainties in the ²³⁵U enrichment of the fuel, impurities in the moderator pebbles, and the density and impurity content of the radial reflector. Calculations of keff with MCNP5 and ENDF/B-VII.0 neutron nuclear data are greater than the benchmark values but within 1% and also within the 3σ uncertainty, except for Core 4, which is the only randomly packed pebble configuration. Repeated calculations of keff with MCNP6.1 and ENDF/B-VII.1 are lower than the benchmark values and within 1% (~3σ), except for Cores 5 and 9, whose calculated eigenvalues are lower than the benchmark values but within 4σ. The primary difference between the two nuclear data libraries is the adjustment of the absorption cross section of graphite. Simulations of the absorber rod worth measurements are within 3σ of the benchmark experiment values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.
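The comparisons above express a calculated keff (C) against the benchmark value (E) both as a relative difference and as a multiple of the benchmark uncertainty σ. The sketch below shows that arithmetic. The 1% and 3σ thresholds come from the abstract; the function names and the sample numbers in the usage note are hypothetical, not results from the evaluation.

```python
def compare_to_benchmark(calc, bench, sigma):
    """Return (C/E ratio, relative difference in %, difference in units of sigma)."""
    ratio = calc / bench
    rel_pct = 100.0 * (calc - bench) / bench
    n_sigma = (calc - bench) / sigma
    return ratio, rel_pct, n_sigma

def within(calc, bench, sigma, max_pct=1.0, max_sigma=3.0):
    """True if the calculation lies within both the percentage and the sigma tolerance."""
    _, rel_pct, n_sigma = compare_to_benchmark(calc, bench, sigma)
    return abs(rel_pct) <= max_pct and abs(n_sigma) <= max_sigma
```

For instance, a hypothetical calculation of 1.0050 against a benchmark of 1.0000 ± 0.0020 gives C/E ≈ 1.005, a +0.5% difference, and about +2.5σ, so it passes both tests; a calculation of 1.0100 would be roughly 5σ high and fail.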
ERIC Educational Resources Information Center
Herman, Joan L.; Baker, Eva L.
2005-01-01
Many schools are moving to develop benchmark tests to monitor their students' progress toward state standards throughout the academic year. Benchmark tests can provide the ongoing information that schools need to guide instructional programs and to address student learning problems. The authors discuss six criteria that educators can use to…
How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction
NASA Astrophysics Data System (ADS)
Pappenberger, F.; Ramos, M. H.; Cloke, H. L.; Wetterhall, F.; Alfieri, L.; Bogner, K.; Mueller, A.; Salamon, P.
2015-03-01
The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment, but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can best be used to evaluate their probabilistic forecasts. This study shows that the calculated forecast skill can vary depending on the benchmark selected and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system, and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are 'toughest to beat' and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. When evaluated against an observed discharge proxy, the benchmark that has the most utility for EFAS and avoids the most naïve skill across different hydrological situations is found to be meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long-term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system, and their use produces much naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination. 
They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. The recommendation for EFAS is to move to routine use of meteorological persistency, an advanced meteorological benchmark, and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used in the evaluation of skill in probabilistic hydrological forecasts and which benchmarks are most useful for skill discrimination and the avoidance of naïve skill in a large-scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ, so that forecasters can trust their skill evaluation and be confident that their forecasts are indeed better.
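The skill comparison described above rests on two quantities: the CRPS of an ensemble against an observation, and a skill score that compares the forecast CRPS with the benchmark CRPS. The sketch below uses the standard empirical ensemble estimator, CRPS = mean|xᵢ − y| − ½·mean|xᵢ − xⱼ|, and the skill score 1 − CRPS_forecast / CRPS_benchmark. It is a minimal illustration, not the EFAS implementation, and the function names are hypothetical.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for one scalar observation (lower is better)."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return spread_to_obs - spread_within

def crps_skill_score(forecast_crps, benchmark_crps):
    """1 = perfect, 0 = no better than the benchmark, negative = worse than the benchmark."""
    return 1.0 - forecast_crps / benchmark_crps
```

In practice the CRPS would be averaged over many forecast dates and locations before the skill score is formed. A "tough" benchmark has a low CRPS, which drives the skill score towards zero and exposes naïve skill.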
ERIC Educational Resources Information Center
Galloway, Melissa Ritchie
2016-01-01
The purpose of this causal comparative study was to test the theory of assessment that relates benchmark assessments to the Georgia middle grades science Criterion Referenced Competency Test (CRCT) percentages, controlling for schools that do not administer benchmark assessments versus schools that do administer benchmark assessments for all middle…
Benchmark tests of JENDL-3.2 for thermal and fast reactors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Takano, Hideki; Akie, Hiroshi; Kikuchi, Yasuyuki
1994-12-31
Benchmark calculations for a variety of thermal and fast reactors have been performed using the newly evaluated JENDL-3 Version 2 (JENDL-3.2) file. In the thermal reactor calculations for the uranium- and plutonium-fueled cores of TRX and TCA, keff and the lattice parameters were well predicted. The fast reactor calculations for the ZPPR-9 and FCA assemblies showed that keff, the Doppler, sodium void, and control rod reactivity worths, and the reaction rate distributions were in very good agreement with the experiments.
NASA Technical Reports Server (NTRS)
Loyselle, Patricia; Prokopius, Kevin
2011-01-01
Proton Exchange Membrane (PEM) fuel cell technology is the leading candidate to replace the alkaline fuel cell technology, currently used on the Shuttle, for future space missions. During a 5-yr development program, a PEM fuel cell powerplant was developed. This report details the initial performance evaluation test results of the powerplant.
Benchmarking and Hardware-In-The-Loop Operation of a 2014 MAZDA SkyActiv (SAE 2016-01-1007)
Engine performance evaluation in support of LD MTE. EPA used elements of its ALPHA model to apply hardware-in-the-loop (HIL) controls to the SKYACTIV engine test setup to better understand how the engine would operate in a chassis test after being combined with future leading edge tech...
Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
NASA Astrophysics Data System (ADS)
Garzoglio, Gabriele
2012-12-01
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on “bare metal” nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O-intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One notable difference among the storage systems tested is that total read throughput decreases as the number of client processes increases in some implementations but not in others.
Investigation of storage options for scientific computing on Grid and Cloud facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on bare metal nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
A Novel Performance Evaluation Methodology for Single-Target Trackers.
Kristan, Matej; Matas, Jiri; Leonardis, Ales; Vojir, Tomas; Pflugfelder, Roman; Fernandez, Gustavo; Nebehay, Georg; Porikli, Fatih; Cehovin, Luka
2016-11-01
This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset, and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully annotated dataset is introduced in which every frame is annotated with several visual attributes. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophisticated dataset to date in terms of construction and annotation. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art, since they outperform the standard baselines, resulting in a highly challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison, a new performance visualization technique is proposed.
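The abstract does not define its performance measures. Assuming, as is common in tracker benchmarks, that accuracy is measured by the overlap between the predicted and ground-truth regions, the sketch below computes intersection-over-union for axis-aligned boxes and averages it over a sequence. The box format `(x, y, width, height)` and the function names are assumptions; the paper's own measures and ranking procedure may differ.

```python
def box_overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # width of the intersection
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # height of the intersection
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_overlap(predicted, ground_truth):
    """Mean per-frame overlap between a tracker's boxes and the annotated boxes."""
    scores = [box_overlap(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(scores) / len(scores)
```

Per-frame scores like these are the raw material a ranking-based comparison would then test for statistically significant and practically relevant differences between trackers.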
Time and frequency structure of causal correlation networks in the China bond market
NASA Astrophysics Data System (ADS)
Wang, Zhongxing; Yan, Yan; Chen, Xiaosong
2017-07-01
There are more than eight hundred interest rates published in the China bond market every day. Identifying the benchmark interest rates that have broad influences on most other interest rates is a major concern for economists. In this paper, a multi-variable Granger causality test is developed and applied to construct a directed network of interest rates, whose important nodes, regarded as key interest rates, are evaluated with CheiRank scores. The results indicate that repo rates are the benchmark of short-term rates, the central bank bill rates are in the core position of mid-term interest rates network, and treasury bond rates lead the long-term bond rates. The evolution of benchmark interest rates from 2008 to 2014 is also studied, and it is found that SHIBOR has generally become the benchmark interest rate in China. In the frequency domain we identify the properties of information flows between interest rates, and the result confirms the existence of market segmentation in the China bond market.
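CheiRank is PageRank computed on the network with every link reversed, so it rewards nodes with many outgoing links. In a causality network where an edge i → j means "rate i Granger-causes rate j", high CheiRank therefore picks out rates that influence many others. The sketch below shows only that ranking step by power iteration. It assumes the directed edges have already been obtained from the Granger tests, uses a damping factor of 0.85 as an assumed default, and its function names are hypothetical.

```python
def pagerank(n, edges, damping=0.85, iters=100):
    """PageRank by power iteration; edges are (source, target) pairs over nodes 0..n-1."""
    out = [[] for _ in range(n)]
    for s, t in edges:
        out[s].append(t)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for s in range(n):
            if out[s]:
                share = damping * rank[s] / len(out[s])
                for t in out[s]:
                    new[t] += share
            else:
                # Dangling node: spread its rank uniformly so total rank stays 1.
                for t in range(n):
                    new[t] += damping * rank[s] / n
        rank = new
    return rank

def cheirank(n, edges, damping=0.85, iters=100):
    """CheiRank: PageRank of the same network with every edge reversed."""
    return pagerank(n, [(t, s) for s, t in edges], damping, iters)
```

For example, if node 0 Granger-causes nodes 1, 2, and 3 and nothing causes node 0, `cheirank(4, [(0, 1), (0, 2), (0, 3)])` ranks node 0 first, matching the intuition of a benchmark rate.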
NASA Technical Reports Server (NTRS)
Loyselle, Patricia; Prokopius, Kevin
2011-01-01
Proton exchange membrane (PEM) fuel cell technology is the leading candidate to replace the aging alkaline fuel cell technology, currently used on the Shuttle, for future space missions. This test effort marks the final phase of a 5-yr development program that began under the Second Generation Reusable Launch Vehicle (RLV) Program, transitioned into the Next Generation Launch Technologies (NGLT) Program, and continued under Constellation Systems in the Exploration Technology Development Program. Initially, the engineering model (EM) powerplant was evaluated with respect to its performance as compared to acceptance tests carried out at the manufacturer. This was to determine the sensitivity of the powerplant performance to changes in test environment. In addition, a series of tests were performed with the powerplant in the original standard orientation. This report details the continuing EM benchmark test results in three spatial orientations as well as extended duration testing in the mission profile test. The results from these tests verify the applicability of PEM fuel cells for future NASA missions. The specifics of these different tests are described in the following sections.
Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.
2013-01-01
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods, implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold, and TurboFold. On average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold, and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in new editions of the CompaRNA benchmarks. PMID:23435231
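The abstract does not list its accuracy measures. A common choice for scoring secondary structure predictions is to compare predicted and reference base pairs and report sensitivity, positive predictive value (PPV), and the geometric mean of the two as an approximation to the Matthews correlation coefficient. The sketch below assumes that choice, assumes well-formed dot-bracket strings without pseudoknots, and uses hypothetical function names.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs in a well-formed dot-bracket string (no pseudoknots)."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))   # close the most recently opened base
    return pairs

def pair_scores(predicted, reference):
    """Sensitivity, PPV, and sqrt(sensitivity * PPV) as an approximate MCC."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)                      # correctly predicted base pairs
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv, (sensitivity * ppv) ** 0.5
```

For instance, predicting `(....)` for the reference `((..))` recovers one of two pairs (sensitivity 0.5) with no false pairs (PPV 1.0).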
Validation of tsunami inundation model TUNA-RP using OAR-PMEL-135 benchmark problem set
NASA Astrophysics Data System (ADS)
Koh, H. L.; Teh, S. Y.; Tan, W. K.; Kh'ng, X. Y.
2017-05-01
A standard set of benchmark problems, known as OAR-PMEL-135, is developed by the US National Tsunami Hazard Mitigation Program for tsunami inundation model validation. Any tsunami inundation model must be tested for its accuracy and capability using this standard set of benchmark problems before it can be gainfully used for inundation simulation. The authors have previously developed an in-house tsunami inundation model known as TUNA-RP. This inundation model solves the two-dimensional nonlinear shallow water equations coupled with a wet-dry moving boundary algorithm. This paper presents the validation of TUNA-RP against the solutions provided in the OAR-PMEL-135 benchmark problem set. This benchmark validation testing shows that TUNA-RP can indeed perform inundation simulation with accuracy consistent with that in the tested benchmark problem set.
Development of risk-based nanomaterial groups for occupational exposure control
NASA Astrophysics Data System (ADS)
Kuempel, E. D.; Castranova, V.; Geraci, C. L.; Schulte, P. A.
2012-09-01
Given the almost limitless variety of nanomaterials, it will be virtually impossible to assess the possible occupational health hazard of each nanomaterial individually. The development of science-based hazard and risk categories for nanomaterials is needed for decision-making about exposure control practices in the workplace. A possible strategy would be to select representative (benchmark) materials from various mode-of-action (MOA) classes, evaluate the hazard and develop risk estimates, and then apply a systematic comparison of new nanomaterials with the benchmark materials in the same MOA class. Poorly soluble particles are used here as an example to illustrate quantitative risk assessment methods for possible benchmark particles and occupational exposure control groups, given mode of action and relative toxicity. Linking such benchmark particles to specific exposure control bands would facilitate the translation of health hazard and quantitative risk information into effective exposure control practices in the workplace. A key challenge is obtaining sufficient dose-response data, based on standard testing, to systematically evaluate the physical-chemical factors of nanomaterials that influence their biological activity. Categorization processes involve both science-based analyses and default assumptions in the absence of substance-specific information. Utilizing data and information from related materials may facilitate initial determinations of exposure control systems for nanomaterials.
EVALUATION OF LITERATURE ESTABLISHING SCREENING LEVELS FOR TERRESTRIAL PLANTS/INVERTEBRATES
Scientific publications often lack key information on experimental design or do not follow appropriate test methods and therefore cannot be used in deriving reliable benchmarks. Risk based soil screening levels (Eco-SSLs) are being established for chemicals of concern to terrestr...
NASA Astrophysics Data System (ADS)
Koscheev, Vladimir; Manturov, Gennady; Pronyaev, Vladimir; Rozhikhin, Evgeny; Semenov, Mikhail; Tsibulya, Anatoly
2017-09-01
Several k∞ experiments were performed at the KBR critical facility at the Institute of Physics and Power Engineering (IPPE), Obninsk, Russia, during the 1970s and 1980s to study the neutron absorption properties of Cr, Mn, Fe, Ni, Zr, and Mo. Calculations of these benchmarks with almost any modern evaluated nuclear data library show poor agreement with the experiment. Neutron capture cross sections of the odd isotopes of Cr, Mn, Fe, and Ni in the ROSFOND-2010 library have been reevaluated, and another evaluation of the Zr nuclear data has been adopted. Use of the modified nuclear data for Cr, Mn, Fe, Ni, and Zr leads to a significant improvement of the calculation-to-experiment (C/E) ratio for the KBR assemblies. A significant improvement in agreement between calculated and evaluated values was also observed for benchmarks with Fe reflectors. C/E results obtained with the modified ROSFOND library for complex benchmark models that are highly sensitive to the cross sections of structural materials are no worse than results obtained with other major evaluated data libraries. A possible further improvement from decreasing the capture cross sections of Zr and Mo at energies above 1 keV is indicated.
Collected notes from the Benchmarks and Metrics Workshop
NASA Technical Reports Server (NTRS)
Drummond, Mark E.; Kaelbling, Leslie P.; Rosenschein, Stanley J.
1991-01-01
In recent years there has been a proliferation of proposals in the artificial intelligence (AI) literature for integrated agent architectures. Each architecture offers an approach to the general problem of constructing an integrated agent. Unfortunately, the ways in which one architecture might be considered better than another are not always clear. There has been a growing realization that many of the positive and negative aspects of an architecture become apparent only when experimental evaluation is performed and that to progress as a discipline, we must develop rigorous experimental methods. In addition to the intrinsic intellectual interest of experimentation, rigorous performance evaluation of systems is also a crucial practical concern to our research sponsors. DARPA, NASA, and AFOSR (among others) are actively searching for better ways of experimentally evaluating alternative approaches to building intelligent agents. One tool for experimental evaluation involves testing systems on benchmark tasks in order to assess their relative performance. As part of a joint DARPA and NASA funded project, NASA-Ames and Teleos Research are carrying out a research effort to establish a set of benchmark tasks and evaluation metrics by which the performance of agent architectures may be determined. As part of this project, we held a workshop on Benchmarks and Metrics at the NASA Ames Research Center on June 25, 1990. The objective of the workshop was to foster early discussion on this important topic. We did not achieve a consensus, nor did we expect to. Collected here is some of the information that was exchanged at the workshop. Given here is an outline of the workshop, a list of the participants, notes taken on the white-board during open discussions, position papers/notes from some participants, and copies of slides used in the presentations.
NASA Technical Reports Server (NTRS)
Davis, G. J.
1994-01-01
One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed both to test a single computer's efficiency and to compare alternative machines running Lisp and/or Ada. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transforms, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program that sorts data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that make up ELAPSE, this package provides the 14 that were developed or translated at Ames; the others are readily available in the literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
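To make the idea of a timed numeric benchmark routine concrete, the sketch below pairs a textbook Cholesky decomposition with a small timing harness that reports the best of several runs. It is a hypothetical Python analogue written for illustration, not the ELAPSE CHOLESKY.ADA source, and the best-of-N timing rule is an assumed convention.

```python
import time

def cholesky(a):
    """Return lower-triangular L with L times L-transpose equal to a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5      # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]   # below-diagonal entry
    return L

def time_routine(fn, *args, repeats=5):
    """Best wall-clock time in seconds over several runs, to damp timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    n = 100
    # Symmetric and diagonally dominant with a positive diagonal, hence positive definite.
    a = [[float(n) if i == j else 1.0 for j in range(n)] for i in range(n)]
    print(f"cholesky n={n}: {time_routine(cholesky, a):.4f} s")
```

Running the same routine and harness on two machines gives the kind of cross-machine comparison the suite was designed for.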
HPC Analytics Support. Requirements for Uncertainty Quantification Benchmarks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paulson, Patrick R.; Purohit, Sumit; Rodriguez, Luke R.
2015-05-01
This report outlines techniques for extending benchmark generation products so they support uncertainty quantification by benchmarked systems. We describe how uncertainty quantification requirements can be presented to candidate analytical tools supporting SPARQL. We describe benchmark data sets for evaluating uncertainty quantification, as well as an approach for using our benchmark generator to produce data sets for generating benchmark data sets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arnis Judzis
2003-01-01
This document details the progress to date on the "OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE -- A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING" contract for the quarter from October 2002 through December 2002. Although the optimization portion of the testing program is still pending, accomplishments included the following: (1) Smith International participated in the DOE Mud Hammer program through full-scale benchmarking testing during the week of 4 November 2003. (2) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to add to the benchmarking testing program. (3) Following the benchmark testing of the Smith International hammer, representatives from DOE/NETL, TerraTek, Smith International, and PDVSA met at TerraTek in Salt Lake City to review observations, performance, and views on the optimization step for 2003. (4) The December 2002 issue of the Journal of Petroleum Technology (Society of Petroleum Engineers) highlighted the DOE fluid hammer testing program and reviewed last year's paper on the benchmark performance of the SDS Digger and Novatek hammers. (5) TerraTek's Sid Green presented a technical review for DOE/NETL personnel in Morgantown on "Impact Rock Breakage" and its importance in improving fluid hammer performance. Much discussion has taken place on the issues surrounding mud hammer performance at depth conditions.
Hackethal, A; Immenroth, M; Bürger, T
2006-04-01
The Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) simulator is validated for laparoscopy training, but benchmarks and target scores for assessing single tasks are needed. Control data for the MIST-VR traversal task scenario were collected from 61 novices who performed the task 10 times over 3 days (1 h daily). Data were collected on the time taken, error score, economy of movement, and total score. Test differences were analyzed through percentage scores and t-tests for paired samples. Improvement was greatest over tests 1 to 5 (improvement from test 1 to test 2: 38.07%, p < 0.001; from test 4 to test 5: 10.66%, p = 0.010); between tests 5 and 10, improvement slowed and scores stabilized. Variation in participants' performance fell steadily over the 10 tests. Trainees should perform at least 10 tests of the traversal task: five to get used to the equipment and task (automation phase; target total score, 95.16) and five to stabilize and consolidate performance (test 10 target total score, 74.11).
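The analysis above combines a percentage change between tests with a paired-samples t-test. The sketch below shows both calculations using only the Python standard library. It assumes lower scores are better, consistent with the target total score falling from 95.16 to 74.11, and its function names and sample data are hypothetical. The p-value would then come from Student's t distribution with n − 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`), which is not reimplemented here.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for a paired-samples t-test on (after - before)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def percent_reduction(before, after):
    """Percentage drop in mean score; positive means improvement when lower is better."""
    mean_before = statistics.mean(before)
    return 100.0 * (mean_before - statistics.mean(after)) / mean_before
```

With hypothetical scores of 10, 12, 14 on one test and 8, 9, 11 on the next, the mean falls by about 22.2% and the paired t statistic is -8 with 2 degrees of freedom.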
Pasler, Marlies; Kaas, Jochem; Perik, Thijs; Geuze, Job; Dreindl, Ralf; Künzler, Thomas; Wittkamper, Frits; Georg, Dietmar
2015-12-01
To systematically evaluate machine-specific quality assurance (QA) for volumetric modulated arc therapy (VMAT) based on log files by applying a dynamic benchmark plan. A VMAT benchmark plan was created and tested on 18 Elekta linacs (13 MLCi or MLCi2, 5 Agility) at 4 different institutions. Linac log files were analyzed, and a delivery robustness index was introduced. For dosimetric measurements, an ionization chamber array was used. Relative dose deviations were assessed by the mean gamma for each control point and compared to the log file evaluation. Fourteen linacs delivered the VMAT benchmark plan, while 4 linacs failed by consistently terminating the delivery. The mean leaf error (±1 SD) was 0.3 ± 0.2 mm for all linacs. Large maximum MLC errors of up to 6.5 mm were observed at reversal positions. The delivery robustness index accounting for MLC position correction (0.8-1.0) correlated with delivery time (80-128 s) and depended on dose rate performance. Dosimetric evaluation indicated generally accurate plan reproducibility, with mean γ (±1 SD) = 0.4 ± 0.2 for 1 mm/1%. However, single control point analysis revealed larger deviations, which corresponded well with the log file analysis. The designed benchmark plan helped identify linac-related malfunctions in dynamic mode for VMAT. Log files serve as an important additional QA measure to understand and visualize dynamic linac parameters. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
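The dosimetric comparison above uses the gamma index, which combines a distance-to-agreement (DTA) criterion and a dose-difference criterion (here 1 mm/1%): for each reference point, γ is the minimum over evaluated points of sqrt((Δr/DTA)² + (ΔD/ΔD_crit)²), and γ ≤ 1 counts as a pass. The sketch below is a simplified one-dimensional, globally normalized version on discrete points without interpolation. It is an illustration under those assumptions, not the analysis software used in the study, and the function names are hypothetical.

```python
import math

def gamma_1d(ref_pos, ref_dose, eval_pos, eval_dose, dta_mm=1.0, dd_percent=1.0):
    """Global 1-D gamma index for each reference point (positions in mm).

    The dose criterion is dd_percent of the maximum reference dose (global normalization).
    """
    dd_abs = dd_percent / 100.0 * max(ref_dose)
    gammas = []
    for xr, dr in zip(ref_pos, ref_dose):
        best = math.inf
        for xe, de in zip(eval_pos, eval_dose):
            g2 = ((xe - xr) / dta_mm) ** 2 + ((de - dr) / dd_abs) ** 2
            best = min(best, g2)            # search for the closest point in dose-distance space
        gammas.append(math.sqrt(best))
    return gammas

def gamma_pass_rate(gammas):
    """Fraction of points meeting the criterion (gamma <= 1)."""
    return sum(g <= 1.0 for g in gammas) / len(gammas)
```

A uniform 0.5% dose offset under 1 mm/1% gives γ = 0.5 at every point, so the mean γ of 0.4 reported above corresponds to deviations well inside the tolerance on average.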
Evaluation of the Lateral Performance of Roof Truss-to-Wall Connections in Light-Frame Wood Systems
Andrew DeRenzis; Vladimir Kochkin; Xiping Wang
2012-01-01
This testing program was designed to benchmark the performance of traditional roof systems and incrementally improved roof-to-wall systems with the goal of developing connection solutions that are optimized for performance and constructability. Nine full-size roof systems were constructed and tested with various levels and types of heel detailing to measure the lateral...
Validation of tungsten cross sections in the neutron energy region up to 100 keV
NASA Astrophysics Data System (ADS)
Pigni, Marco T.; Žerovnik, Gašper; Leal, Luiz C.; Trkov, Andrej
2017-09-01
Following a series of recent cross section evaluations of tungsten isotopes performed at Oak Ridge National Laboratory (ORNL), this paper presents the validation work carried out to test the performance of the evaluated cross sections against lead-slowing-down (LSD) benchmarks conducted in Grenoble. ORNL completed the resonance parameter evaluation of four tungsten isotopes (¹⁸²W, ¹⁸³W, ¹⁸⁴W, and ¹⁸⁶W) in August 2014 and submitted it as an ENDF-compatible file to be part of the next release of the ENDF/B-VIII.0 nuclear data library. The evaluations were performed with support from the US Nuclear Criticality Safety Program in an effort to provide improved tungsten cross section and covariance data for criticality safety sensitivity analyses. The validation analysis based on the LSD benchmarks showed improved agreement with the experimental response when the ORNL tungsten evaluations were included in the ENDF/B-VII.1 library. Comparisons with the results obtained with the JEFF-3.2 nuclear data library are also discussed.
Direct data access protocols benchmarking on DPM
NASA Astrophysics Data System (ADS)
Furano, Fabrizio; Devresse, Adrien; Keeble, Oliver; Mancinelli, Valentina
2015-12-01
The Disk Pool Manager (DPM) is an example of a multi-protocol, multi-VO system for data access on the Grid that has gone through a considerable technical evolution in recent years. Among other features, its architecture offers the opportunity to test its different data access frontends under exactly the same conditions, including hardware and backend software. This characteristic inspired the idea of collecting monitoring information from various testbeds in order to benchmark the behaviour of the HTTP and Xrootd protocols for the use case of data analysis, batch or interactive. One source of information is the set of continuous tests that are run against the worldwide endpoints belonging to the DPM Collaboration, which accumulated relevant statistics in its first year of activity. On top of that, DPM releases are based on multiple levels of automated testing that include performance benchmarks of various kinds, executed daily. At the same time, recent releases of DPM can report monitoring information about any data access protocol to the same monitoring infrastructure that is used to monitor the Xrootd deployments. Our goal is to evaluate under which circumstances the HTTP-based protocols can be good enough for batch or interactive data access. In this contribution we show and discuss the results that our test systems have collected under conditions that include ROOT analyses using TTreeCache and stress tests of metadata performance.
The NAS kernel benchmark program
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barton, J. T.
1985-01-01
A collection of benchmark test kernels that measure supercomputer performance has been developed for the use of the NAS (Numerical Aerodynamic Simulation) program at the NASA Ames Research Center. This benchmark program is described in detail and the specific ground rules are given for running the program as a performance test.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;
2006-01-01
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers: SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.
Benchmarks: The Development of a New Approach to Student Evaluation.
ERIC Educational Resources Information Center
Larter, Sylvia
The Toronto Board of Education Benchmarks are libraries of reference materials that demonstrate student achievement at various levels. Each library contains video benchmarks, print benchmarks, a staff handbook, and summary and introductory documents. This book is about the development and the history of the benchmark program. It has taken over 3…
GROWTH OF THE INTERNATIONAL CRITICALITY SAFETY AND REACTOR PHYSICS EXPERIMENT EVALUATION PROJECTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. Blair Briggs; John D. Bess; Jim Gulliford
2011-09-01
Since the International Conference on Nuclear Criticality Safety (ICNC) 2007, the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) have continued to expand their efforts and broaden their scope. Eighteen countries participated in the ICSBEP in 2007; there are now 20, with recent contributions from Sweden and Argentina. The IRPhEP has also expanded, from eight contributing countries in 2007 to 16 in 2011. Since ICNC 2007, the contents of the 'International Handbook of Evaluated Criticality Safety Benchmark Experiments' (1) have increased from 442 evaluations (38000 pages), containing benchmark specifications for 3955 critical or subcritical configurations, to 516 evaluations (nearly 55000 pages), containing benchmark specifications for 4405 critical or subcritical configurations, in the 2010 Edition of the ICSBEP Handbook. The Handbook has also grown from 21 to 24 criticality-alarm-placement/shielding configurations, each with multiple dose points, and from 20 to 200 configurations categorized as fundamental physics measurements relevant to criticality safety applications. Approximately 25 new evaluations and 150 additional configurations are expected to be added to the 2011 edition of the Handbook. Since ICNC 2007, the contents of the 'International Handbook of Evaluated Reactor Physics Benchmark Experiments' (2) have increased from 16 experimental series performed at 12 reactor facilities to 53 experimental series performed at 30 reactor facilities in the 2011 edition of the Handbook. Considerable effort has also been made to improve the functionality of the searchable database DICE (Database for the International Criticality Benchmark Evaluation Project) and to verify the accuracy of the data it contains. DICE will be discussed in separate papers at ICNC 2011.
The status of the ICSBEP and the IRPhEP will be discussed in the full paper, selected benchmarks that have been added to the ICSBEP Handbook will be highlighted, and a preview of the new benchmarks that will appear in the September 2011 edition of the Handbook will be provided. Accomplishments of the IRPhEP will also be highlighted and the future of both projects will be discussed. REFERENCES (1) International Handbook of Evaluated Criticality Safety Benchmark Experiments, NEA/NSC/DOC(95)03/I-IX, Organisation for Economic Co-operation and Development-Nuclear Energy Agency (OECD-NEA), September 2010 Edition, ISBN 978-92-64-99140-8. (2) International Handbook of Evaluated Reactor Physics Benchmark Experiments, NEA/NSC/DOC(2006)1, Organisation for Economic Co-operation and Development-Nuclear Energy Agency (OECD-NEA), March 2011 Edition, ISBN 978-92-64-99141-5.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
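To make the idea of a global scheduling algorithm concrete, here is a minimal Python sketch of one textbook heuristic, greedy longest-processing-time assignment: tasks are taken heaviest first and each goes to the currently least-loaded processing element (PE). It is an illustration under our own assumptions, not code from ComprehensiveBench or Charm++; the function names `greedy_assign` and `imbalance` and the task loads are hypothetical, and communication between tasks, which the abstract identifies as a second source of inefficiency, is ignored.

```python
import heapq


def greedy_assign(task_loads, num_pes):
    """Assign tasks to PEs with the greedy longest-processing-time heuristic.

    Returns (mapping of task index -> PE id, list of per-PE total loads).
    """
    if num_pes <= 0:
        raise ValueError("need at least one processing element")
    # Min-heap of (accumulated load, PE id): the least-loaded PE is on top.
    heap = [(0.0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    mapping = {}
    # Visit tasks from heaviest to lightest.
    order = sorted(range(len(task_loads)), key=lambda t: task_loads[t], reverse=True)
    for t in order:
        load, pe = heapq.heappop(heap)
        mapping[t] = pe
        heapq.heappush(heap, (load + task_loads[t], pe))
    loads = [0.0] * num_pes
    for load, pe in heap:
        loads[pe] = load
    return mapping, loads


def imbalance(loads):
    """Max-to-average load ratio; 1.0 means perfectly balanced."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg
```

For task loads `[5, 4, 3, 3, 2, 1]` on two PEs, this heuristic happens to reach a perfect split (9 and 9); in general it only approximates the optimum, which is consistent with the abstract's point that finding the optimal distribution is NP-hard.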
NASA Astrophysics Data System (ADS)
Dimitriadis, Panayiotis; Tegos, Aristoteles; Oikonomou, Athanasios; Pagana, Vassiliki; Koukouvinos, Antonios; Mamassis, Nikos; Koutsoyiannis, Demetris; Efstratiadis, Andreas
2016-03-01
One-dimensional and quasi-two-dimensional hydraulic freeware models (HEC-RAS, LISFLOOD-FP and FLO-2d) are widely used for flood inundation mapping. These models are tested on a benchmark with a mixed rectangular-triangular channel cross section. Using a Monte-Carlo approach, we employ extended sensitivity analysis by simultaneously varying the input discharge, longitudinal and lateral gradients and roughness coefficients, as well as the grid cell size. Based on statistical analysis of three output variables of interest, i.e. water depths at the inflow and outflow locations and total flood volume, we investigate the uncertainty enclosed in different model configurations and flow conditions, without the influence of errors and other assumptions on topography, channel geometry and boundary conditions. Moreover, we estimate the uncertainty associated with each input variable and compare it to the overall uncertainty. The outcomes of the benchmark analysis are further highlighted by applying the three models to real-world flood propagation problems, in the context of two challenging case studies in Greece.
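As a rough illustration of the Monte-Carlo idea only (it does not reproduce HEC-RAS, LISFLOOD-FP or FLO-2d), the sketch below samples discharge, Manning roughness and longitudinal slope simultaneously and propagates them through a closed-form normal-depth formula. Assumptions: a wide rectangular channel (hydraulic radius approximately equal to depth) instead of the benchmark's mixed rectangular-triangular section, uniform input distributions, and hypothetical parameter ranges; lateral gradients and grid cell size are not represented. The function names are illustrative.

```python
import random
import statistics


def normal_depth_wide_channel(q, n, slope, width):
    """Normal depth (m) of a wide rectangular channel from Manning's equation.

    With R ~ h: Q = (1/n) * B * h**(5/3) * sqrt(S), hence
    h = (n * Q / (B * sqrt(S)))**(3/5). SI units throughout.
    """
    return (n * q / (width * slope ** 0.5)) ** 0.6


def monte_carlo_depth(samples, width, q_range, n_range, s_range, seed=0):
    """Sample Q, n and S uniformly (all at once) and return the resulting depths."""
    rng = random.Random(seed)  # seeded for reproducibility
    depths = []
    for _ in range(samples):
        q = rng.uniform(*q_range)
        n = rng.uniform(*n_range)
        s = rng.uniform(*s_range)
        depths.append(normal_depth_wide_channel(q, n, s, width))
    return depths


if __name__ == "__main__":
    # Hypothetical ranges: Q in m^3/s, Manning n in s/m^(1/3), slope in m/m.
    d = monte_carlo_depth(10_000, 50.0, (100.0, 500.0), (0.02, 0.05), (0.0005, 0.002))
    print(f"mean depth {statistics.mean(d):.2f} m, std {statistics.stdev(d):.2f} m")
```

The spread of the sampled depths is a crude measure of the combined input uncertainty; fixing two inputs and varying the third would give the per-variable contribution that the abstract compares against the overall figure.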
FDNS CFD Code Benchmark for RBCC Ejector Mode Operation
NASA Technical Reports Server (NTRS)
Holt, James B.; Ruf, Joe
1999-01-01
Computational Fluid Dynamics (CFD) analysis results are compared with benchmark quality test data from the Propulsion Engineering Research Center's (PERC) Rocket Based Combined Cycle (RBCC) experiments to verify fluid dynamic code and application procedures. RBCC engine flowpath development will rely on CFD applications to capture the multi-dimensional fluid dynamic interactions and to quantify their effect on the RBCC system performance. Therefore, the accuracy of these CFD codes must be determined through detailed comparisons with test data. The PERC experiments build upon the well-known 1968 rocket-ejector experiments of Odegaard and Stroup by employing advanced optical and laser based diagnostics to evaluate mixing and secondary combustion. The Finite Difference Navier Stokes (FDNS) code was used to model the fluid dynamics of the PERC RBCC ejector mode configuration. Analyses were performed for both Diffusion and Afterburning (DAB) and Simultaneous Mixing and Combustion (SMC) test conditions. Results from both the 2D and the 3D models are presented.
Physical properties of the benchmark models program supercritical wing
NASA Technical Reports Server (NTRS)
Dansberry, Bryan E.; Durham, Michael H.; Bennett, Robert M.; Turnock, David L.; Silva, Walter A.; Rivera, Jose A., Jr.
1993-01-01
The goal of the Benchmark Models Program is to provide data useful in the development and evaluation of aeroelastic computational fluid dynamics (CFD) codes. To that end, a series of three similar wing models is being flutter tested in the Langley Transonic Dynamics Tunnel. These models are designed to simultaneously acquire model response data and unsteady surface pressure data during wing flutter conditions. The supercritical wing is the second model of this series. It is a rigid semispan model with a rectangular planform and a NASA SC(2)-0414 supercritical airfoil shape. The supercritical wing model was flutter tested on a flexible mount, called the Pitch and Plunge Apparatus, that provides a well-defined, two-degree-of-freedom dynamic system. The supercritical wing model and associated flutter test apparatus are described, and experimentally determined wind-off structural dynamic characteristics of the combined rigid model and flexible mount system are included.
Benchmarking and validation activities within JEFF project
NASA Astrophysics Data System (ADS)
Cabellos, O.; Alvarez-Velarde, F.; Angelone, M.; Diez, C. J.; Dyrda, J.; Fiorito, L.; Fischer, U.; Fleming, M.; Haeck, W.; Hill, I.; Ichou, R.; Kim, D. H.; Klix, A.; Kodeli, I.; Leconte, P.; Michel-Sendis, F.; Nunnenmann, E.; Pecchia, M.; Peneliau, Y.; Plompen, A.; Rochman, D.; Romojaro, P.; Stankovskiy, A.; Sublet, J. Ch.; Tamagno, P.; Marck, S. van der
2017-09-01
The challenge for any nuclear data evaluation project is to periodically release a revised, fully consistent and complete library, with all needed data and covariances, and ensure that it is robust and reliable for a variety of applications. Within an evaluation effort, benchmarking activities play an important role in validating proposed libraries. The Joint Evaluated Fission and Fusion (JEFF) Project aims to provide such a nuclear data library, and thus, requires a coherent and efficient benchmarking process. The aim of this paper is to present the activities carried out by the new JEFF Benchmarking and Validation Working Group, and to describe the role of the NEA Data Bank in this context. The paper will also review the status of preliminary benchmarking for the next JEFF-3.3 candidate cross-section files.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Thomas Martin; Celik, Cihangir; McMahan, Kimberly L.
This benchmark experiment was conducted as a joint venture between the US Department of Energy (DOE) and the French Commissariat à l'Energie Atomique (CEA). Staff at the Oak Ridge National Laboratory (ORNL) in the US and the Centre de Valduc in France planned this experiment. The experiment was conducted on October 11, 2010 in the SILENE critical assembly facility at Valduc. Several other organizations contributed to this experiment and the subsequent evaluation, including CEA Saclay, Lawrence Livermore National Laboratory (LLNL), the Y-12 National Security Complex (NSC), Babcock International Group in the United Kingdom, and Los Alamos National Laboratory (LANL). The goal of this experiment was to measure neutron activation and thermoluminescent dosimeter (TLD) doses from a source similar to a fissile solution critical excursion. The resulting benchmark can be used for validation of computer codes and nuclear data libraries as required when performing analysis of criticality accident alarm systems (CAASs). A secondary goal of this experiment was to qualitatively test performance of two CAAS detectors similar to those currently and formerly in use in some US DOE facilities. The detectors tested were the CIDAS MkX and the Rocky Flats NCD-91. These detectors were being evaluated to determine whether they would alarm, so they were not expected to generate benchmark quality data.
MoMaS reactive transport benchmark using PFLOTRAN
NASA Astrophysics Data System (ADS)
Park, H.
2017-12-01
The MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September 2009; it is not based on a real chemical system but consists of realistic and numerically challenging tests. PFLOTRAN is a state-of-the-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories, including the Waste Isolation Pilot Plant and Used Fuel Disposition. The MoMaS benchmark has three independent tests with easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results for the easy test case, which includes mixing of aqueous components and surface complexation. Surface complexation consists of monodentate and bidentate reactions, which introduce difficulty in defining the selectivity coefficient if the reaction applies to a bulk reference volume. The selectivity coefficient becomes porosity dependent for the bidentate reaction in heterogeneous porous media. The benchmark is solved by PFLOTRAN with minimal modification to address this issue, and units were converted appropriately to suit PFLOTRAN.
NASA Technical Reports Server (NTRS)
Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)
1993-01-01
A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ogden, Jeffry B.
2005-09-26
Cbench is intended to be a relatively straightforward collection of tests, benchmarks, applications, utilities, and framework with the goal of facilitating scalable testing and benchmarking of a Linux cluster.
Hospital benchmarking: are U.S. eye hospitals ready?
de Korne, Dirk F; van Wijngaarden, Jeroen D H; Sol, Kees J C A; Betz, Robert; Thomas, Richard C; Schein, Oliver D; Klazinga, Niek S
2012-01-01
Benchmarking is increasingly considered a useful management instrument to improve quality in health care, but little is known about its applicability in hospital settings. The aims of this study were to assess the applicability of a benchmarking project in U.S. eye hospitals and compare the results with an international initiative. We evaluated multiple cases by applying an evaluation frame abstracted from the literature to five U.S. eye hospitals that used a set of 10 indicators for efficiency benchmarking. Qualitative analysis entailed 46 semistructured face-to-face interviews with stakeholders, document analyses, and questionnaires. The case studies only partially met the conditions of the evaluation frame. Although learning and quality improvement were stated as overall purposes, the benchmarking initiative was at first focused on efficiency only. No ophthalmic outcomes were included, and clinicians were skeptical about their reporting relevance and disclosure. However, in contrast with earlier findings in international eye hospitals, all U.S. hospitals worked with internal indicators that were integrated in their performance management systems and supported benchmarking. Benchmarking can support performance management in individual hospitals. Having a certain number of comparable institutes providing similar services in a noncompetitive milieu seems to lay fertile ground for benchmarking. International benchmarking is useful only when these conditions are not met nationally. Although the literature focuses on static conditions for effective benchmarking, our case studies show that it is a highly iterative and learning process. The journey of benchmarking seems to be more important than the destination. Improving patient value (health outcomes per unit of cost) requires, however, an integrative perspective where clinicians and administrators closely cooperate on both quality and efficiency issues.
If these worlds do not share such a relationship, the added "public" value of benchmarking in health care is questionable.
Talaminos-Barroso, Alejandro; Estudillo-Valderrama, Miguel A; Roa, Laura M; Reina-Tosina, Javier; Ortega-Ruiz, Francisco
2016-06-01
M2M (Machine-to-Machine) communications are one of the main pillars of the new Internet of Things (IoT) paradigm and are creating new opportunities for the eHealth business. Nevertheless, the large number of M2M protocols currently available hinders the selection of a suitable solution that satisfies the requirements of eHealth applications. The objectives of this work were, first, to develop a tool that provides a benchmarking analysis in order to objectively select among the most relevant M2M protocols for eHealth solutions and, second, to validate the tool with a particular use case: respiratory rehabilitation. A software tool, called Distributed Computing Framework (DFC), has been designed and developed to execute the benchmarking tests and to facilitate deployment in environments with a large number of machines, independently of the protocol and performance metrics selected. The DDS, MQTT, CoAP, JMS, AMQP and XMPP protocols were evaluated using specific performance metrics, including CPU usage, memory usage, bandwidth consumption, latency and jitter. The results obtained made it possible to validate a use case: respiratory rehabilitation of chronic obstructive pulmonary disease (COPD) patients in two scenarios with different types of requirements, Home-Based and Ambulatory. The results of the benchmark comparison can guide eHealth developers in the choice of M2M technologies. In this regard, the framework presented is a simple and powerful tool for the deployment of benchmark tests under specific environments and conditions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
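Latency and jitter are two of the metrics listed above. A minimal sketch of how they might be computed from matched send and receive timestamps follows. The study does not give its formulas, so the jitter definition used here (mean absolute difference between consecutive latencies, a simplified, unsmoothed relative of the RTP interarrival jitter in RFC 3550) and the function name `latency_stats` are assumptions; it also assumes the sender and receiver clocks are synchronized.

```python
from statistics import mean


def latency_stats(send_times, recv_times):
    """Return (mean latency, jitter) for matched send/receive timestamps.

    Latency is recv - send for each message. Jitter is taken here as the
    mean absolute difference between consecutive latencies (an assumed
    definition, not necessarily the one used in the study).
    """
    if len(send_times) != len(recv_times) or len(send_times) < 2:
        raise ValueError("need at least two matched timestamp pairs")
    latencies = [r - s for s, r in zip(send_times, recv_times)]
    diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return mean(latencies), mean(diffs)
```

For example, messages sent at 0, 10, 20 and 30 ms and received at 5, 17, 24 and 36 ms have latencies of 5, 7, 4 and 6 ms, giving a mean latency of 5.5 ms and a jitter of 7/3 ms under this definition.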
Benchmarking in pathology: development of an activity-based costing model.
Burnett, Leslie; Wilson, Roger; Pfeffer, Sally; Lowry, John
2012-12-01
Benchmarking in Pathology (BiP) allows pathology laboratories to determine the unit cost of all laboratory tests and procedures, and also provides organisational productivity indices allowing comparisons of performance with other BiP participants. We describe 14 years of progressive enhancement to a BiP program, including the implementation of 'avoidable costs' as the accounting basis for allocation of costs rather than previous approaches using 'total costs'. A hierarchical tree-structured activity-based costing model distributes 'avoidable costs' attributable to the pathology activities component of a pathology laboratory operation. The hierarchical tree model permits costs to be allocated across multiple laboratory sites and organisational structures. This has enabled benchmarking on a number of levels, including test profiles and non-testing related workload activities. Methods for dealing with variable cost inputs, allocation of indirect costs using imputation techniques, panels of tests, and blood-bank record keeping have been successfully integrated into the costing model. A variety of laboratory management reports are produced, including the 'cost per test' of each pathology 'test' output. Benchmarking comparisons may be undertaken at the 'cost per test' and 'cost per Benchmarking Complexity Unit' level, at the 'discipline/department' (sub-specialty) level, or at the overall laboratory/site and organisational levels. We have completed development of a national BiP program. An activity-based costing methodology based on avoidable costs overcomes many problems of previous benchmarking studies based on total costs. The use of benchmarking complexity adjustment permits correction for varying test-mix and diagnostic complexity between laboratories. Use of iterative communication strategies with program participants can overcome many obstacles and lead to innovations.
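To illustrate how a hierarchical, tree-structured costing model can push costs from the organisation down to individual tests, here is a minimal sketch. It is a hypothetical simplification, not the BiP model: each node carries an assumed allocation weight relative to its siblings, costs inherited from a parent are split in proportion to those weights, and a leaf's cost per test is its accumulated cost divided by its test volume. Imputation, test panels and complexity adjustment are not modelled, and the class name, function name and all figures are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    own_cost: float = 0.0   # avoidable cost recorded directly at this node
    weight: float = 1.0     # assumed allocation driver, relative to siblings
    volume: int = 0         # tests performed (used at leaf nodes only)
    children: list = field(default_factory=list)


def allocate(node, inherited=0.0, out=None):
    """Push costs down the tree; return {leaf name: cost per test}."""
    if out is None:
        out = {}
    total = node.own_cost + inherited
    if not node.children:
        # Leaf = a test: divide its accumulated cost by its volume.
        out[node.name] = total / node.volume if node.volume else float("nan")
        return out
    weight_sum = sum(c.weight for c in node.children)
    for child in node.children:
        allocate(child, total * child.weight / weight_sum, out)
    return out


lab = Node("lab", own_cost=1000.0, children=[
    Node("chemistry", own_cost=2000.0, weight=3.0, children=[
        Node("glucose", volume=100), Node("lipids", volume=50)]),
    Node("haematology", own_cost=500.0, weight=1.0, children=[
        Node("FBC", volume=200)]),
])
print(allocate(lab))  # {'glucose': 13.75, 'lipids': 27.5, 'FBC': 3.75}
```

Because every parent's cost is fully distributed to its children, the leaf costs times their volumes sum back to the total avoidable cost (3500 in this example), which is the property that makes per-test and per-department comparisons consistent with each other.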
Berthon, Beatrice; Spezi, Emiliano; Galavis, Paulina; Shepherd, Tony; Apte, Aditya; Hatt, Mathieu; Fayad, Hadi; De Bernardi, Elisabetta; Soffientini, Chiara D; Ross Schmidtlein, C; El Naqa, Issam; Jeraj, Robert; Lu, Wei; Das, Shiva; Zaidi, Habib; Mawlawi, Osama R; Visvikis, Dimitris; Lee, John A; Kirov, Assen S
2017-08-01
The aim of this paper is to define the requirements and describe the design and implementation of a standard benchmark tool for evaluation and validation of PET-auto-segmentation (PET-AS) algorithms. This work follows the recommendations of Task Group 211 (TG211) appointed by the American Association of Physicists in Medicine (AAPM). The recommendations published in the AAPM TG211 report were used to derive a set of required features and to guide the design and structure of a benchmarking software tool. These items included the selection of appropriate representative data and reference contours obtained from established approaches and the description of available metrics. The benchmark was designed to be extensible through the inclusion of bespoke segmentation methods, while maintaining its main purpose as a standard testing platform for newly developed PET-AS methods. An example implementation of the proposed framework, named PETASset, was built. In this work, a selection of PET-AS methods representing common approaches to PET image segmentation was evaluated within PETASset for the purpose of testing and demonstrating the capabilities of the software as a benchmark platform. A selection of clinical, physical, and simulated phantom data, including "best estimates" reference contours from macroscopic specimens, simulation template, and CT scans was built into the PETASset application database. Specific metrics such as Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity (S), were included to allow the user to compare the results of any given PET-AS algorithm to the reference contours. In addition, a tool to generate structured reports on the evaluation of the performance of PET-AS algorithms against the reference contours was built.
Across the PET-AS methods evaluated for demonstration, agreement with the reference contours ranged from 0.51 to 0.83 for DSC, from 0.44 to 0.86 for PPV, and from 0.61 to 1.00 for S. Examples of agreement limits were provided to show how the software could be used to evaluate a new algorithm against the existing state of the art. PETASset provides a platform that allows standardizing the evaluation and comparison of different PET-AS methods on a wide range of PET datasets. The developed platform will be available to users willing to evaluate their PET-AS methods and to contribute more evaluation datasets. © 2017 The Authors. Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
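The three overlap metrics named in this abstract have standard definitions for a candidate segmentation A and a reference contour B: DSC = 2|A∩B| / (|A| + |B|), PPV = |A∩B| / |A|, and S = |A∩B| / |B|. A minimal sketch on flattened binary voxel masks follows; it is not PETASset code, and the function name `overlap_metrics` is hypothetical.

```python
def overlap_metrics(seg, ref):
    """Return (DSC, PPV, sensitivity) for two equal-length binary masks.

    seg: candidate PET-AS segmentation (truthy = voxel inside contour).
    ref: reference contour on the same voxel grid.
    """
    if len(seg) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for s, r in zip(seg, ref) if s and r)  # |A ∩ B|
    seg_vol = sum(1 for s in seg if s)                # |A|
    ref_vol = sum(1 for r in ref if r)                # |B|
    if seg_vol == 0 or ref_vol == 0:
        raise ValueError("both masks must contain at least one voxel")
    dsc = 2 * tp / (seg_vol + ref_vol)
    ppv = tp / seg_vol
    sens = tp / ref_vol
    return dsc, ppv, sens
```

An over-segmentation that fully covers a small reference, e.g. `overlap_metrics([1, 1, 1, 1, 0, 0], [0, 1, 1, 0, 0, 0])`, gives S = 1.0 but PPV = 0.5 and DSC = 2/3, which is why the three metrics are reported together rather than relying on sensitivity alone.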
Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ye; Ma, Xiaosong; Liu, Qing Gary
2015-01-01
Parallel application benchmarks are indispensable for evaluating and optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication and I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.
Ilie, Marius; Khambata-Ford, Shirin; Copie-Bergman, Christiane; Huang, Lingkang; Juco, Jonathan; Hofman, Veronique; Hofman, Paul
2017-01-01
For non-small cell lung cancer (NSCLC), treatment with pembrolizumab is limited to patients with tumours expressing PD-L1 assessed by immunohistochemistry (IHC) using the PD-L1 IHC 22C3 pharmDx (Dako, Inc.) companion diagnostic test, on the Dako Autostainer Link 48 (ASL48) platform. Optimised protocols are urgently needed for use of the 22C3 antibody concentrate to test PD-L1 expression on more widely available IHC autostainers. We evaluated PD-L1 expression using the 22C3 antibody concentrate on the three main commercially available autostainers, Dako ASL48, BenchMark ULTRA (Ventana Medical Systems, Inc.), and Bond-III (Leica Biosystems), and compared the staining results with the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform. Several technical conditions for laboratory-developed tests (LDTs) were evaluated in tonsil specimens and a training set of three NSCLC samples. Optimised protocols were then validated in 120 NSCLC specimens. Optimised protocols were obtained on both the VENTANA BenchMark ULTRA and Dako ASL48 platforms. Significant expression of PD-L1 was obtained on tissue controls with the Leica Bond-III autostainer when high concentrations of the 22C3 antibody were used. The Bond-III autostainer was therefore not tested on the 120 NSCLC specimens. An almost 100% concordance rate for dichotomized tumour proportion score (TPS) results was observed between TPS ratings using the 22C3 antibody concentrate on the Dako ASL48 and VENTANA BenchMark ULTRA platforms relative to the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform. Interpathologist agreement was high on both LDTs and the PD-L1 IHC 22C3 pharmDx kit on the Dako ASL48 platform.
Availability of standardized protocols for determining PD-L1 expression using the 22C3 antibody concentrate on the widely available Dako ASL48 and VENTANA BenchMark ULTRA IHC platforms will expand the number of laboratories able to determine eligibility of patients with NSCLC for treatment with pembrolizumab in a reliable and concordant manner.
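Concordance for dichotomized TPS results can be computed by calling each sample positive or negative at a TPS cutoff and comparing the calls with those of the reference assay. The sketch below reports overall, positive and negative percent agreement relative to the reference (here, the pharmDx kit). The abstract does not state the cutoff or the exact agreement statistic used, so the cutoff is left as a parameter, the 50% value in the example is only illustrative, and the function name `dichotomized_agreement` and the TPS values are hypothetical.

```python
def dichotomized_agreement(ldt_tps, ref_tps, cutoff):
    """Percent agreement between LDT and reference TPS after dichotomizing.

    A sample is called positive when TPS >= cutoff (both in percent).
    Returns (overall, positive, negative) percent agreement, where the
    positive/negative figures are computed over the reference positives/negatives.
    """
    if len(ldt_tps) != len(ref_tps) or not ref_tps:
        raise ValueError("need equal-length, non-empty score lists")
    calls = [(a >= cutoff, b >= cutoff) for a, b in zip(ldt_tps, ref_tps)]
    overall = 100.0 * sum(1 for a, b in calls if a == b) / len(calls)
    ref_pos = [a for a, b in calls if b]       # LDT calls where reference is positive
    ref_neg = [a for a, b in calls if not b]   # LDT calls where reference is negative
    ppa = 100.0 * sum(ref_pos) / len(ref_pos) if ref_pos else float("nan")
    npa = 100.0 * sum(1 for a in ref_neg if not a) / len(ref_neg) if ref_neg else float("nan")
    return overall, ppa, npa
```

With hypothetical scores `[60, 45, 0, 80, 10]` (LDT) and `[55, 52, 5, 90, 0]` (reference) at a 50% cutoff, one reference-positive sample is called negative by the LDT, giving 80% overall, about 66.7% positive and 100% negative agreement.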
Benchmark dose risk assessment software (BMDS) was designed by EPA to generate dose-response curves and facilitate the analysis, interpretation and synthesis of toxicological data. Partial results of QA/QC testing of the EPA benchmark dose software (BMDS) are presented. BMDS pr...
The National Oceanic and Atmospheric Administration recently sponsored the New England Forecasting Pilot Program to serve as a "test bed" for chemical forecasting by providing all of the elements of a National Air Quality Forecasting System, including the development and implemen...
Towards Systematic Benchmarking of Climate Model Performance
NASA Astrophysics Data System (ADS)
Gleckler, P. J.
2014-12-01
The process by which climate models are evaluated has evolved substantially over the past decade, with the Coupled Model Intercomparison Project (CMIP) serving as a centralizing activity for coordinating model experimentation and enabling research. Scientists with a broad spectrum of expertise have contributed to the CMIP model evaluation process, resulting in many hundreds of publications that have served as a key resource for the IPCC process. For several reasons, efforts are now underway to further systematize some aspects of the model evaluation process. First, some model evaluation can now be considered routine and should not require "re-inventing the wheel" or a journal publication simply to update results with newer models. Second, the benefit of CMIP research to model development has not been optimal because the publication of results generally takes several years and is usually not reproducible for benchmarking newer model versions. And third, there are now hundreds of model versions and many thousands of simulations, but there is no community-based mechanism for routinely monitoring model performance changes. An important change in the design of CMIP6 can help address these limitations. CMIP6 will include a small set of standardized experiments as an ongoing exercise (CMIP "DECK": ongoing Diagnostic, Evaluation and Characterization of Klima), so that modeling groups can submit them at any time without being overly constrained by deadlines. In this presentation, efforts to establish routine benchmarking of existing and future CMIP simulations will be described. To date, some benchmarking tools have been made available to all CMIP modeling groups to enable them to readily compare with CMIP5 simulations during the model development process. A natural extension of this effort is to make results from all CMIP simulations widely available, including the results from newer models as soon as the simulations become available for research.
Making the results from routine performance tests readily accessible will help advance a more transparent model evaluation process.
Molecular diffusion of stable water isotopes in polar firn as a proxy for past temperatures
NASA Astrophysics Data System (ADS)
Holme, Christian; Gkinis, Vasileios; Vinther, Bo M.
2018-03-01
Polar precipitation archived in ice caps contains information on past temperature conditions. Such information can be retrieved by measuring the water isotopic signals of δ¹⁸O and δD in ice cores. These signals have been attenuated during densification due to molecular diffusion in the firn column, where the magnitude of the diffusion is isotopologue specific and temperature dependent. By utilizing the differential diffusion signal, dual isotope measurements of δ¹⁸O and δD enable multiple temperature reconstruction techniques. This study assesses how well six different methods can be used to reconstruct past surface temperatures from the diffusion-based temperature proxies. Two of the methods are based on the single diffusion lengths of δ¹⁸O and δD, three employ the differential diffusion signal, and the last uses the ratio between the single diffusion lengths. All techniques are tested on synthetic data in order to evaluate their accuracy and precision. We perform a benchmark test on thirteen high-resolution Holocene data sets from Greenland and Antarctica, which represent a broad range of mean annual surface temperatures and accumulation rates. Based on the benchmark test, we comment on the accuracy and precision of the methods. Both the benchmark test and the synthetic data test demonstrate that the most precise reconstructions are obtained when using the single isotope diffusion lengths, with precisions of approximately 1.0 °C. In the benchmark test, the single isotope diffusion lengths are also found to reconstruct consistent temperatures with a root-mean-square deviation of 0.7 °C. The techniques employing the differential diffusion signals are more uncertain; the most precise of them has a precision of 1.9 °C. The diffusion length ratio method is the least precise, with a precision of 13.7 °C.
The absolute temperature estimates from this method are also shown to be highly sensitive to the choice of fractionation factor parameterization.
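The attenuation described above is commonly modelled by convolving the isotope profile with a Gaussian whose standard deviation is the diffusion length σ. Under that model, a periodic component of wavelength λ keeps the fraction exp(-2π²σ²/λ²) of its amplitude, so an observed amplitude ratio can be inverted for σ. The sketch below assumes this idealized Gaussian-kernel model; the function names and the example layer thickness are hypothetical, and turning σ into a temperature additionally requires a firn diffusion model that is not shown here.

```python
import math


def attenuation(sigma: float, wavelength: float) -> float:
    """Fraction of amplitude retained by a harmonic of the given wavelength
    after convolution with a Gaussian of standard deviation sigma.
    Both arguments must use the same depth unit (e.g. metres of ice)."""
    k = 2.0 * math.pi / wavelength  # wavenumber of the harmonic
    return math.exp(-0.5 * (k * sigma) ** 2)


def diffusion_length(initial_amp: float, observed_amp: float, wavelength: float) -> float:
    """Invert attenuation(): estimate sigma from the observed amplitude ratio."""
    ratio = observed_amp / initial_amp
    if not 0.0 < ratio <= 1.0:
        raise ValueError("observed amplitude must lie in (0, initial_amp]")
    k = 2.0 * math.pi / wavelength
    return math.sqrt(-2.0 * math.log(ratio)) / k


# Hypothetical example: 0.25 m annual layers, diffusion length 0.08 m
frac = attenuation(0.08, 0.25)
print(f"retained amplitude fraction: {frac:.3f}")
print(f"recovered sigma: {diffusion_length(1.0, frac, 0.25):.3f} m")
```

Because δD and δ18O diffuse at slightly different rates, applying the same inversion to both isotopes yields the two single diffusion lengths that the methods above build on.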
Hermans, Michel P; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank; Vandenberghe, Hans; Brotons, Carlos
2013-11-01
The aim was to assess prospectively the effect of benchmarking on quality of primary care for patients with type 2 diabetes by using three major modifiable cardiovascular risk factors as critical quality indicators. Primary care physicians treating patients with type 2 diabetes in six European countries were randomized to give standard care (control group) or standard care with feedback benchmarked against other centers in each country (benchmarking group). In both groups, laboratory tests were performed every 4 months. The primary end point was the percentage of patients achieving preset targets of the critical quality indicators HbA1c, LDL cholesterol, and systolic blood pressure (SBP) after 12 months of follow-up. Of 4,027 patients enrolled, 3,996 patients were evaluable and 3,487 completed 12 months of follow-up. After 12 months, the HbA1c target was achieved by 58.9% of patients in the benchmarking group vs. 62.1% in the control group (P = 0.398); 40.0% vs. 30.1% of patients met the SBP target (P < 0.001), and 54.3% vs. 49.7% met the LDL cholesterol target (P = 0.006). Percentages of patients meeting all three targets increased during the study in both groups, with a statistically significant increase observed in the benchmarking group. The percentage of patients achieving all three targets at month 12 was significantly larger in the benchmarking group than in the control group (12.5% vs. 8.1%; P < 0.001). In this prospective, randomized, controlled study, benchmarking was shown to be an effective tool for increasing achievement of critical quality indicators and potentially reducing patients' residual cardiovascular risk.
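To make the reported comparisons of proportions concrete, the sketch below implements a pooled two-proportion z-test. This is an assumption for illustration only: the abstract does not state which test produced its P values, and because physicians (not patients) were randomized, a cluster-aware analysis would normally be needed. The counts in the usage line are hypothetical, chosen only so that the percentages match 12.5% and 8.1%; the per-group sizes are not given in the abstract.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    x1/n1 and x2/n2 are successes/totals in each group.
    Returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts (1000 per arm) matching the reported 12.5% vs. 8.1%
z, p = two_proportion_z_test(125, 1000, 81, 1000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```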
Benchmarking Is Associated With Improved Quality of Care in Type 2 Diabetes
Hermans, Michel P.; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank; Vandenberghe, Hans; Brotons, Carlos
2013-01-01
OBJECTIVE To assess prospectively the effect of benchmarking on quality of primary care for patients with type 2 diabetes by using three major modifiable cardiovascular risk factors as critical quality indicators. RESEARCH DESIGN AND METHODS Primary care physicians treating patients with type 2 diabetes in six European countries were randomized to give standard care (control group) or standard care with feedback benchmarked against other centers in each country (benchmarking group). In both groups, laboratory tests were performed every 4 months. The primary end point was the percentage of patients achieving preset targets of the critical quality indicators HbA1c, LDL cholesterol, and systolic blood pressure (SBP) after 12 months of follow-up. RESULTS Of 4,027 patients enrolled, 3,996 patients were evaluable and 3,487 completed 12 months of follow-up. After 12 months, the HbA1c target was achieved by 58.9% of patients in the benchmarking group vs. 62.1% in the control group (P = 0.398); 40.0% vs. 30.1% of patients met the SBP target (P < 0.001), and 54.3% vs. 49.7% met the LDL cholesterol target (P = 0.006). Percentages of patients meeting all three targets increased during the study in both groups, with a statistically significant increase observed in the benchmarking group. The percentage of patients achieving all three targets at month 12 was significantly larger in the benchmarking group than in the control group (12.5% vs. 8.1%; P < 0.001). CONCLUSIONS In this prospective, randomized, controlled study, benchmarking was shown to be an effective tool for increasing achievement of critical quality indicators and potentially reducing patients' residual cardiovascular risk. PMID:23846810
NASA Technical Reports Server (NTRS)
Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.
1991-01-01
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way, many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
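A "pencil and paper" specification fixes only the algorithm and its verifiable outputs, so anyone may implement it in any language or on any machine. The sketch below is modelled loosely on the style of the suite's embarrassingly parallel (EP) kernel: generate Gaussian deviates with the Marsaglia polar method and tally them in square annuli. It is not the official kernel. The real specification prescribes its own random-number generator, problem sizes and verification values, none of which are reproduced here; Python's `random.Random` is swapped in, and the function name is hypothetical.

```python
import math
import random


def ep_style_kernel(num_pairs: int, seed: int = 42, num_annuli: int = 10):
    """Turn uniform pairs into Gaussian deviates (Marsaglia polar method) and
    count accepted pairs per square annulus l <= max(|X|, |Y|) < l + 1.
    Returns (sum_x, sum_y, counts)."""
    rng = random.Random(seed)  # stand-in for the generator fixed by the real spec
    counts = [0] * num_annuli
    sum_x = sum_y = 0.0
    for _ in range(num_pairs):
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        t = u * u + v * v
        if t >= 1.0 or t == 0.0:
            continue  # reject points outside the unit disc (or at the origin)
        factor = math.sqrt(-2.0 * math.log(t) / t)
        x, y = u * factor, v * factor
        sum_x += x
        sum_y += y
        annulus = int(max(abs(x), abs(y)))
        if annulus < num_annuli:  # deviates beyond the last annulus are not tallied
            counts[annulus] += 1
    return sum_x, sum_y, counts


sx, sy, counts = ep_style_kernel(100_000)
print(counts)
```

Each pair is independent, so the loop can be split across processors and the partial sums and counts simply added, which is what makes such a kernel "embarrassingly parallel".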
Algorithm and Architecture Independent Benchmarking with SEAK
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tallent, Nathan R.; Manzano Franco, Joseph B.; Gawande, Nitin A.
2016-05-23
Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed the Suite for Embedded Applications & Kernels (SEAK), a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions and (b) to facilitate rigorous, objective, end-user evaluation of their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.
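A black-box evaluation that reports several metrics at once is often summarized by the set of non-dominated (Pareto-optimal) solutions. The sketch below assumes three lower-is-better metrics (runtime, power, error). The solution names and numbers are hypothetical, and the abstract does not say that SEAK uses Pareto filtering; it is shown here only as one way to read such tradeoffs.

```python
from typing import NamedTuple


class Solution(NamedTuple):
    name: str
    runtime_s: float  # lower is better
    power_w: float    # lower is better
    error: float      # lower is better


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    pa = (a.runtime_s, a.power_w, a.error)
    pb = (b.runtime_s, b.power_w, b.error)
    return all(x <= y for x, y in zip(pa, pb)) and any(x < y for x, y in zip(pa, pb))


def pareto_front(solutions: list[Solution]) -> list[Solution]:
    """Keep every solution that no other solution dominates (input order preserved)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate solutions for one benchmark problem
candidates = [
    Solution("cpu_baseline", 2.0, 40.0, 0.010),
    Solution("fpga_fixed_point", 0.5, 15.0, 0.020),
    Solution("gpu_float", 0.3, 120.0, 0.010),
    Solution("cpu_slow", 2.5, 45.0, 0.010),  # dominated by cpu_baseline
]
for s in pareto_front(candidates):
    print(s.name)
```

The front deliberately leaves the final choice to the end user: a procurement decision can then weigh, for example, the FPGA's lower power against its higher error.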
Fast Neutron Spectrum Potassium Worth for Space Power Reactor Design Validation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Marshall, Margaret A.; Briggs, J. Blair
2015-03-01
A variety of critical experiments were constructed of enriched uranium metal (oralloy) during the 1960s and 1970s at the Oak Ridge Critical Experiments Facility (ORCEF) in support of criticality safety operations at the Y-12 Plant. The purposes of these experiments included the evaluation of storage, casting, and handling limits for the Y-12 Plant and providing data for verification of calculation methods and cross-sections for nuclear criticality safety applications. These included solid cylinders of various diameters, annuli of various inner and outer diameters, two and three interacting cylinders of various diameters, and graphite- and polyethylene-reflected cylinders and annuli. Of the hundreds of delayed critical experiments, one was performed that consisted of uranium metal annuli surrounding a potassium-filled, stainless steel can. The outer diameter of the annuli was approximately 13 inches (33.02 cm) with an inner diameter of 7 inches (17.78 cm). The diameter of the stainless steel can was 7 inches (17.78 cm). The critical height of the configurations was approximately 5.6 inches (14.224 cm). The uranium annulus consisted of multiple stacked rings, each with radial thicknesses of 1 inch (2.54 cm) and varying heights. A companion measurement was performed using empty stainless steel cans; the primary purpose of these experiments was to test the fast neutron cross sections of potassium, as it was a candidate coolant in some early space power reactor designs. The experimental measurements were performed on July 11, 1963, by J. T. Mihalczo and M. S. Wyatt (Ref. 1), with additional information in the corresponding logbook. Unreflected and unmoderated experiments with the same set of highly enriched uranium metal parts were performed at the Oak Ridge Critical Experiments Facility in the 1960s and are evaluated in the International Handbook of Evaluated Criticality Safety Benchmark Experiments (ICSBEP Handbook) with the identifier HEU-MET-FAST-051.
Thin graphite-reflected (2 inches or less) experiments also using the same set of highly enriched uranium metal parts are evaluated in HEU-MET-FAST-071. Polyethylene-reflected configurations are evaluated in HEU-MET-FAST-076. A stack of highly enriched metal discs with a thick beryllium top reflector is evaluated in HEU-MET-FAST-069, and two additional highly enriched uranium annuli with beryllium cores are evaluated in HEU-MET-FAST-059. Both detailed and simplified model specifications are provided in this evaluation. Both of these fast neutron spectrum assemblies were determined to be acceptable benchmark experiments. The calculated eigenvalues for both the detailed and the simple benchmark models are within ~0.26% of the benchmark values for Configuration 1 (calculations performed using MCNP6 with ENDF/B-VII.1 neutron cross section data), but they under-calculate the benchmark values by ~7σ because the uncertainty in the benchmark is very small: ~0.0004 (1σ); for Configuration 2, the under-calculation is ~0.31% and ~8σ. Comparisons of detailed and simple model calculations for the potassium worth measurement and potassium mass coefficient yield results approximately 70–80% lower (~6σ to 10σ) than the benchmark values for the various nuclear data libraries utilized. Both the potassium worth and mass coefficient are also deemed to be acceptable benchmark experiment measurements.
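The arithmetic behind "within ~0.26% but ~7σ" can be checked directly: a relative difference of 0.26% on an eigenvalue near 1 is an absolute difference of about 0.0026, and 0.0026 / 0.0004 = 6.5, i.e. roughly 7σ; likewise 0.0031 / 0.0004 ≈ 7.75, roughly 8σ. The sketch assumes a benchmark keff of exactly 1.0 purely for illustration (the abstract does not give the benchmark eigenvalues), so relative and absolute differences coincide; the calculated values are back-computed from the quoted percentages.

```python
def relative_difference_pct(calculated: float, benchmark: float) -> float:
    """(C - E) / E, in percent."""
    return 100.0 * (calculated - benchmark) / benchmark


def bias_in_sigma(calculated: float, benchmark: float, sigma: float) -> float:
    """(C - E) expressed in units of the benchmark's 1-sigma uncertainty."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (calculated - benchmark) / sigma


SIGMA = 0.0004  # benchmark 1-sigma uncertainty quoted in the abstract
# Benchmark keff of 1.0 is an illustrative assumption, not an evaluated value.
for label, calc in [("Configuration 1", 0.9974), ("Configuration 2", 0.9969)]:
    print(label,
          f"{relative_difference_pct(calc, 1.0):+.2f} %",
          f"{bias_in_sigma(calc, 1.0, SIGMA):+.2f} sigma")
```

A small relative error can therefore be statistically large when the benchmark uncertainty is tight, which is why both numbers are reported.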
ERIC Educational Resources Information Center
Coughlin, David C.; Bielen, Rhonda P.
This paper has been prepared to assist the United States Department of Labor to explore new approaches to evaluating and measuring the performance of employment and training activities for youth. As one of several tools for evaluating the success of local youth training programs, "benchmarking" provides a system for measuring the development…
EBR-II Reactor Physics Benchmark Evaluation Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pope, Chad L.; Lum, Edward S; Stewart, Ryan
This report provides a reactor physics benchmark evaluation with associated uncertainty quantification for the critical configuration of the April 1986 Experimental Breeder Reactor II Run 138B core configuration.
Using benchmarks for radiation testing of microprocessors and FPGAs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quinn, Heather; Robinson, William H.; Rech, Paolo
Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. We describe the development process and report neutron test data for the hardware and software benchmarks.
Using benchmarks for radiation testing of microprocessors and FPGAs
Quinn, Heather; Robinson, William H.; Rech, Paolo; ...
2015-12-17
Performance benchmarks have been used over the years to compare different systems. These benchmarks can be useful for researchers trying to determine how changes to the technology, architecture, or compiler affect the system's performance. No such standard exists for systems deployed into high radiation environments, making it difficult to assess whether changes in the fabrication process, circuitry, architecture, or software affect reliability or radiation sensitivity. In this paper, we propose a benchmark suite for high-reliability systems that is designed for field-programmable gate arrays and microprocessors. We describe the development process and report neutron test data for the hardware and software benchmarks.
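Neutron test data of this kind are conventionally reduced to a cross section: the number of observed events (errors) divided by the particle fluence, in cm² per device. The abstract does not state how the authors report their data, so this is the standard metric rather than their method. When no events are seen, a common convention is to quote the ~95% Poisson upper bound of about 3 events ("rule of three"). The function name and the example run numbers are hypothetical.

```python
import math


def event_cross_section(num_events: int, fluence_per_cm2: float) -> tuple[float, bool]:
    """Return (cross section in cm^2, is_upper_bound).

    cross section = observed events / particle fluence.  With zero events,
    the ~95% Poisson upper bound -ln(0.05) ~= 3.0 events is used instead."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    if num_events < 0:
        raise ValueError("event count cannot be negative")
    if num_events == 0:
        return -math.log(0.05) / fluence_per_cm2, True
    return num_events / fluence_per_cm2, False


# Hypothetical runs of two benchmark builds at the same fluence
for build, errors in [("build_a", 37), ("build_b", 0)]:
    xs, upper = event_cross_section(errors, 2.5e11)
    print(build, f"{xs:.2e} cm^2", "(upper bound)" if upper else "")
```

Comparing such cross sections across builds of the same benchmark is what lets changes in process, circuitry, architecture, or software be related to radiation sensitivity.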
NASA Astrophysics Data System (ADS)
Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.
2016-10-01
An analytical benchmark and a simple, consistent Mathematica program are proposed for graphene and carbon nanotubes that may serve to test any molecular dynamics code implemented with REBO potentials. Using the benchmark, we checked results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) with the second-generation Brenner potential, showed that this code, in its current implementation, produces results that are offset from those of the benchmark by a significant amount, and provide evidence of the reason.
Implementation and verification of global optimization benchmark problems
NASA Astrophysics Data System (ADS)
Posypkin, Mikhail; Usov, Alexander
2017-12-01
The paper considers the implementation and verification of a test suite containing 150 benchmarks for global deterministic box-constrained optimization. A C++ library for describing standard mathematical expressions was developed for this purpose. From a single description, the library automates the computation of the value of a function and its gradient at a given point, and of interval estimates of the function and its gradient on a given box. Based on this functionality, we have developed a collection of tests for automatic verification of the proposed benchmarks. The verification has shown that published sources contain mistakes in the benchmark descriptions. The library and the test suite are available for download and can be used freely.
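One way to obtain the value, the gradient and interval bounds from a single description is operator overloading: write the expression once and evaluate it with different number types. The Python sketch below illustrates that idea with forward-mode dual numbers and naive interval arithmetic. It is not the authors' C++ library; the class names are hypothetical, only `+`, `-` and `*` are supported, and the Rosenbrock function serves merely as an example benchmark. Computing `x * x` as a general product gives a wider (but still valid) enclosure than a dedicated square would, the so-called dependency problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Value plus gradient (forward-mode automatic differentiation)."""
    val: float
    grad: tuple

    def _lift(self, c):
        return c if isinstance(c, Dual) else Dual(float(c), tuple(0.0 for _ in self.grad))

    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, tuple(a + b for a, b in zip(self.grad, o.grad)))
    __radd__ = __add__

    def __sub__(self, o):
        o = self._lift(o)
        return Dual(self.val - o.val, tuple(a - b for a, b in zip(self.grad, o.grad)))

    def __rsub__(self, o):
        return self._lift(o) - self

    def __mul__(self, o):  # product rule
        o = self._lift(o)
        return Dual(self.val * o.val,
                    tuple(self.val * b + o.val * a for a, b in zip(self.grad, o.grad)))
    __rmul__ = __mul__


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with naive interval arithmetic."""
    lo: float
    hi: float

    def _lift(self, c):
        return c if isinstance(c, Interval) else Interval(float(c), float(c))

    def __add__(self, o):
        o = self._lift(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __sub__(self, o):
        o = self._lift(o)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, o):
        return self._lift(o) - self

    def __mul__(self, o):
        o = self._lift(o)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))
    __rmul__ = __mul__


def rosenbrock(x, y):
    """Single description, evaluable on floats, Dual numbers, or Intervals."""
    return (1 - x) * (1 - x) + 100 * (y - x * x) * (y - x * x)


print(rosenbrock(0.0, 0.0))                                      # plain value
print(rosenbrock(Dual(0.0, (1.0, 0.0)), Dual(0.0, (0.0, 1.0))))  # value + gradient
print(rosenbrock(Interval(-1.0, 1.0), Interval(-1.0, 1.0)))      # enclosure on a box
```

An automatic check in the spirit of the paper can then compare these results against each other, for example verifying that sampled point values always fall inside the interval enclosure.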
Testing and Benchmarking a 2014 GM Silverado 6L80 Six Speed Automatic Transmission
Describes the method and test results of EPA's partial transmission benchmarking process, which involves installing both the engine and transmission in an engine dynamometer test cell with the engine wire harness tethered to its vehicle parked outside the test cell.
New features and improved uncertainty analysis in the NEA nuclear data sensitivity tool (NDaST)
NASA Astrophysics Data System (ADS)
Dyrda, J.; Soppera, N.; Hill, I.; Bossant, M.; Gulliford, J.
2017-09-01
Following the release and initial testing period of the NEA's Nuclear Data Sensitivity Tool [1], new features have been designed and implemented in order to expand its uncertainty analysis capabilities. The aim is to provide a free online tool for integral benchmark testing that is both efficient and comprehensive, meeting the needs of the nuclear data and benchmark testing communities. New features include access to P1 sensitivities for neutron scattering angular distribution [2] and constrained Chi sensitivities for the prompt fission neutron energy sampling. Both of these are compatible with covariance data accessed via the JANIS nuclear data software, enabling propagation of the resultant uncertainties in keff to a large series of integral experiment benchmarks. These capabilities are available using a number of different covariance libraries, e.g., ENDF/B, JEFF, JENDL and TENDL, allowing comparison of the broad range of results it is possible to obtain. The IRPhE database of reactor physics measurements is now also accessible within the tool, in addition to the criticality benchmarks from ICSBEP. Other improvements include the ability to determine and visualise the energy dependence of a given calculated result in order to better identify specific regions of importance or high uncertainty contribution. Sorting and statistical analysis of the selected benchmark suite is now also provided. Examples of the plots generated by the software are included to illustrate such capabilities. Finally, a number of analytical expressions, for example Maxwellian and Watt fission spectra, will be included. This will allow the analyst to determine the impact of varying such distributions within the data evaluation, either through adjustment of parameters within the expressions, or by comparison to a more general probability distribution fitted to measured data.
The impact of such changes is verified through calculations which are compared to a 'direct' measurement found by adjustment of the original ENDF format file.
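Propagating covariance data "through" sensitivities to an uncertainty in keff is usually done with the first-order sandwich rule, var(k)/k² = Sᵀ C S, where S holds relative sensitivities and C is the relative covariance matrix on the same energy-group structure. The abstract does not spell out NDaST's formula, so this standard approach is assumed here. All numbers in the example are invented for illustration, and the function names are hypothetical.

```python
import math


def sandwich_variance(sens: list[float], cov: list[list[float]]) -> float:
    """First-order propagation: relative variance = S^T C S, with S the
    relative sensitivities and C the relative covariance matrix."""
    n = len(sens)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("covariance matrix must be n x n")
    return sum(sens[i] * cov[i][j] * sens[j] for i in range(n) for j in range(n))


def covariance_from_std(rel_std: list[float], corr: list[list[float]]) -> list[list[float]]:
    """Build C_ij = rho_ij * s_i * s_j from relative standard deviations."""
    n = len(rel_std)
    return [[corr[i][j] * rel_std[i] * rel_std[j] for j in range(n)] for i in range(n)]


# Hypothetical two-group example (values invented for illustration)
S = [0.3, 0.2]                                   # dk/k per dσ/σ in each group
C = covariance_from_std([0.02, 0.05], [[1.0, 0.5], [0.5, 1.0]])
rel_unc = math.sqrt(sandwich_variance(S, C))
print(f"relative uncertainty in keff: {100 * rel_unc:.2f} %")
```

The off-diagonal (correlation) terms matter: here they contribute 0.00006 of the total 0.000196 relative variance, which is why comparisons across covariance libraries can differ even when their diagonal uncertainties agree.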
Cross-Evaluation of Degree Programmes in Higher Education
ERIC Educational Resources Information Center
Kettunen, Juha
2010-01-01
Purpose: This study seeks to develop and describe the benchmarking approach of enhancement-led evaluation in higher education and to present a cross-evaluation process for degree programmes. Design/methodology/approach: The benchmarking approach produces useful information for the development of degree programmes based on self-evaluation,…
How to Advance TPC Benchmarks with Dependability Aspects
NASA Astrophysics Data System (ADS)
Almeida, Raquel; Poess, Meikel; Nambiar, Raghunath; Patil, Indira; Vieira, Marco
Transactional systems are the core of the information systems of most organizations. Although there is general acknowledgement that failures in these systems often have a significant impact on both the proceeds and the reputation of companies, the benchmarks developed and managed by the Transaction Processing Performance Council (TPC) still focus on reporting bare performance. Each TPC benchmark has to pass a list of dependability-related tests (to verify ACID properties), but not all benchmarks require measuring the system's performance during these tests. While TPC-E measures the recovery time of some system failures, TPC-H and TPC-C only require functional correctness of such recovery. Consequently, systems used in TPC benchmarks are tuned mostly for performance. In this paper we argue that today's systems should be tuned for a more comprehensive suite of dependability tests, and that a dependability metric should be part of TPC benchmark publications. The paper discusses WHY and HOW this can be achieved. Two approaches are introduced and discussed: augmenting each TPC benchmark in a customized way, by extending each specification individually, and pursuing a more unified approach, by defining a generic specification that could be adjoined to any TPC benchmark.
A One-group, One-dimensional Transport Benchmark in Cylindrical Geometry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barry Ganapol; Abderrafi M. Ougouag
A 1-D, 1-group computational benchmark in cylindrical geometry is described. This neutron transport benchmark is useful for evaluating reactor concepts that possess azimuthal symmetry, such as a pebble-bed reactor.
NASA Astrophysics Data System (ADS)
Aldrin, John C.; Hopkins, Deborah; Datuin, Marvin; Warchol, Mark; Warchol, Lyudmila; Forsyth, David S.; Buynak, Charlie; Lindgren, Eric A.
2017-02-01
For model benchmark studies, the accuracy of the model is typically evaluated based on the change in response relative to a selected reference signal. The use of a side-drilled hole (SDH) in a plate was investigated as a reference signal for angled-beam shear wave inspection of fastener sites in aircraft structures. Systematic studies were performed with varying SDH depth and size, and varying the ultrasonic probe frequency, focal depth, and probe height. Increased error was observed with the simulation of angled shear wave beams in the near-field. Even more significantly, asymmetry in real probes and the inherent sensitivity of signals in the near-field to subtle test conditions were found to pose a greater challenge to achieving model agreement. To achieve quality model benchmark results for this problem, it is critical to carefully align the probe with the part geometry, to verify symmetry in probe response, and ideally to avoid using reference signals from the near-field response. Suggested reference signals for angled-beam shear wave inspections include the 'through hole' corner specular reflection signal and the 'full skip' signal off of the far wall from the side-drilled hole.
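Whether a reference reflector sits in the near-field can be estimated with the textbook near-field length of an unfocused circular transducer, N = D²f/(4c) = D²/(4λ). This is a simplified contact-case estimate: for angled-beam probes the sound path in the wedge also counts toward the near-field, and focused probes behave differently, none of which is modelled here. The example element size is hypothetical; the shear velocity of roughly 3.1 mm/µs is a typical value for aluminium.

```python
def near_field_length_mm(diameter_mm: float, frequency_mhz: float,
                         velocity_mm_per_us: float) -> float:
    """Approximate near-field length N = D^2 f / (4 c) of a circular,
    unfocused piston transducer (contact case, no wedge)."""
    if min(diameter_mm, frequency_mhz, velocity_mm_per_us) <= 0:
        raise ValueError("all inputs must be positive")
    wavelength_mm = velocity_mm_per_us / frequency_mhz  # (mm/us) / (1/us) = mm
    return diameter_mm ** 2 / (4.0 * wavelength_mm)


# Hypothetical 6.35 mm element at 5 MHz; shear velocity in aluminium ~3.1 mm/us
n = near_field_length_mm(6.35, 5.0, 3.1)
print(f"estimated near-field length: {n:.1f} mm")
```

Reflectors at sound-path distances well beyond N are less sensitive to the subtle probe and alignment effects the study reports.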
Reverse Engineering Course at Philadelphia University in Jordan
ERIC Educational Resources Information Center
Younis, M. Bani; Tutunji, T.
2012-01-01
Reverse engineering (RE) is the process of testing and analysing a system or a device in order to identify, understand and document its functionality. RE is an efficient tool in industrial benchmarking where competitors' products are dissected and evaluated for performance and costs. RE can play an important role in the re-configuration and…
Evaluating real-time Java for mission-critical large-scale embedded systems
NASA Technical Reports Server (NTRS)
Sharp, D. C.; Pla, E.; Luecke, K. R.; Hassan, R. J.
2003-01-01
This paper describes benchmarking results on a real-time Java virtual machine (RT JVM). It extends previously published results by including additional tests, by being run on a recently available pre-release version of the first commercially supported implementation of the Real-Time Specification for Java (RTSJ), and by assessing results based on our experience with avionics systems in other languages.
Benchmarking routine psychological services: a discussion of challenges and methods.
Delgadillo, Jaime; McMillan, Dean; Leach, Chris; Lucock, Mike; Gilbody, Simon; Wood, Nick
2014-01-01
Policy developments in recent years have led to important changes in the level of access to evidence-based psychological treatments. Several methods have been used to investigate the effectiveness of these treatments in routine care, with different approaches to outcome definition and data analysis. The aim of this report is to present a review of challenges and methods for the evaluation of evidence-based treatments delivered in routine mental healthcare, followed by a case example of a benchmarking method applied in primary care. High, average and poor performance benchmarks were calculated through a meta-analysis of published data from services working under the Improving Access to Psychological Therapies (IAPT) Programme in England. Pre-post treatment effect sizes (ES) and confidence intervals were estimated to illustrate a benchmarking method enabling services to evaluate routine clinical outcomes. High, average and poor performance ES for routine IAPT services were estimated to be 0.91, 0.73 and 0.46 for depression (using PHQ-9) and 1.02, 0.78 and 0.52 for anxiety (using GAD-7). Data from one specific IAPT service exemplify how to evaluate and contextualize routine clinical performance against these benchmarks. The main contribution of this report is to summarize key recommendations for the selection of an adequate set of psychometric measures, the operational definition of outcomes, and the statistical evaluation of clinical performance. A benchmarking method is also presented, which may enable a robust evaluation of clinical performance against national benchmarks. Limitations included significant heterogeneity among data sources and wide variations in ES and data completeness.
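The benchmarking idea can be illustrated by computing a service's pre-post effect size and comparing it with the benchmark values quoted above. Two assumptions should be flagged: the abstract does not define the ES formula (a common choice, used here, divides the mean improvement by the pre-treatment SD), and the "closest benchmark" rule is a hypothetical simplification. The paper also uses confidence intervals, which a real comparison should include. The service means and SD are invented.

```python
def pre_post_effect_size(mean_pre: float, mean_post: float, sd_pre: float) -> float:
    """Uncontrolled pre-post effect size: mean improvement divided by the
    baseline SD (positive when symptom scores fall, as on PHQ-9 and GAD-7)."""
    if sd_pre <= 0:
        raise ValueError("baseline SD must be positive")
    return (mean_pre - mean_post) / sd_pre


# Benchmark effect sizes as reported in the abstract
BENCHMARKS = {
    "PHQ-9": {"high": 0.91, "average": 0.73, "poor": 0.46},
    "GAD-7": {"high": 1.02, "average": 0.78, "poor": 0.52},
}


def nearest_benchmark(es: float, measure: str) -> str:
    """Hypothetical banding rule: label of the benchmark closest to the ES."""
    bands = BENCHMARKS[measure]
    return min(bands, key=lambda label: abs(bands[label] - es))


# Hypothetical service data (not from the paper)
es = pre_post_effect_size(mean_pre=15.0, mean_post=9.6, sd_pre=6.0)
print(f"ES = {es:.2f}, closest PHQ-9 benchmark: {nearest_benchmark(es, 'PHQ-9')}")
```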
Benchmark Simulation Model No 2: finalisation of plant layout and default control strategy.
Nopens, I; Benedetti, L; Jeppsson, U; Pons, M-N; Alex, J; Copp, J B; Gernaey, K V; Rosen, C; Steyer, J-P; Vanrolleghem, P A
2010-01-01
The COST/IWA Benchmark Simulation Model No 1 (BSM1) has been available for almost a decade. Its primary purpose has been to create a platform for control strategy benchmarking of activated sludge processes. The fact that the research work related to the benchmark simulation models has resulted in more than 300 publications worldwide demonstrates the interest in and need for such tools within the research community. Recent efforts within the IWA Task Group on "Benchmarking of control strategies for WWTPs" have focused on an extension of the benchmark simulation model. This extension aims at facilitating control strategy development and performance evaluation at a plant-wide level and, consequently, includes both pretreatment of wastewater as well as the processes describing sludge treatment. The motivation for the extension is the increasing interest and need to operate and control wastewater treatment systems not only at an individual process level but also on a plant-wide basis. To facilitate the changes, the evaluation period has been extended to one year. A prolonged evaluation period allows for long-term control strategies to be assessed and enables the use of control handles that cannot be evaluated in a realistic fashion in the one-week BSM1 evaluation period. In this paper, the finalised plant layout is summarised and, as was done for BSM1, a default control strategy is proposed. A demonstration of how BSM2 can be used to evaluate control strategies is also given.
BioPreDyn-bench: a suite of benchmark problems for dynamic modelling in systems biology.
Villaverde, Alejandro F; Henriques, David; Smallbone, Kieran; Bongard, Sophia; Schmid, Joachim; Cicin-Sain, Damjan; Crombach, Anton; Saez-Rodriguez, Julio; Mauch, Klaus; Balsa-Canto, Eva; Mendes, Pedro; Jaeger, Johannes; Banga, Julio R
2015-02-20
Dynamic modelling is one of the cornerstones of systems biology. Many research efforts are currently being invested in the development and exploitation of large-scale kinetic models. The associated problems of parameter estimation (model calibration) and optimal experimental design are particularly challenging. The community has already developed many methods and software packages which aim to facilitate these tasks. However, there is a lack of suitable benchmark problems which allow a fair and systematic evaluation and comparison of these contributions. Here we present BioPreDyn-bench, a set of challenging parameter estimation problems which aspire to serve as reference test cases in this area. This set comprises six problems including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The level of description includes metabolism, transcription, signal transduction, and development. For each problem we provide (i) a basic description and formulation, (ii) implementations ready-to-run in several formats, (iii) computational results obtained with specific solvers, (iv) a basic analysis and interpretation. This suite of benchmark problems can be readily used to evaluate and compare parameter estimation methods. Further, it can also be used to build test problems for sensitivity and identifiability analysis, model reduction and optimal experimental design methods. The suite, including codes and documentation, can be freely downloaded from the BioPreDyn-bench website, https://sites.google.com/site/biopredynbenchmarks/ .
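Parameter estimation (model calibration), the task these benchmark problems target, means choosing parameters that minimize a misfit between model output and data. The toy sketch below calibrates a single hypothetical rate constant in a first-order decay model by minimizing the sum of squared residuals with golden-section search. It only illustrates the cost-function idea: the BioPreDyn-bench problems are large ODE models with many parameters that typically call for global or hybrid optimizers, and nothing here reproduces their formats or solvers.

```python
import math


def model(t: float, k: float, x0: float = 10.0) -> float:
    """Hypothetical one-parameter kinetic model: first-order decay."""
    return x0 * math.exp(-k * t)


def sse(k: float, times: list[float], data: list[float]) -> float:
    """Calibration cost: sum of squared residuals between model and data."""
    return sum((model(t, k) - y) ** 2 for t, y in zip(times, data))


def golden_section(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Minimise a unimodal scalar function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return (a + b) / 2.0


# Synthetic, noise-free data generated with a "true" rate of 0.35
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
data = [model(t, 0.35) for t in times]
k_hat = golden_section(lambda k: sse(k, times, data), 0.0, 2.0)
print(f"estimated k = {k_hat:.6f}")
```

Recovering known parameters from synthetic data, as done here, is also a common sanity check before attempting calibration against real measurements.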
Comparative performance evaluation of advanced AC and DC EV propulsion systems
NASA Astrophysics Data System (ADS)
MacDowall, R. D.; Crumley, R. L.
Idaho National Engineering Laboratory (INEL) evaluates EV propulsion systems and components for the U.S. Department of Energy (DOE) Electric and Hybrid Vehicle (EHV) Program. In this study, experimental data were used to evaluate the relative performances of the benchmark Chrysler/GE ETV-1 DC and the Ford/GE First Generation Single-Shaft AC (ETX-I) propulsion systems. Tests were conducted on the INEL's chassis dynamometer using identical aerodynamic and rolling resistance road-load coefficients and vehicle test weights. The results allowed a direct comparison of selected efficiency and performance characteristics for the two propulsion system technologies. The ETX-I AC system exhibited slightly lower system efficiency during constant speed testing than the ETV-1 DC propulsion system.
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements
Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.; ...
2014-11-04
Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental keff come from uncertainties in the manganese content and impurities in the stainless steel fuel cladding as well as the ²³⁶U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.
Evaluation of Neutron Radiography Reactor LEU-Core Start-Up Measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Maddock, Thomas L.; Smolinski, Andrew T.
Benchmark models were developed to evaluate the cold-critical start-up measurements performed during the fresh core reload of the Neutron Radiography (NRAD) reactor with Low Enriched Uranium (LEU) fuel. Experiments include criticality, control-rod worth measurements, shutdown margin, and excess reactivity for four core loadings with 56, 60, 62, and 64 fuel elements. The worth of four graphite reflector block assemblies and an empty dry tube used for experiment irradiations were also measured and evaluated for the 60-fuel-element core configuration. Dominant uncertainties in the experimental keff come from uncertainties in the manganese content and impurities in the stainless steel fuel cladding as well as the ²³⁶U and erbium poison content in the fuel matrix. Calculations with MCNP5 and ENDF/B-VII.0 neutron nuclear data are approximately 1.4% (9σ) greater than the benchmark model eigenvalues, which is commonly seen in Monte Carlo simulations of other TRIGA reactors. Simulations of the worth measurements are within the 2σ uncertainty for most of the benchmark experiment worth values. The complete benchmark evaluation details are available in the 2014 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments.
FDNS CFD Code Benchmark for RBCC Ejector Mode Operation: Continuing Toward Dual Rocket Effects
NASA Technical Reports Server (NTRS)
West, Jeff; Ruf, Joseph H.; Turner, James E. (Technical Monitor)
2000-01-01
Computational Fluid Dynamics (CFD) analysis results are compared with benchmark quality test data from the Propulsion Engineering Research Center's (PERC) Rocket Based Combined Cycle (RBCC) experiments to verify fluid dynamic code and application procedures. RBCC engine flowpath development will rely on CFD applications to capture the multi-dimensional fluid dynamic interactions and to quantify their effect on the RBCC system performance. Therefore, the accuracy of these CFD codes must be determined through detailed comparisons with test data. The PERC experiments build upon the well-known 1968 rocket-ejector experiments of Odegaard and Stroup by employing advanced optical and laser based diagnostics to evaluate mixing and secondary combustion. The Finite Difference Navier Stokes (FDNS) code [2] was used to model the fluid dynamics of the PERC RBCC ejector mode configuration. Analyses were performed for the Diffusion and Afterburning (DAB) test conditions at the 200-psia thruster operation point. Results with and without downstream fuel injection are presented.
Moran, Patrick W.; Nowell, Lisa H.; Kemble, Nile E.; Mahler, Barbara J.; Waite, Ian R.; Van Metre, Peter C.
2017-01-01
Simultaneous assessment of sediment chemistry, sediment toxicity, and macroinvertebrate communities can provide multiple lines of evidence when investigating relations between sediment contaminants and ecological degradation. These three measures were evaluated at 99 wadable stream sites across 11 states in the Midwestern United States during the summer of 2013 to assess sediment pollution across a large agricultural landscape. This evaluation considers an extensive suite of sediment chemistry totaling 274 analytes (polycyclic aromatic hydrocarbons, organochlorine compounds, polychlorinated biphenyls, polybrominated diphenyl ethers, trace elements, and current-use pesticides) and a mixture assessment based on the ratios of detected compounds to available effects-based benchmarks. The sediments were tested for toxicity with the amphipod Hyalella azteca (28-d exposure), the midge Chironomus dilutus (10-d), and, at a few sites, the freshwater mussel Lampsilis siliquoidea (28-d). Sediment concentrations, normalized to organic carbon content, infrequently exceeded benchmarks for aquatic health, which was generally consistent with low rates of observed toxicity. However, the benchmark-based mixture score and the pyrethroid insecticide bifenthrin were significantly related to observed sediment toxicity. Using quantile regression, the sediment mixture score and bifenthrin were also significant predictors of the upper limits of several univariate measures of the macroinvertebrate community (EPT percent, MMI (Macroinvertebrate Multimetric Index) Score, Ephemeroptera and Trichoptera richness). Multivariate pattern matching (Mantel-like tests) of macroinvertebrate species per site to identified contaminant metrics and sediment toxicity also indicates that the sediment mixture score and bifenthrin have a weak, albeit significant, influence on the observed invertebrate community composition.
Together, these three lines of evidence (toxicity tests, univariate metrics, and multivariate community analysis) suggest that elevated contaminant concentrations in sediments, in particular of bifenthrin, are limiting macroinvertebrate communities in several of these Midwest streams.
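The abstract describes the mixture assessment only as "based on the ratios of detected compounds to available effects-based benchmarks." A minimal sketch, assuming the score is a simple sum of those ratios (a hazard-index-style aggregation; the study's actual formula may differ). The function name and the analyte values are hypothetical:

```python
def mixture_score(concentrations, benchmarks):
    """Sum concentration/benchmark ratios over detected analytes that have a benchmark.

    concentrations: analyte -> measured value (None or <= 0 means not detected)
    benchmarks:     analyte -> effects-based benchmark, in the same units
    """
    score = 0.0
    for analyte, conc in concentrations.items():
        bench = benchmarks.get(analyte)
        if conc is None or conc <= 0 or bench is None:
            continue  # non-detects and analytes without a benchmark do not contribute
        score += conc / bench
    return score


# Hypothetical organic-carbon-normalized values; not data from the study.
conc = {"bifenthrin": 0.5, "pyrene": 200.0, "dieldrin": None}
bench = {"bifenthrin": 0.5, "pyrene": 1000.0}
print(round(mixture_score(conc, bench), 3))  # 1.2
```

A score above 1 flags a site where the combined benchmark exceedance is notable even if no single analyte exceeds its own benchmark, which is the motivation for a mixture score in the first place.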
NASA Astrophysics Data System (ADS)
Lutz, Jesse J.; Duan, Xiaofeng F.; Ranasinghe, Duminda S.; Jin, Yifan; Margraf, Johannes T.; Perera, Ajith; Burggraf, Larry W.; Bartlett, Rodney J.
2018-05-01
Accurate optical characterization of the closo-Si12C12 molecule is important to guide experimental efforts toward the synthesis of nano-wires, cyclic nano-arrays, and related array structures, which are anticipated to be robust and efficient exciton materials for opto-electronic devices. Working toward calibrated methods for the description of closo-Si12C12 oligomers, various electronic structure approaches are evaluated for their ability to reproduce measured optical transitions of the SiC2, Si2C_n (n = 1-3), and Si3C_n (n = 1, 2) clusters reported earlier by Steglich and Maier [Astrophys. J. 801, 119 (2015)]. Complete-basis-limit equation-of-motion coupled-cluster (EOMCC) results are presented, and a comparison is made between perturbative and renormalized non-iterative triples corrections. The effect of adding a renormalized correction for quadruples is also tested. Benchmark test sets derived from both measurement and high-level EOMCC calculations are then used to evaluate the performance of a variety of density functionals within the time-dependent density functional theory (TD-DFT) framework. The best-performing functionals are subsequently applied to predict valence TD-DFT excitation energies for the lowest-energy isomers of Si_nC and Si_{n-1}C_{7-n} (n = 4-6). TD-DFT approaches are then applied to the Si_nC_n (n = 4-12) clusters, and unique spectroscopic signatures of closo-Si12C12 are discussed. Finally, various long-range corrected density functionals, including those from the CAM-QTP family, are applied to a charge-transfer excitation in a cyclic (Si4C4)4 oligomer. Approaches for gauging the extent of charge-transfer character are also tested, and EOMCC results are used to benchmark functionals and make recommendations.
Structural Benchmark Creep Testing for the Advanced Stirling Convertor Heater Head
NASA Technical Reports Server (NTRS)
Krause, David L.; Kalluri, Sreeramesh; Bowman, Randy R.; Shah, Ashwin R.
2008-01-01
The National Aeronautics and Space Administration (NASA) has identified the high-efficiency Advanced Stirling Radioisotope Generator (ASRG) as a candidate power source for long-duration science missions such as lunar applications, Mars rovers, and deep space missions. Given the long lifetimes required, a structurally significant design limit for the heater head component of the ASRG Advanced Stirling Convertor (ASC) is creep deformation induced at low stress levels and high temperatures. Demonstrating adequate margins on creep deformation and rupture for the operating conditions and the MarM-247 material of construction is a challenge that the NASA Glenn Research Center is addressing. The combined analytical and experimental program ensures integrity and high reliability of the heater head for its 17-year design life. The life assessment approach starts with an extensive series of uniaxial creep tests on thin MarM-247 specimens that have the same chemistry, microstructure, and heat treatment processing as the heater head itself. This effort addresses a scarcity of openly available creep properties for the material, as well as the virtual absence of understanding of how very thin walls, fine grains, low stress levels, and high-temperature fabrication steps affect creep properties. The approach continues with a considerable analytical effort, both deterministic, to evaluate the median creep life using nonlinear finite element analysis, and probabilistic, to calculate the heater head's reliability to a higher degree. Finally, the approach includes a substantial structural benchmark creep testing activity to calibrate and validate the analytical work.
This last element provides high-fidelity testing of prototypical heater head test articles; the testing includes the relevant material issues and the essential multiaxial stress state, and applies prototypical and accelerated temperature profiles for timely results in a highly controlled laboratory environment. This paper focuses on the last element and presents a preliminary methodology for creep rate prediction, the experimental methods, test challenges, and results from benchmark testing of a trial MarM-247 heater head test article. The results compare favorably with the analytical strain predictions. A description of other test findings is provided, and recommendations for future test procedures are made. The manuscript concludes by describing the potential impact of the heater head creep life assessment and benchmark testing effort on the ASC program.
libvdwxc: a library for exchange-correlation functionals in the vdW-DF family
NASA Astrophysics Data System (ADS)
Hjorth Larsen, Ask; Kuisma, Mikael; Löfgren, Joakim; Pouillon, Yann; Erhart, Paul; Hyldgaard, Per
2017-09-01
We present libvdwxc, a general library for evaluating the energy and potential for the family of vdW-DF exchange-correlation functionals. libvdwxc is written in C, provides an efficient implementation of the vdW-DF method, and can be interfaced with various general-purpose DFT codes. Currently, the GPAW and Octopus codes implement interfaces to libvdwxc. The present implementation emphasizes scalability and parallel performance, and thereby enables ab initio calculations of nanometer-scale complexes. The numerical accuracy is benchmarked on the S22 test set, whereas parallel performance is benchmarked on ligand-protected gold nanoparticles (Au144(SC11NH25)60) up to 9696 atoms.
Yang, Xiaolong; Zhang, Peidong; Li, Wentao; Hu, Chengye; Zhang, Xiumei; He, Pingguo
2018-04-23
Seagrasses are major coastal primary producers and are widely distributed on coasts worldwide. Seagrasses show sensitivity to environmental stress due to their high phenotypic plasticity, and therefore, we evaluated the use of constituent elements in four dominant seagrass species as early warning indicators for nitrogen eutrophication of coastal regions. A meta-analysis was conducted with published data to develop a global benchmark for the selected indicator, which was used to evaluate nitrogen loading at a global scale. A case study at three bays was subsequently conducted to test for local-scale differences in leaf C/N ratios in four seagrasses. Additionally, morphological and physiological metrics of seagrasses were measured from the three locations under varied nitrogen levels to develop further assessment indexes. The benchmark and local study showed that leaf C/N ratios of Zostera marina were sensitive to nitrogen discharge, which could be a highly valuable early warning indicator on a global scale. Moreover, the threshold value of seagrass leaf C/N was determined according to the benchmark to differentiate eutrophic and low nitrogen levels at a local scale. Of the eight phenotypic metrics measured, leaf width, total chlorophyll (a + b), chlorophyll ratio (a/b), and starch in the rhizome were the most effective at discriminating between the three locations and could also be promising indicators for monitoring eutrophication. Copyright © 2018. Published by Elsevier B.V.
A Comparative Study of Randomized Constraint Solvers for Random-Symbolic Testing
NASA Technical Reports Server (NTRS)
Takaki, Mitsuo; Cavalcanti, Diego; Gheyi, Rohit; Iyoda, Juliano; d'Amorim, Marcelo; Prudencio, Ricardo
2009-01-01
The complexity of constraints is a major obstacle for constraint-based software verification. Automatic constraint solvers are fundamentally incomplete: input constraints often build on some undecidable theory or some theory the solver does not support. This paper proposes and evaluates several randomized solvers to address this issue. We compare the effectiveness of a symbolic solver (CVC3), a random solver, three hybrid solvers (i.e., mixes of random and symbolic solving), and two heuristic search solvers. We evaluate the solvers on two benchmarks: one consisting of manually generated constraints and another generated with a concolic execution of 8 subjects. In addition to fully decidable constraints, the benchmarks include constraints with non-linear integer arithmetic, integer modulo and division, bitwise arithmetic, and floating-point arithmetic. As expected, symbolic solving (in particular, CVC3) subsumes the other solvers for the concolic execution of subjects that only generate decidable constraints. For the remaining subjects, the solvers are complementary.
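To illustrate what a "random solver" means here, a minimal sketch, assuming the simplest possible strategy: sample integer assignments uniformly from a bounded domain and evaluate the constraint directly. The bounds, retry budget, and function name are hypothetical and not taken from the paper. Because the constraint is simply executed, non-linear arithmetic, modulo, and bitwise operators pose no difficulty, but failing to find a solution proves nothing, which is the incompleteness the abstract refers to:

```python
import random


def random_solve(variables, constraint, lo=-50, hi=50, max_tries=100_000, seed=0):
    """Randomized (incomplete) solver: sample integer assignments and test them.

    Returns a satisfying assignment, or None if none was found within
    max_tries; None does NOT prove the constraint unsatisfiable.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        assignment = {v: rng.randint(lo, hi) for v in variables}
        try:
            if constraint(assignment):
                return assignment
        except ZeroDivisionError:
            continue  # a sampled divisor of zero; just try another point
    return None


# Non-linear product, integer modulo, and a bitwise test in one constraint.
def c(a):
    return a["x"] * a["y"] == 42 and a["x"] % 5 == 2 and (a["x"] & 1) == 1


print(random_solve(["x", "y"], c))
```

A hybrid solver in the paper's sense would combine this sampling with a symbolic solver, for example by fixing the variables involved in unsupported operations to random values and handing the remaining decidable part to the symbolic engine.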
NASA Indexing Benchmarks: Evaluating Text Search Engines
NASA Technical Reports Server (NTRS)
Esler, Sandra L.; Nelson, Michael L.
1997-01-01
The current proliferation of on-line information resources underscores the requirement for the ability to index collections of information and search and retrieve them in a convenient manner. This study develops criteria for analytically comparing the index and search engines and presents results for a number of freely available search engines. A product of this research is a toolkit capable of automatically indexing, searching, and extracting performance statistics from each of the focused search engines. This toolkit is highly configurable and has the ability to run these benchmark tests against other engines as well. Results demonstrate that the tested search engines can be grouped into two levels. Level one engines are efficient on small to medium sized data collections, but show weaknesses when used for collections of 100 MB or larger. Level two search engines are recommended for data collections up to and beyond 100 MB.
SkData: data sets and algorithm evaluation protocols in Python
NASA Astrophysics Data System (ADS)
Bergstra, James; Pinto, Nicolas; Cox, David D.
2015-01-01
Machine learning benchmark data sets come in all shapes and sizes, whereas classification algorithms assume sanitized input, such as (x, y) pairs with vector-valued input x and integer class label y. Researchers and practitioners know all too well how tedious it can be to get from the URL of a new data set to a NumPy ndarray suitable for e.g. pandas or sklearn. The SkData library handles that work for a growing number of benchmark data sets (small and large) so that one-off in-house scripts for downloading and parsing data sets can be replaced with library code that is reliable, community-tested, and documented. The SkData library also introduces an open-ended formalization of training and testing protocols that facilitates direct comparison with published research. This paper describes the usage and architecture of the SkData library.
Simulated annealing with probabilistic analysis for solving traveling salesman problems
NASA Astrophysics Data System (ADS)
Hong, Pei-Yee; Lim, Yai-Fung; Ramli, Razamin; Khalid, Ruzelan
2013-09-01
Simulated Annealing (SA) is a widely used meta-heuristic inspired by the annealing process of recrystallization in metals. The efficiency of SA is highly affected by its annealing schedule. In this paper, we therefore present an empirical study to determine a suitable annealing schedule for solving symmetric traveling salesman problems (TSP), using a randomized complete block design. The results show that different parameters do affect the efficiency of SA, and we propose the best-found annealing schedule based on a post hoc test. SA with the proposed annealing schedule was tested on seven selected benchmark instances of the symmetric TSP. The performance of SA was evaluated empirically against benchmark solutions, with a simple analysis to validate the quality of the solutions. Computational results show that the proposed annealing schedule provides good-quality solutions.
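The "annealing schedule" is the set of parameters that controls how the temperature falls. A minimal sketch of SA for a symmetric TSP, assuming a geometric cooling schedule T ← αT, a 2-opt (segment-reversal) neighbourhood, and the Metropolis acceptance rule. The abstract does not specify which schedule form or parameter values the authors chose, so the defaults below are illustrative only:

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour (returns to the starting city)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def simulated_annealing(dist, t0=100.0, alpha=0.95, iters_per_temp=200,
                        t_min=1e-3, seed=0):
    """SA for a symmetric TSP with a geometric cooling schedule T <- alpha * T.

    t0, alpha, iters_per_temp and t_min are the schedule parameters; the
    defaults are illustrative, not the values proposed in the paper.
    """
    rng = random.Random(seed)
    n = len(dist)
    current = list(range(n))
    rng.shuffle(current)
    cur_len = tour_length(current, dist)
    best, best_len = current[:], cur_len
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            # 2-opt neighbour: reverse the segment current[i..j]
            i, j = sorted(rng.sample(range(n), 2))
            candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
            cand_len = tour_length(candidate, dist)
            delta = cand_len - cur_len
            # Metropolis rule: always accept improvements, accept worse
            # tours with probability exp(-delta / T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cur_len = candidate, cand_len
                if cur_len < best_len:
                    best, best_len = current[:], cur_len
        t *= alpha  # cool down once per temperature level
    return best, best_len


if __name__ == "__main__":
    rng = random.Random(1)
    cities = [(rng.random(), rng.random()) for _ in range(12)]
    dist = [[math.dist(a, b) for b in cities] for a in cities]
    tour, length = simulated_annealing(dist)
    print(tour, round(length, 4))
```

Each of t0, alpha, iters_per_temp and t_min is a factor that a randomized complete block design, like the one in the abstract, could vary across benchmark instances (the blocks).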
Additive Manufacturing of Thermoplastic Matrix Composites Using Ultrasonics
NASA Astrophysics Data System (ADS)
Olson, Meghan
Advanced composite materials have great potential for facilitating energy-efficient product design and manufacture if improvements are made to current composite manufacturing processes. This thesis focuses on the development of a novel manufacturing process for thermoplastic composite structures entitled Laser-Ultrasonic Additive Manufacturing ('LUAM'), which is intended to combine the benefits of laser processing technology, developed by Automated Dynamics Inc., with ultrasonic bonding technology that is used commercially for unreinforced polymers. Used together, these technologies have the potential to significantly reduce the energy consumption and void content of thermoplastic composites made using Automated Fiber Placement (AFP). To develop LUAM in a methodical manner with minimal risk, a staged approach was devised in which coupon-level mechanical testing and prototyping were carried out using existing equipment. Four key tasks have been identified for this effort: Benchmarking, Ultrasonic Compaction, Laser Assisted Ultrasonic Compaction, and Demonstration and Characterization of LUAM. This thesis specifically addresses Tasks 1 and 2, i.e. Benchmarking and Ultrasonic Compaction, respectively. Task 1, fabricating test specimens using two traditional processes (autoclave and thermal press) and testing their structural performance and dimensional accuracy, provides the results of a benchmarking study against which the performance of all future phases will be gauged. Task 2, fabricating test specimens using a non-traditional process (ultrasonic compaction) and evaluating them in a similar fashion, explores the role of ultrasonic processing parameters using three different thermoplastic composite materials. Further development of LUAM, although beyond the scope of this thesis, will combine laser and ultrasonic technology and eventually demonstrate a working system.
SU-E-T-148: Benchmarks and Pre-Treatment Reviews: A Study of Quality Assurance Effectiveness
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lowenstein, J; Nguyen, H; Roll, J
Purpose: To determine the impact that benchmarks and pre-treatment reviews have on improving the quality of submitted clinical trial data. Methods: Benchmarks are used to evaluate a site's ability to develop a treatment plan that meets a specific protocol's treatment guidelines before the site places its first patient on the protocol. A pre-treatment review is performed on an actual patient placed on the protocol: the dosimetry and contour volumes are evaluated against protocol guidelines before treatment is allowed to begin. A key component of these QA mechanisms is that sites are provided timely feedback to educate them on how to plan per the protocol and to prevent protocol deviations in patients accrued to a protocol. For both benchmarks and pre-treatment reviews, a dose volume analysis (DVA) was performed using MIM Software™. For pre-treatment reviews, a volume contour evaluation was also performed. Results: IROC Houston performed a QA effectiveness analysis of a protocol that required both benchmarks and pre-treatment reviews. In 70 percent of the patient cases submitted, the benchmark played an effective role in assuring that the pre-treatment review of the cases met protocol requirements. The 35 percent of sites that failed the benchmark subsequently modified their planning technique to pass the benchmark before being allowed to submit a patient for pre-treatment review. However, in 30 percent of the submitted cases the pre-treatment review failed, with the majority of these (71 percent) failing the DVA. Twenty percent of sites submitting patients failed to correct the dose volume discrepancies indicated by the benchmark case. Conclusion: Benchmark cases and pre-treatment reviews can be an effective QA tool to educate sites on protocol guidelines and to minimize deviations. Without the benchmark cases, it is possible that 65 percent of the cases undergoing a pre-treatment review would have failed to meet the protocol's requirements. Support: U24-CA-180803.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Because these benchmarks were written to measure performance on Computational Fluid Dynamics applications, they are assumed to be representative of the NAS workload. Since the NAS runs both message-passing (MPI) and shared-memory, compiler-directive-type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were from the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-13
..., describes the fisheries, evaluates the status of the stock, estimates biological benchmarks, projects future.... Participants will evaluate and recommend datasets appropriate for assessment analysis, employ assessment models to evaluate stock status, estimate population benchmarks and management criteria, and project future...
Han, Jeong-Hwan; Oda, Takuji
2018-04-14
The performance of exchange-correlation functionals in density-functional theory (DFT) calculations for liquid metal has not been sufficiently examined. In the present study, benchmark tests of Perdew-Burke-Ernzerhof (PBE), Armiento-Mattsson 2005 (AM05), PBE re-parameterized for solids, and local density approximation (LDA) functionals are conducted for liquid sodium. The pair correlation function, equilibrium atomic volume, bulk modulus, and relative enthalpy are evaluated at 600 K and 1000 K. Compared with the available experimental data, the errors range from -11.2% to 0.0% for the atomic volume, from -5.2% to 22.0% for the bulk modulus, and from -3.5% to 2.5% for the relative enthalpy depending on the DFT functional. The generalized gradient approximation functionals are superior to the LDA functional, and the PBE and AM05 functionals exhibit the best performance. In addition, we assess whether the error tendency in liquid simulations is comparable to that in solid simulations, which would suggest that the atomic volume and relative enthalpy performances are comparable between solid and liquid states but that the bulk modulus performance is not. These benchmark test results indicate that the results of liquid simulations are significantly dependent on the exchange-correlation functional and that the DFT functional performance in solid simulations can be used to roughly estimate the performance in liquid simulations.
NASA Astrophysics Data System (ADS)
Han, Jeong-Hwan; Oda, Takuji
2018-04-01
The performance of exchange-correlation functionals in density-functional theory (DFT) calculations for liquid metal has not been sufficiently examined. In the present study, benchmark tests of Perdew-Burke-Ernzerhof (PBE), Armiento-Mattsson 2005 (AM05), PBE re-parameterized for solids, and local density approximation (LDA) functionals are conducted for liquid sodium. The pair correlation function, equilibrium atomic volume, bulk modulus, and relative enthalpy are evaluated at 600 K and 1000 K. Compared with the available experimental data, the errors range from -11.2% to 0.0% for the atomic volume, from -5.2% to 22.0% for the bulk modulus, and from -3.5% to 2.5% for the relative enthalpy depending on the DFT functional. The generalized gradient approximation functionals are superior to the LDA functional, and the PBE and AM05 functionals exhibit the best performance. In addition, we assess whether the error tendency in liquid simulations is comparable to that in solid simulations, which would suggest that the atomic volume and relative enthalpy performances are comparable between solid and liquid states but that the bulk modulus performance is not. These benchmark test results indicate that the results of liquid simulations are significantly dependent on the exchange-correlation functional and that the DFT functional performance in solid simulations can be used to roughly estimate the performance in liquid simulations.
Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening
Yoon, Sang Min
2018-01-01
Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages two-stage learning of multiple 1D CNN models: we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches. PMID:29614767
Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening.
Cho, Heeryon; Yoon, Sang Min
2018-04-01
Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages two-stage learning of multiple 1D CNN models: we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches.
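"Test data sharpening" is applied to the input signals at prediction time, before they reach the trained 1D CNN. A minimal sketch, assuming an unsharp-masking form, x_sharp = x + α(x − G_σ * x), where G_σ * x is a Gaussian-smoothed copy of the signal. The exact filter and the values of σ and α used by the authors are not given in the abstract, so both are hypothetical parameters here:

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def sharpen(signal, sigma=2.0, alpha=1.0):
    """Unsharp masking of a 1D signal: add back alpha times its high-pass detail."""
    signal = np.asarray(signal, dtype=float)
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    # Reflect-pad so the smoothed signal has no artificial drop at the edges.
    padded = np.pad(signal, r, mode="reflect")
    smoothed = np.convolve(padded, k, mode="valid")  # same length as signal
    return signal + alpha * (signal - smoothed)


# A step edge becomes steeper, with overshoot on both sides of the transition.
step = np.r_[np.zeros(20), np.ones(20)]
print(np.round(sharpen(step)[17:23], 3))
```

For multi-axis accelerometer windows, the same function can be applied to each channel separately, for example with `np.apply_along_axis(sharpen, 0, window)` on a (time, channels) array. Only test inputs are sharpened; the training data are left unchanged.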
Benchmark simulation model no 2: general protocol and exploratory case studies.
Jeppsson, U; Pons, M-N; Nopens, I; Alex, J; Copp, J B; Gernaey, K V; Rosen, C; Steyer, J-P; Vanrolleghem, P A
2007-01-01
Over a decade ago, the concept of objectively evaluating the performance of control strategies by simulating them using a standard model implementation was introduced for activated sludge wastewater treatment plants. The resulting Benchmark Simulation Model No 1 (BSM1) has been the basis for a significant new development that is reported on here: Rather than only evaluating control strategies at the level of the activated sludge unit (bioreactors and secondary clarifier) the new BSM2 now allows the evaluation of control strategies at the level of the whole plant, including primary clarifier and sludge treatment with anaerobic sludge digestion. In this contribution, the decisions that have been made over the past three years regarding the models used within the BSM2 are presented and argued, with particular emphasis on the ADM1 description of the digester, the interfaces between activated sludge and digester models, the included temperature dependencies and the reject water storage. BSM2-implementations are now available in a wide range of simulation platforms and a ring test has verified their proper implementation, consistent with the BSM2 definition. This guarantees that users can focus on the control strategy evaluation rather than on modelling issues. Finally, for illustration, twelve simple operational strategies have been implemented in BSM2 and their performance evaluated. Results show that it is an interesting control engineering challenge to further improve the performance of the BSM2 plant (which is the whole idea behind benchmarking) and that integrated control (i.e. acting at different places in the whole plant) is certainly worthwhile to achieve overall improvement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Nicholas R.; Carlsen, Brett W.; Dixon, Brent W.
Dynamic fuel cycle simulation tools are intended to model holistic transient nuclear fuel cycle scenarios. As with all simulation tools, fuel cycle simulators require verification through unit tests, benchmark cases, and integral tests. Model validation is a vital aspect as well. Although comparative studies have been performed, there is no comprehensive unit test and benchmark library for fuel cycle simulator tools. The objective of this paper is to identify the must-test functionalities of a fuel cycle simulator tool within the context of specific problems of interest to the Fuel Cycle Options Campaign within the U.S. Department of Energy's Office of Nuclear Energy. The approach in this paper identifies the features needed to cover the range of promising fuel cycle options identified in the DOE-NE Fuel Cycle Evaluation and Screening (E&S) and categorizes these features to facilitate prioritization. Features were categorized as essential functions, integrating features, and exemplary capabilities. One objective of this paper is to propose a library of unit tests applicable to each of the essential functions. Another underlying motivation for this paper is to encourage an international dialog on the functionalities and standard test methods for fuel cycle simulator tools.
Three Key Issues in the Reform Programs for the Chinese College Entrance Examination
ERIC Educational Resources Information Center
Liu, Qinghua
2013-01-01
The new entrance exam reform programs that have been presented in a number of provinces and regions adhere to the direction of new curriculum reform. Within these programs, comprehensive evaluation serves as the weather vane for quality education. The high school academic proficiency test serves as a firmly fixed benchmark for learning ability,…
Benchmarking Evaluation Results for Prototype Extravehicular Activity Gloves
NASA Technical Reports Server (NTRS)
Aitchison, Lindsay; McFarland, Shane
2012-01-01
The Space Suit Assembly (SSA) Development Team at NASA Johnson Space Center has invested heavily in the advancement of rear-entry planetary exploration suit design but largely deferred development of extravehicular activity (EVA) glove designs, and accepted the risk of using the current flight gloves, Phase VI, for unique mission scenarios outside the Space Shuttle and International Space Station (ISS) Program realm of experience. However, as design reference missions mature, the risks of using heritage hardware have highlighted the need for developing robust new glove technologies. To address the technology gap, the NASA Game-Changing Technology group provided start-up funding for the High Performance EVA Glove (HPEG) Project in the spring of 2012. The overarching goal of the HPEG Project is to develop a robust glove design that increases human performance during EVA and creates a pathway for future implementation of emergent technologies, with specific aims of increasing pressurized mobility to 60% of barehanded capability, increasing durability by 100%, and decreasing the potential of gloves to cause injury during use. The HPEG Project focused initial efforts on identifying potential new technologies and benchmarking the performance of current state-of-the-art gloves, in order to identify trends in design and fit and to establish standards and metrics against which emerging technologies can be assessed at both the component and assembly levels. The first of the benchmarking tests evaluated the quantitative mobility performance and subjective fit of four prototype gloves developed by Flagsuit LLC, Final Frontier Designs, ILC Dover, and David Clark Company, as compared to the Phase VI. All of the companies were asked to design and fabricate gloves to the same set of NASA-provided hand measurements (which corresponded to a single size of Phase VI glove) and to focus their efforts on improving mobility in the metacarpal phalangeal and carpometacarpal joints.
Four test subjects representing the design-to hand anthropometry completed range-of-motion, grip/pinch strength, dexterity, and fit evaluations for each glove design in both the unpressurized and pressurized conditions. This paper provides a comparison of the test results along with a detailed description of the hardware and test methodologies used.
Docking and scoring with ICM: the benchmarking results and strategies for improvement
Neves, Marco A. C.; Totrov, Maxim; Abagyan, Ruben
2012-01-01
Flexible docking and scoring using the Internal Coordinate Mechanics software (ICM) was benchmarked for ligand binding mode prediction against the 85 co-crystal structures in the modified Astex data set. The ICM virtual ligand screening was tested against the 40 DUD target benchmarks and 11-target WOMBAT sets. The self-docking accuracy was evaluated for the top 1 and top 3 scoring poses at each ligand binding site, with near-native conformations below 2 Å RMSD found in 91% and 95% of the predictions, respectively. The virtual ligand screening using single rigid pocket conformations yielded a median area under the ROC curve of 69.4, with 22.0% of true positives recovered at a 2% false positive rate. Significant improvements, up to ROC AUC = 82.2 and ROC(2%) = 45.2, were achieved following our best practices for flexible pocket refinement and out-of-pocket binding rescore. The virtual screening can be further improved by considering multiple conformations of the target. PMID:22569591
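The two screening metrics quoted above (reported on a 0-100 scale) can be computed from a ranked list of scored compounds. The sketch below is a minimal illustration, assuming higher scores mean a compound is more likely to bind and that labels are 1 for actives and 0 for decoys; it is not ICM's own evaluation code, and the function names and toy data are hypothetical. AUC is computed through its Mann-Whitney interpretation, and the early-enrichment figure is the fraction of actives recovered before the false-positive rate exceeds 2%.

```python
# Hypothetical scoring convention: higher score = more likely active.
# labels: 1 = known active, 0 = decoy.


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a random active outscores a random decoy (ties count as 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def tpr_at_fpr(scores, labels, max_fpr=0.02):
    """Fraction of actives recovered before the false-positive rate exceeds
    max_fpr. Equal scores keep their input order (Python's sort is stable)."""
    n_act = sum(labels)
    n_dec = len(labels) - n_act
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    recovered = 0.0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if fp / n_dec > max_fpr:
            break  # FPR can only grow from here, so stop scanning
        recovered = tp / n_act
    return recovered


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))     # 8 of 9 active-decoy pairs ranked correctly
print(tpr_at_fpr(scores, labels))  # 2 of 3 actives appear before the first decoy
```

In real screens the decoy set is large, so a 2% false-positive cutoff admits many decoys; with only three decoys here, any decoy already exceeds it.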
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-01-01
Purpose: With the emergence of clinical outcomes databases as tools used routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
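The threshold-then-test step described above can be sketched as follows. This is an assumption-laden illustration, not the authors' C#/R tool: it picks the ROC cutoff that maximizes Youden's J (sensitivity + specificity - 1), one common criterion that the abstract does not name, and then computes a two-sided Fisher exact p-value for the resulting 2×2 contingency table from hypergeometric probabilities. The function names and the six-patient data set are hypothetical.

```python
from math import comb


def youden_threshold(values, outcomes):
    """Pick the cutoff t (rule: value >= t predicts outcome = 1) that maximizes
    Youden's J = sensitivity + specificity - 1 over all observed values."""
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, o in zip(values, outcomes) if v >= t and o == 1)
        tn = sum(1 for v, o in zip(values, outcomes) if v < t and o == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell = x) with all row and column totals held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical dose metric vs. toxicity outcome for six patients
dose = [10, 20, 30, 40, 50, 60]
tox = [0, 0, 0, 1, 1, 1]
print(youden_threshold(dose, tox))         # (40, 1.0)
print(fisher_exact_two_sided(3, 1, 1, 3))  # ~0.486
```

The second call reproduces the classic "tea tasting" table, whose two-sided p-value is 34/70.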
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-11-01
With the emergence of clinical outcomes databases as tools used routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.
NASA Astrophysics Data System (ADS)
Leonardi, Marcelo
The primary purpose of this study was to examine the impact of a scheduling change from a trimester 4x4 block schedule to a modified hybrid schedule on student achievement in ninth grade biology courses. This study examined the impact of the scheduling change on student achievement through teacher-created benchmark assessments in Genetics, DNA, and Evolution and on the California Standardized Test in Biology. The secondary purpose of this study was to examine ninth grade biology teachers' perceptions of ninth grade biology student achievement. Using a mixed methods research approach, data were collected both quantitatively and qualitatively, aligned to the research questions. Quantitative methods included gathering data from departmental benchmark exams and the California Standardized Test in Biology and conducting multivariate analysis of covariance and analysis of covariance to determine significant differences. Qualitative methods included journal entry questions and focus group interviews. The results revealed statistically significant increases in scores on both the DNA and Evolution benchmark exams following the change in scheduling format. The scheduling change was responsible for 1.5% of the increase in DNA benchmark scores and 2% of the increase in Evolution benchmark scores. The results revealed a statistically significant decrease in scores on the Genetics benchmark exam as a result of the scheduling change, which was responsible for 1% of the decrease in Genetics benchmark scores. The results also revealed a statistically significant increase in scores on the CST Biology exam, with the scheduling change responsible for 0.7% of the increase. Results of the focus group discussions indicated that all teachers preferred the modified hybrid schedule over the trimester schedule and believed that it improved student achievement.
NASA Astrophysics Data System (ADS)
Hanssen, R. F.
2017-12-01
In traditional geodesy, one is interested in determining the coordinates, or the change in coordinates, of predefined benchmarks. These benchmarks are clearly identifiable and are especially established to be representative of the signal of interest. This holds, e.g., for leveling benchmarks, for triangulation/trilateration benchmarks, and for GNSS benchmarks. The desired coordinates are not identical to the basic measurements, and need to be estimated using robust estimation procedures, where the stochastic nature of the measurements is taken into account. For InSAR, however, the `benchmarks' are not predefined. In fact, usually we do not know where an effective benchmark is located, even though we can determine its dynamic behavior quite well. This poses several significant problems. First, we cannot describe the quality of the measurements unless we already know the dynamic behavior of the benchmark. Second, if we do not know the quality of the measurements, we cannot compute the quality of the estimated parameters. Third, rather harsh assumptions need to be made to produce a result. These (usually implicit) assumptions differ between processing operators and the software used, and are severely affected by the amount of available data. Fourth, the `relative' nature of the final estimates is usually not explicitly stated, which is particularly problematic for non-expert users. Finally, whereas conventional geodesy applies rigorous testing to check for measurement or model errors, this is hardly ever done in InSAR geodesy. These problems make it practically impossible to provide a precise, reliable, repeatable, and `universal' InSAR product or service. Here we evaluate the requirements and challenges of moving towards InSAR as a geodetically-proof product.
In particular, this involves the explicit inclusion of contextual information, as well as InSAR procedures, standards and a technical protocol, supported by the International Association of Geodesy and the international scientific community.
NASA Astrophysics Data System (ADS)
Cowdery, E.; Dietze, M.
2017-12-01
As atmospheric carbon dioxide levels continue to increase, it is critical that terrestrial ecosystem models can accurately predict ecological responses to the changing environment. Current predictions of net primary productivity (NPP) in response to elevated atmospheric CO2 concentration are highly variable and contain considerable uncertainty. Benchmarking model predictions against data is necessary to assess their ability to replicate observed patterns, but also to identify and evaluate the assumptions causing inter-model differences. We have implemented a novel benchmarking workflow as part of the Predictive Ecosystem Analyzer (PEcAn) that is automated, repeatable, and generalized to incorporate different sites and ecological models. Building on the recent Free-Air CO2 Enrichment Model Data Synthesis (FACE-MDS) project, we used observational data from the FACE experiments to test this flexible, extensible benchmarking approach, which aims to provide repeatable tests of model process representation that can be performed quickly and frequently. Model performance assessments are often limited to traditional residual error analysis; however, this can result in a loss of critical information. Models that fail tests of relative measures of fit may still perform well under measures of absolute fit and mathematical similarity. This implies that models discounted as poor predictors of ecological productivity may still be capturing important patterns. Conversely, models that have been found to be good predictors of productivity may be hiding errors in their sub-processes that produce the right answers for the wrong reasons. Our suite of tests has not only highlighted process-based sources of uncertainty in model productivity calculations but has also quantified the patterns and scale of this error.
Combining these findings with PEcAn's model sensitivity analysis and variance decomposition strengthens our ability to identify which processes need further study and additional data constraints. This can be used to inform future experimental design and, in turn, can provide an informative starting point for data assimilation.
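The point that a model can look poor under one measure of fit yet good under another can be illustrated with two common metrics. This is a hypothetical sketch, not PEcAn's benchmarking code, and the observations and predictions are made-up numbers: root-mean-square error penalizes any offset from the data, while Pearson correlation only asks whether the pattern is reproduced.

```python
from math import sqrt


def rmse(obs, pred):
    """Root-mean-square error: penalizes any offset between prediction and data."""
    return sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(obs, pred):
    """Pearson correlation: rewards matching the pattern, ignores offset and scale."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


# Hypothetical observed productivity and two hypothetical model predictions
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [o + 2.0 for o in obs]    # right pattern, constant offset
noisy = [1.5, 1.5, 3.5, 3.5, 5.0]  # small errors, weaker pattern

for name, pred in [("biased", biased), ("noisy", noisy)]:
    print(name, round(rmse(obs, pred), 3), round(pearson_r(obs, pred), 3))
```

The offset model has the larger RMSE (2.0 vs. about 0.45) but the higher correlation (1.0 vs. about 0.95), so a residual-only assessment would rank the two models the opposite way from a pattern-based one.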
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-06
... and facilitate the use of documentation in future evaluations and benchmarking. Extraordinary.... Benchmarking Other Agencies' Experiences A Federal agency cannot rely on another agency's categorical exclusion... was established. Federal agencies can also substantiate categorical exclusions by benchmarking, or...
Test suite for image-based motion estimation of the brain and tongue
NASA Astrophysics Data System (ADS)
Ramsey, Jordan; Prince, Jerry L.; Gomez, Arnold D.
2017-03-01
Noninvasive analysis of motion has important uses as qualitative markers for organ function and to validate biomechanical computer simulations relative to experimental observations. Tagged MRI is considered the gold standard for noninvasive tissue motion estimation in the heart, and this has inspired multiple studies focusing on other organs, including the brain under mild acceleration and the tongue during speech. As with other motion estimation approaches, using tagged MRI to measure 3D motion includes several preprocessing steps that affect the quality and accuracy of estimation. Benchmarks, or test suites, are datasets of known geometries and displacements that act as tools to tune tracking parameters or to compare different motion estimation approaches. Because motion estimation was originally developed to study the heart, existing test suites focus on cardiac motion. However, many fundamental differences exist between the heart and other organs, such that parameter tuning (or other optimization) with respect to a cardiac database may not be appropriate. Therefore, the objective of this research was to design and construct motion benchmarks by adopting an "image synthesis" test suite to study brain deformation due to mild rotational accelerations, and a benchmark to model motion of the tongue during speech. To obtain a realistic representation of mechanical behavior, kinematics were obtained from finite-element (FE) models. These results were combined with an approximation of the acquisition process of tagged MRI (including tag generation, slice thickness, and inconsistent motion repetition). To demonstrate an application of the presented methodology, the effect of motion inconsistency on synthetic measurements of head-brain rotation and deformation was evaluated. The results indicated that acquisition inconsistency is roughly proportional to head rotation estimation error.
Furthermore, when evaluating non-rigid deformation, the results suggest that inconsistent motion can yield "ghost" shear strains, which are a function of slice acquisition variability as opposed to a true physical deformation.
Test Suite for Image-Based Motion Estimation of the Brain and Tongue
Ramsey, Jordan; Prince, Jerry L.; Gomez, Arnold D.
2017-01-01
Noninvasive analysis of motion has important uses as qualitative markers for organ function and to validate biomechanical computer simulations relative to experimental observations. Tagged MRI is considered the gold standard for noninvasive tissue motion estimation in the heart, and this has inspired multiple studies focusing on other organs, including the brain under mild acceleration and the tongue during speech. As with other motion estimation approaches, using tagged MRI to measure 3D motion includes several preprocessing steps that affect the quality and accuracy of estimation. Benchmarks, or test suites, are datasets of known geometries and displacements that act as tools to tune tracking parameters or to compare different motion estimation approaches. Because motion estimation was originally developed to study the heart, existing test suites focus on cardiac motion. However, many fundamental differences exist between the heart and other organs, such that parameter tuning (or other optimization) with respect to a cardiac database may not be appropriate. Therefore, the objective of this research was to design and construct motion benchmarks by adopting an “image synthesis” test suite to study brain deformation due to mild rotational accelerations, and a benchmark to model motion of the tongue during speech. To obtain a realistic representation of mechanical behavior, kinematics were obtained from finite-element (FE) models. These results were combined with an approximation of the acquisition process of tagged MRI (including tag generation, slice thickness, and inconsistent motion repetition). To demonstrate an application of the presented methodology, the effect of motion inconsistency on synthetic measurements of head-brain rotation and deformation was evaluated. The results indicated that acquisition inconsistency is roughly proportional to head rotation estimation error. 
Furthermore, when evaluating non-rigid deformation, the results suggest that inconsistent motion can yield “ghost” shear strains, which are a function of slice acquisition variability as opposed to a true physical deformation. PMID:28781414
2016-11-01
Contents: 1. Introduction; 2. Background; 3. Yahoo! Cloud Serving Benchmark (YCSB); 3.1 Data Loading and Performance... transactional system. 3. Yahoo! Cloud Serving Benchmark (YCSB). 3.1 Data Loading and Performance Testing Framework. When originally setting out to perform the... that referred to a data loading and performance testing framework, Yahoo! Cloud Serving Benchmark (YCSB) [12]. This framework is freely available and
Stress Testing of Organic Light-Emitting Diode Panels and Luminaires
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, Lynn; Rountree, Kelley; Mills, Karmann
This report builds on previous DOE efforts with OLED technology by updating information on a previously benchmarked OLED product (the Chalina luminaire from Acuity Brands) and provides new benchmarks on the performance of Brite 2 and Brite Amber OLED panels from OLEDWorks. During the tests described here, samples of these devices were subjected to continuous operation in stress tests at elevated ambient temperatures of 35°C or 45°C. In addition, samples were operated continuously at room temperature in a room temperature operational life (RTOL) test. One goal of this study was to investigate whether these test conditions can accelerate failure of OLED panels, either through panel shorting or an open circuit in the panel. These stress tests are shown to provide meaningful acceleration of OLED failure modes, and an acceleration factor of 2.6 was calculated at 45°C for some test conditions. Changes in the photometric properties of the emitted light (e.g., luminous flux and chromaticity maintenance) were also evaluated for insights into the long-term stability of these products compared to earlier generations. Because OLEDs are a lighting system, electrical testing was also performed on the panel-driver pairs to provide insights into the impact of the driver on long-term panel performance.
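An acceleration factor compares how quickly the same failure mode develops under stress and under reference conditions. The abstract does not state how the 2.6 figure was calculated, so the sketch below shows two common options as assumptions: an empirical ratio of characteristic lifetimes (for example, median time to failure), and the Arrhenius temperature model, which needs an activation energy that the abstract does not give. The function names and all inputs are hypothetical.

```python
from math import exp, log

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def empirical_af(life_ref_h, life_stress_h):
    """Acceleration factor as the ratio of a characteristic lifetime
    (e.g. median hours to failure) at reference vs. stress conditions."""
    return life_ref_h / life_stress_h


def arrhenius_af(ea_ev, t_ref_c, t_stress_c):
    """Arrhenius acceleration factor between two temperatures in deg C,
    for an assumed activation energy ea_ev in eV."""
    t_ref = t_ref_c + 273.15
    t_stress = t_stress_c + 273.15
    return exp(ea_ev / BOLTZMANN_EV * (1.0 / t_ref - 1.0 / t_stress))


def implied_ea(af, t_ref_c, t_stress_c):
    """Invert the Arrhenius relation: the apparent activation energy (eV)
    that would produce acceleration factor af between the two temperatures."""
    t_ref = t_ref_c + 273.15
    t_stress = t_stress_c + 273.15
    return BOLTZMANN_EV * log(af) / (1.0 / t_ref - 1.0 / t_stress)


# Hypothetical lifetimes: 5200 h at room temperature, 2000 h at 45 deg C
print(empirical_af(5200.0, 2000.0))
```

An apparent activation energy from `implied_ea` is only meaningful if a single thermally activated mechanism dominates failure at both temperatures.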
Verification and benchmark testing of the NUFT computer code
NASA Astrophysics Data System (ADS)
Lee, K. H.; Nitao, J. J.; Kulshrestha, A.
1993-10-01
This interim report presents results of work completed in the ongoing verification and benchmark testing of the NUFT (Nonisothermal Unsaturated-saturated Flow and Transport) computer code. NUFT is a suite of multiphase, multicomponent models for numerical solution of thermal and isothermal flow and transport in porous media, with application to subsurface contaminant transport problems. The code simulates the coupled transport of heat, fluids, and chemical components, including volatile organic compounds. Grid systems may be Cartesian or cylindrical, with one-, two-, or fully three-dimensional configurations possible. In this initial phase of testing, the NUFT code was used to solve seven one-dimensional unsaturated flow and heat transfer problems: three verification problems and four benchmarking problems. In the verification testing, excellent agreement was observed between NUFT results and the analytical or quasi-analytical solutions. In the benchmark testing, results of code intercomparison were very satisfactory. From these testing results, it is concluded that the NUFT code is ready for application to field and laboratory problems similar to those addressed here. Multidimensional problems, including those dealing with chemical transport, will be addressed in a subsequent report.
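The abstract does not list the seven test problems, so the following is a hypothetical example of the kind of verification check described: a one-dimensional transient heat-conduction problem solved with an explicit (FTCS) finite-difference scheme and compared with the closed-form erfc solution for a semi-infinite medium with a step change in boundary temperature. The domain is long relative to the diffusion length, so the finite far boundary has negligible effect at the comparison time. This is a generic sketch with illustrative function names, not NUFT's numerics.

```python
from math import erfc, sqrt


def ftcs_heat(alpha, length, nx, t_end, r=0.4):
    """Explicit finite-difference solution of u_t = alpha * u_xx on [0, length]
    with u(0, t) = 1, u(length, t) = 0 and u(x, 0) = 0."""
    dx = length / (nx - 1)
    dt = r * dx * dx / alpha          # r = alpha*dt/dx^2 <= 0.5 keeps FTCS stable
    steps = round(t_end / dt)
    dt = t_end / steps                # adjust so the last step lands on t_end
    coef = alpha * dt / (dx * dx)
    u = [0.0] * nx
    u[0] = 1.0                        # step change applied at the left boundary
    for _ in range(steps):
        new = u[:]
        for i in range(1, nx - 1):
            new[i] = u[i] + coef * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u = new
    xs = [i * dx for i in range(nx)]
    return xs, u


def analytical(x, t, alpha):
    """Semi-infinite medium, unit step applied at x = 0 at t = 0."""
    return erfc(x / (2.0 * sqrt(alpha * t)))


def max_nodal_error(alpha=1.0, length=1.0, nx=101, t_end=0.01):
    """Largest absolute difference between numerical and analytical profiles."""
    xs, u = ftcs_heat(alpha, length, nx, t_end)
    return max(abs(ui - analytical(x, t_end, alpha)) for x, ui in zip(xs, u))


print(max_nodal_error())
```

A maximum nodal error against the analytical profile is the simplest pass/fail criterion; the acceptable tolerance has to be chosen per problem.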
Benchmarking nitrogen removal suspended-carrier biofilm systems using dynamic simulation.
Vanhooren, H; Yuan, Z; Vanrolleghem, P A
2002-01-01
We are witnessing enormous growth in biological nitrogen removal from wastewater, which presents specific challenges beyond traditional COD (carbon) removal. One option for optimised process design is the use of biomass-supporting media. In this paper, attached growth processes (AGP) are evaluated using dynamic simulations. The advantages of these systems, which have been described qualitatively elsewhere, are validated quantitatively based on a simulation benchmark for activated sludge treatment systems. This simulation benchmark is extended with a biofilm model that allows fast and accurate simulation of the conversion of different substrates in a biofilm. The economic feasibility of this system is evaluated using the data generated with the benchmark simulations. Capital savings due to volume reduction and reduced sludge production are weighed against increased aeration costs. Effluent quality is also integrated into this evaluation.
Computational Chemistry Comparison and Benchmark Database
National Institute of Standards and Technology Data Gateway
SRD 101 NIST Computational Chemistry Comparison and Benchmark Database (Web, free access) The NIST Computational Chemistry Comparison and Benchmark Database is a collection of experimental and ab initio thermochemical properties for a selected set of molecules. The goals are to provide a benchmark set of molecules for the evaluation of ab initio computational methods and to allow comparison between different ab initio computational methods for the prediction of thermochemical properties.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Thomas Martin; Celik, Cihangir; McMahan, Kimberly L.
This benchmark experiment was conducted as a joint venture between the US Department of Energy (DOE) and the French Commissariat à l'Energie Atomique (CEA). Staff at the Oak Ridge National Laboratory (ORNL) in the US and the Centre de Valduc in France planned this experiment. The experiment was conducted on October 19, 2010 in the SILENE critical assembly facility at Valduc. Several other organizations contributed to this experiment and the subsequent evaluation, including CEA Saclay, Lawrence Livermore National Laboratory (LLNL), the Y-12 National Security Complex (NSC), Babcock International Group in the United Kingdom, and Los Alamos National Laboratory (LANL). The goal of this experiment was to measure neutron activation and thermoluminescent dosimeter (TLD) doses from a source similar to a fissile solution critical excursion. The resulting benchmark can be used for validation of computer codes and nuclear data libraries as required when performing analysis of criticality accident alarm systems (CAASs). A secondary goal of this experiment was to qualitatively test the performance of two CAAS detectors similar to those currently and formerly in use in some US DOE facilities. The detectors tested were the CIDAS MkX and the Rocky Flats NCD-91. The CIDAS detects gammas with a Geiger-Müller tube, and the Rocky Flats detector detects neutrons via charged particles that are produced in a thin 6LiF disc and deposit energy in a Si solid-state detector. These detectors were being evaluated to determine whether they would alarm, so they were not expected to generate benchmark-quality data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Thomas Martin; Celik, Cihangir; Isbell, Kimberly McMahan
This benchmark experiment was conducted as a joint venture between the US Department of Energy (DOE) and the French Commissariat à l'Energie Atomique (CEA). Staff at the Oak Ridge National Laboratory (ORNL) in the US and the Centre de Valduc in France planned this experiment. The experiment was conducted on October 13, 2010 in the SILENE critical assembly facility at Valduc. Several other organizations contributed to this experiment and the subsequent evaluation, including CEA Saclay, Lawrence Livermore National Laboratory (LLNL), the Y-12 National Security Complex (NSC), Babcock International Group in the United Kingdom, and Los Alamos National Laboratory (LANL). The goal of this experiment was to measure neutron activation and thermoluminescent dosimeter (TLD) doses from a source similar to a fissile solution critical excursion. The resulting benchmark can be used for validation of computer codes and nuclear data libraries as required when performing analysis of criticality accident alarm systems (CAASs). A secondary goal of this experiment was to qualitatively test the performance of two CAAS detectors similar to those currently and formerly in use in some US DOE facilities. The detectors tested were the CIDAS MkX and the Rocky Flats NCD-91. The CIDAS detects gammas with a Geiger-Müller tube, and the Rocky Flats detector detects neutrons via charged particles that are produced in a thin 6LiF disc and deposit energy in a Si solid-state detector. These detectors were being evaluated to determine whether they would alarm, so they were not expected to generate benchmark-quality data.
Benchmarking: measuring the outcomes of evidence-based practice.
DeLise, D C; Leasure, A R
2001-01-01
Measurement of the outcomes associated with implementation of evidence-based practice changes is increasingly emphasized by multiple health care disciplines. A final step in the process of implementing and sustaining evidence-supported practice changes is outcomes evaluation and monitoring. The comparison of outcomes to internal and external measures is known as benchmarking. This article discusses evidence-based practice, provides an overview of outcomes evaluation, and describes the process of benchmarking to improve practice. A case study is used to illustrate this concept.
NASA Astrophysics Data System (ADS)
Kahler, A. C.; MacFarlane, R. E.; Mosteller, R. D.; Kiedrowski, B. C.; Frankle, S. C.; Chadwick, M. B.; McKnight, R. D.; Lell, R. M.; Palmiotti, G.; Hiruta, H.; Herman, M.; Arcilla, R.; Mughabghab, S. F.; Sublet, J. C.; Trkov, A.; Trumbull, T. H.; Dunn, M.
2011-12-01
The ENDF/B-VII.1 library is the latest revision to the United States' Evaluated Nuclear Data File (ENDF). The ENDF library is currently in its seventh generation, with ENDF/B-VII.0 released in 2006. This revision expands upon that library, including the addition of new evaluated files (423 neutron files, up from 393 previously, including the replacement of elemental vanadium and zinc evaluations with isotopic evaluations) and extension or updating of many existing neutron data files. Complete details are provided in the companion paper [M. B. Chadwick et al., "ENDF/B-VII.1 Nuclear Data for Science and Technology: Cross Sections, Covariances, Fission Product Yields and Decay Data," Nuclear Data Sheets, 112, 2887 (2011)]. This paper focuses on how accurately application libraries may be expected to perform in criticality calculations with these data. Continuous-energy cross section libraries, suitable for use with the MCNP Monte Carlo transport code, have been generated and applied to a suite of nearly one thousand critical benchmark assemblies defined in the International Criticality Safety Benchmark Evaluation Project's International Handbook of Evaluated Criticality Safety Benchmark Experiments. This suite covers uranium and plutonium fuel systems in a variety of forms such as metallic, oxide or solution, and under a variety of spectral conditions, including unmoderated (i.e., bare), metal reflected and water or other light element reflected. Assembly eigenvalues that were accurately predicted with ENDF/B-VII.0 cross sections, such as unmoderated and uranium-reflected 235U and 239Pu assemblies, HEU solution systems and LEU oxide lattice systems that mimic commercial PWR configurations, continue to be accurately calculated with ENDF/B-VII.1 cross sections, and deficiencies in predicted eigenvalues for assemblies containing selected materials, including titanium, manganese, cadmium and tungsten, are greatly reduced.
Improvements are also confirmed for selected actinide reaction rates such as 236U, 238Pu, 242Pu, 241Am and 243Am capture in fast systems. Other deficiencies, such as the overprediction of Pu solution system critical eigenvalues and a decreasing trend in calculated eigenvalue for 233U-fueled systems as a function of Above-Thermal Fission Fraction, remain. The comprehensive nature of this critical benchmark suite and the generally accurate calculated eigenvalues obtained with ENDF/B-VII.1 neutron cross sections support the conclusion that this is the most accurate general-purpose ENDF/B cross section library yet released to the technical community.
Benchmarking and the laboratory
Galloway, M; Nadin, L
2001-01-01
This article describes how benchmarking can be used to assess laboratory performance. Two benchmarking schemes are reviewed, the Clinical Benchmarking Company's Pathology Report and the College of American Pathologists' Q-Probes scheme. The Clinical Benchmarking Company's Pathology Report is undertaken by staff based in the clinical management unit, Keele University with appropriate input from the professional organisations within pathology. Five annual reports have now been completed. Each report is a detailed analysis of 10 areas of laboratory performance. In this review, particular attention is focused on the areas of quality, productivity, variation in clinical practice, skill mix, and working hours. The Q-Probes scheme is part of the College of American Pathologists programme in studies of quality assurance. The Q-Probes scheme and its applicability to pathology in the UK is illustrated by reviewing two recent Q-Probe studies: routine outpatient test turnaround time and outpatient test order accuracy. The Q-Probes scheme is somewhat limited by the small number of UK laboratories that have participated. In conclusion, as a result of the government's policy in the UK, benchmarking is here to stay. Benchmarking schemes described in this article are one way in which pathologists can demonstrate that they are providing a cost effective and high quality service. Key Words: benchmarking • pathology PMID:11477112
PFLOTRAN Verification: Development of a Testing Suite to Ensure Software Quality
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Frederick, J. M.
2016-12-01
In scientific computing, code verification ensures the reliability and numerical accuracy of a model simulation by comparing the simulation results to experimental data or known analytical solutions. The model is typically defined by a set of partial differential equations with initial and boundary conditions, and verification determines whether the mathematical model is solved correctly by the software. Code verification is especially important if the software is used to model high-consequence systems that cannot be physically tested in a fully representative environment [Oberkampf and Trucano (2007)]. Justified confidence in a particular computational tool requires clarity in the exercised physics and transparency in its verification process, with proper documentation. We present a quality assurance (QA) testing suite developed by Sandia National Laboratories that performs code verification for PFLOTRAN, an open-source, massively parallel subsurface simulator. PFLOTRAN solves systems of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport processes in porous media. PFLOTRAN's QA test suite compares the numerical solutions of benchmark problems in heat and mass transport against known, closed-form, analytical solutions, including documentation of the exercised physical process models implemented in each PFLOTRAN benchmark simulation. The QA test suite development strives to follow the recommendations given by Oberkampf and Trucano (2007), who describe four essential elements in high-quality verification benchmark construction: (1) conceptual description, (2) mathematical description, (3) accuracy assessment, and (4) additional documentation and user information.
Several QA tests within the suite will be presented, including details of the benchmark problems and their closed-form analytical solutions, implementation of benchmark problems in PFLOTRAN simulations, and the criteria used to assess PFLOTRAN's performance in the code verification procedure.
References
Oberkampf, W. L., and T. G. Trucano (2007), Verification and Validation Benchmarks, SAND2007-0853, 67 pp., Sandia National Laboratories, Albuquerque, NM.
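One standard way to carry out the accuracy-assessment element is to check the observed order of convergence: solve a problem with a known solution on successively refined grids and confirm that the error shrinks at the rate the discretization should deliver. The sketch below does this for a hypothetical steady one-dimensional Poisson problem with exact solution sin(πx), using second-order central differences and the Thomas algorithm. It illustrates the idea and is not one of PFLOTRAN's QA tests; the function names are illustrative.

```python
from math import log, pi, sin


def solve_poisson(n):
    """Central-difference solution of -u'' = pi^2 sin(pi x) on [0, 1] with
    u(0) = u(1) = 0 on n interior points; the exact solution is sin(pi x)."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Tridiagonal system: -u[i-1] + 2 u[i] - u[i+1] = h^2 f(x_i)
    a = [-1.0] * n  # sub-diagonal (a[0] unused)
    b = [2.0] * n   # main diagonal
    c = [-1.0] * n  # super-diagonal (c[-1] unused)
    d = [h * h * pi * pi * sin(pi * x) for x in xs]
    # Thomas algorithm: forward elimination...
    for i in range(1, n):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    # ...then back substitution.
    u = [0.0] * n
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return xs, u


def max_abs_error(n):
    """Maximum nodal error against the closed-form solution."""
    xs, u = solve_poisson(n)
    return max(abs(ui - sin(pi * x)) for x, ui in zip(xs, u))


def observed_order(n_coarse):
    """Going from n to 2n + 1 interior points halves h, so for a second-order
    scheme p = log2(e_coarse / e_fine) should be close to 2."""
    e_coarse = max_abs_error(n_coarse)
    e_fine = max_abs_error(2 * n_coarse + 1)
    return log(e_coarse / e_fine) / log(2.0)


print(observed_order(16))
```

A result near 2 supports a correctly implemented second-order scheme; a value near 1 or 0 would point to a bug in the discretization or boundary handling.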
Benchmarks for Psychotherapy Efficacy in Adult Major Depression
ERIC Educational Resources Information Center
Minami, Takuya; Wampold, Bruce E.; Serlin, Ronald C.; Kircher, John C.; Brown, George S.
2007-01-01
This study estimates pretreatment-posttreatment effect size benchmarks for the treatment of major depression in adults that may be useful in evaluating psychotherapy effectiveness in clinical practice. Treatment efficacy benchmarks for major depression were derived for 3 different types of outcome measures: the Hamilton Rating Scale for Depression…
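A pretreatment-posttreatment effect size of this kind is a standardized mean change. The abstract is truncated and does not give the study's exact estimator, so the sketch below assumes one common definition: mean pretreatment score minus mean posttreatment score, divided by the pretreatment standard deviation, on a scale where higher means more severe (as with the Hamilton scale). It also applies an optional Hedges-style small-sample correction. The function name and scores are hypothetical.

```python
from statistics import mean, stdev


def prepost_effect_size(pre, post, small_sample_correction=True):
    """Standardized mean change: (mean(pre) - mean(post)) / SD(pre).

    Positive values mean scores fell, i.e. symptoms improved on a scale where
    higher = more severe. The optional correction multiplies by
    1 - 3 / (4 * df - 1) with df = n - 1 (an assumed, Hedges-style choice)."""
    d = (mean(pre) - mean(post)) / stdev(pre)
    if small_sample_correction:
        n = len(pre)
        d *= 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    return d


# Hypothetical depression-scale scores for five clients
pre = [20, 22, 24, 26, 28]
post = [12, 14, 16, 18, 20]
print(prepost_effect_size(pre, post))
```

A clinic's effect size computed this way could then be set against a published benchmark, provided both use the same estimator and outcome measure.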
BENCHMARK DOSES FOR CHEMICAL MIXTURES: EVALUATION OF A MIXTURE OF 18 PHAHS.
Benchmark doses (BMDs), defined as doses of a substance that are expected to result in a pre-specified level of "benchmark" response (BMR), have been used for quantifying the risk associated with exposure to environmental hazards. The lower confidence limit of the BMD is used as...
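To make the BMD definition concrete, the sketch below solves for the dose at which extra risk over background equals a chosen BMR. It assumes a hypothetical one-hit (quantal-linear) dose-response model with made-up parameters and finds the root by bisection. Real BMD analyses fit dose-response models to data, and the lower confidence limit (BMDL) requires the uncertainty of that fit, which this sketch does not compute.

```python
import math

def extra_risk(p, d):
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = p(0.0)
    return (p(d) - p0) / (1.0 - p0)

def benchmark_dose(p, bmr=0.10, hi=1e6, tol=1e-12):
    """Bisection for the dose at which extra risk equals bmr.
    Assumes p is monotonically increasing in dose."""
    lo = 0.0
    if extra_risk(p, hi) < bmr:
        raise ValueError("BMR is not reached within [0, hi]")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if extra_risk(p, mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_hit(q0, q1):
    """Hypothetical one-hit model with background term q0 and slope q1."""
    return lambda d: 1.0 - math.exp(-(q0 + q1 * d))

model = one_hit(q0=0.05, q1=0.02)
bmd = benchmark_dose(model, bmr=0.10)
# For this model, extra risk is 1 - exp(-q1*d), so BMD = -ln(1 - BMR) / q1.
print(bmd, -math.log(1 - 0.10) / 0.02)
```

The one-hit model was chosen only because its BMD has a closed form, which makes the numerical root easy to check.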
The Suite for Embedded Applications and Kernels
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-05-10
Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed SEAK, a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions to these bottlenecks, and (b) to facilitate rigorous, objective, end-user evaluation of their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.
A review on the benchmarking concept in Malaysian construction safety performance
NASA Astrophysics Data System (ADS)
Ishak, Nurfadzillah; Azizan, Muhammad Azizi
2018-02-01
The construction industry is one of the major industries propelling Malaysia's economy and contributes substantially to the nation's GDP growth, yet high fatality rates on construction sites have caused concern among safety practitioners and stakeholders. Hence, benchmarking is needed for the performance of Malaysia's construction industry, especially in terms of safety. This concept can create fertile ground for ideas, but only in a receptive environment: organizations that share good practices and compare their safety performance against others benefit most in establishing improvements to their safety culture. This research was conducted to study awareness of the importance of benchmarking, to evaluate current practice and improvement, and to identify the constraints on implementing benchmarking of safety performance in the industry. Additionally, interviews with construction professionals revealed differing views on this concept. A comparison was made to show the different understandings of the benchmarking approach and of how safety performance can be benchmarked. Nevertheless, benchmarking is viewed as serving one mission: evaluating the objectives identified through benchmarking that will improve the organization's safety performance. Finally, the expected outcome of this research is to help Malaysia's construction industry implement best practice in safety performance management through the concept of benchmarking.
Benchmarking reference services: an introduction.
Marshall, J G; Buchanan, H S
1995-01-01
Benchmarking is based on the common sense idea that someone else, either inside or outside of libraries, has found a better way of doing certain things and that your own library's performance can be improved by finding out how others do things and adopting the best practices you find. Benchmarking is one of the tools used for achieving continuous improvement in Total Quality Management (TQM) programs. Although benchmarking can be done on an informal basis, TQM puts considerable emphasis on formal data collection and performance measurement. Used to its full potential, benchmarking can provide a common measuring stick to evaluate process performance. This article introduces the general concept of benchmarking, linking it whenever possible to reference services in health sciences libraries. Data collection instruments that have potential application in benchmarking studies are discussed and the need to develop common measurement tools to facilitate benchmarking is emphasized.
ERIC Educational Resources Information Center
Canadian Health Libraries Association.
Nine Canadian health libraries participated in a pilot test of the Benchmarking Tool Kit between January and April, 1998. Although the Tool Kit was designed specifically for health libraries, the content and approach are useful to other types of libraries as well. Used to its full potential, benchmarking can provide a common measuring stick to…
NASA Technical Reports Server (NTRS)
Leger, Lubert J.; Koontz, Steven L.; Visentine, James T.; Hunton, Donald
1993-01-01
An overview of EOIM-III, designed to produce benchmark atomic oxygen reactivity data, is presented. Ambient density measurements are conducted using a quadrupole mass spectrometer calibrated for atomic oxygen measurements in a unique ground-based test facility. The combination of these data with the predictions of ambient density models permits an assessment of the accuracy of measured reaction rates on a variety of materials, many of which have never been tested in LEO previously.
Mo-Si-B Alloys and Diboride Systems for High Enthalpy Environments: Design and Evaluation
2016-01-15
candidate material species production over a range of test gas enthalpies and pressures for UWM and ISU samples. Year 3: 3.1 Begin FTIR ... emission measurements on CO2-laser heated samples at SRI. 3.2 Continue experiments to optimize Si-, B-, and C-species LIF detection schemes in hot gas ... material tests to identify data that can be used to benchmark development of physics-based models of gas-surface interactions. • Employ the
ERIC Educational Resources Information Center
Kinnell, Margaret; Garrod, Penny
This British Library Research and Development Department study assesses current activities and attitudes toward quality management in library and information services (LIS) in the academic sector as well as the commercial/industrial sector. Definitions and types of benchmarking are described, and the relevance of benchmarking to LIS is evaluated.…
Application of Shape Similarity in Pose Selection and Virtual Screening in CSARdock2014 Exercise.
Kumar, Ashutosh; Zhang, Kam Y J
2016-06-27
To evaluate the applicability of shape similarity in docking-based pose selection and virtual screening, we participated in the CSARdock2014 benchmark exercise for identifying the correct docking pose of inhibitors targeting factor XA, spleen tyrosine kinase, and tRNA methyltransferase. This exercise provides a valuable opportunity for researchers to test their docking programs, methods, and protocols in a blind testing environment. In the CSARdock2014 benchmark exercise, we implemented an approach that uses ligand 3D shape similarity to facilitate docking-based pose selection and virtual screening. We showed that ligand 3D shape similarity between bound poses could be used to identify the native-like pose from an ensemble of docking-generated poses. Our method correctly identified the native pose as the top-ranking pose for 73% of test cases in a blind testing environment. Moreover, the pose selection results also revealed an excellent correlation between ligand 3D shape similarity scores and RMSD to the X-ray crystal structure ligand. In the virtual screening exercise, the average RMSD for our pose prediction was found to be 1.02 Å, one of the top performances achieved in the CSARdock2014 benchmark exercise. Furthermore, the inclusion of shape similarity improved the virtual screening performance of docking-based scoring and ranking. The coefficient of determination (r²) between experimental activities and docking scores for 276 spleen tyrosine kinase inhibitors was found to be 0.365 but reached 0.614 when ligand 3D shape similarity was included.
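The pose-selection idea can be sketched as: rank docking poses by a precomputed ligand 3D shape-similarity score, keep the top one, and measure its RMSD to the crystal-structure ligand. This is a minimal illustration, not the authors' implementation. The shape-similarity scores are assumed to come from an external tool, the coordinates are hypothetical, and RMSD is computed without superposition on the assumption that both poses share the receptor frame and atom ordering.

```python
import math

def pose_rmsd(pose, reference):
    """RMSD (in the coordinate units, e.g. Å) between two ligand poses.

    Both poses are lists of (x, y, z) tuples in the same receptor frame with
    the same atom ordering; no superposition is applied."""
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have matching atom counts")
    sq = sum((a - b) ** 2 for p, q in zip(pose, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose))

def select_pose(scored_poses):
    """Return the pose with the highest precomputed shape-similarity score."""
    return max(scored_poses, key=lambda item: item[0])[1]

# Hypothetical three-atom ligand and two docked candidates.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_a = [(x + 0.5, y, z) for x, y, z in crystal]
docked_b = [(x, y, z + 3.0) for x, y, z in crystal]
candidates = [(0.82, docked_a), (0.41, docked_b)]

best = select_pose(candidates)
print(pose_rmsd(best, crystal))  # 0.5
```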
Benchmarking an Unstructured-Grid Model for Tsunami Current Modeling
NASA Astrophysics Data System (ADS)
Zhang, Yinglong J.; Priest, George; Allan, Jonathan; Stimely, Laura
2016-12-01
We present model results derived from a tsunami current benchmarking workshop held by the NTHMP (National Tsunami Hazard Mitigation Program) in February 2015. Modeling was undertaken using our own 3D unstructured-grid model that has been previously certified by the NTHMP for tsunami inundation. Results for two benchmark tests are described here: (1) vortex structure in the wake of a submerged shoal and (2) impact of tsunami waves on Hilo Harbor in the 2011 Tohoku event. The modeled current velocities are compared with available lab and field data. We demonstrate that the model is able to accurately capture the velocity field in the two benchmark tests; in particular, the 3D model gives a much more accurate wake structure than the 2D model for the first test, with the root-mean-square error and mean bias of the modeled velocity no more than 2 cm s⁻¹ and 8 mm s⁻¹, respectively.
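The two error statistics quoted above can be computed as follows. This is a minimal sketch with hypothetical velocity samples. It assumes modeled and observed values are already paired at matching times and locations, and it takes bias as model minus observation, so a positive value means the model overpredicts.

```python
import math

def rmse(model, obs):
    """Root-mean-square error between paired modeled and observed values."""
    if not model or len(model) != len(obs):
        raise ValueError("series must be non-empty and the same length")
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def mean_bias(model, obs):
    """Mean of (model - observed); positive means the model overpredicts."""
    if not model or len(model) != len(obs):
        raise ValueError("series must be non-empty and the same length")
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

# Hypothetical current speeds in cm/s at matching gauge times.
observed = [10.0, 12.0, 15.0, 11.0]
modeled = [11.0, 11.5, 16.0, 11.5]
print(f"RMSE = {rmse(modeled, observed):.3f} cm/s, "
      f"bias = {mean_bias(modeled, observed):.3f} cm/s")
```

Reporting both statistics is useful because positive and negative errors cancel in the bias but not in the RMSE.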
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sample, B. E.; Opresko, D. M.; Suter, G. W.
Ecological risks of environmental contaminants are evaluated by using a two-tiered process. In the first tier, a screening assessment is performed where concentrations of contaminants in the environment are compared to no observed adverse effects level (NOAEL)-based toxicological benchmarks. These benchmarks represent concentrations of chemicals (i.e., concentrations presumed to be nonhazardous to the biota) in environmental media (water, sediment, soil, food, etc.). While exceedance of these benchmarks does not indicate any particular level or type of risk, concentrations below the benchmarks should not result in significant effects. In practice, when contaminant concentrations in food or water resources are less than these toxicological benchmarks, the contaminants may be excluded from further consideration. However, if the concentration of a contaminant exceeds a benchmark, that contaminant should be retained as a contaminant of potential concern (COPC) and investigated further. The second tier in ecological risk assessment, the baseline ecological risk assessment, may use toxicological benchmarks as part of a weight-of-evidence approach (Suter 1993). Under this approach, toxicological benchmarks are one of several lines of evidence used to support or refute the presence of ecological effects. Other sources of evidence include media toxicity tests, surveys of biota (abundance and diversity), measures of contaminant body burdens, and biomarkers.
This report presents NOAEL- and lowest observed adverse effects level (LOAEL)-based toxicological benchmarks for assessment of effects of 85 chemicals on 9 representative mammalian wildlife species (short-tailed shrew, little brown bat, meadow vole, white-footed mouse, cottontail rabbit, mink, red fox, and whitetail deer) and 11 avian wildlife species (American robin, rough-winged swallow, American woodcock, wild turkey, belted kingfisher, great blue heron, barred owl, barn owl, Cooper's hawk, red-tailed hawk, and osprey) (scientific names for both the mammalian and avian species are presented in Appendix B). [In this document, NOAEL refers to both dose (mg contaminant per kg animal body weight per day) and concentration (mg contaminant per kg of food or L of drinking water).] The 20 wildlife species were chosen because they are widely distributed and provide a representative range of body sizes and diets. The chemicals are some of those that occur at U.S. Department of Energy (DOE) waste sites. The NOAEL-based benchmarks presented in this report represent values believed to be nonhazardous for the listed wildlife species; LOAEL-based benchmarks represent threshold levels at which adverse effects are likely to become evident. These benchmarks consider contaminant exposure through oral ingestion of contaminated media only. Exposure through inhalation or direct dermal contact is not considered in this report.
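The first-tier screening rule above amounts to comparing each concentration with its benchmark, often expressed as a hazard quotient (HQ = concentration / benchmark). The sketch below is an illustration under stated assumptions: the chemical names and values are hypothetical, concentrations and benchmarks are assumed to share units, a concentration equal to its benchmark is conservatively retained, and chemicals lacking a benchmark are retained for further review rather than excluded.

```python
def screen_contaminants(concentrations, benchmarks):
    """First-tier screen: return {chemical: hazard quotient} for retained COPCs.

    Both arguments map chemical name -> value in the same units (e.g. mg/kg food).
    A chemical is retained when HQ >= 1 (conservative treatment of equality);
    chemicals without a benchmark are retained with HQ = None."""
    retained = {}
    for chem, conc in concentrations.items():
        bench = benchmarks.get(chem)
        if bench is None:
            retained[chem] = None          # cannot screen out without a benchmark
        elif conc / bench >= 1.0:
            retained[chem] = conc / bench  # candidate COPC for the second tier
    return retained

# Hypothetical measured concentrations and NOAEL-based benchmarks (mg/kg food).
measured = {"chemical_A": 4.0, "chemical_B": 0.2, "chemical_C": 1.5}
noael_benchmarks = {"chemical_A": 2.0, "chemical_B": 1.0}
print(screen_contaminants(measured, noael_benchmarks))
```

Here chemical_B falls below its benchmark and drops out, while chemical_C is carried forward only because no benchmark exists for it.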
NASA Astrophysics Data System (ADS)
Velioglu Sogut, Deniz; Yalciner, Ahmet Cevdet
2018-06-01
Field observations provide valuable data regarding nearshore tsunami impact, yet only in inundation areas where tsunami waves have already flooded. Therefore, tsunami modeling is essential to understand tsunami behavior and prepare for tsunami inundation. It is necessary that all numerical models used in tsunami emergency planning be subject to benchmark tests for validation and verification. This study focuses on two numerical codes, NAMI DANCE and FLOW-3D®, for validation and performance comparison. NAMI DANCE is an in-house tsunami numerical model developed by the Ocean Engineering Research Center of Middle East Technical University, Turkey, and the Laboratory of Special Research Bureau for Automation of Marine Research, Russia. FLOW-3D® is general-purpose computational fluid dynamics software developed by scientists who pioneered the design of the Volume-of-Fluid technique. The codes are validated and their performances are compared via analytical, experimental, and field benchmark problems, which are documented in the "Proceedings and Results of the 2011 National Tsunami Hazard Mitigation Program (NTHMP) Model Benchmarking Workshop" and the "Proceedings and Results of the NTHMP 2015 Tsunami Current Modeling Workshop". The variations between the numerical solutions of these two models are evaluated through statistical error analysis.
Thermal Performance Benchmarking: Annual Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreno, Gilbert
2016-04-08
The goal for this project is to thoroughly characterize the performance of state-of-the-art (SOA) automotive power electronics and electric motor thermal management systems. Information obtained from these studies will be used to: evaluate advantages and disadvantages of different thermal management strategies; establish baseline metrics for the thermal management systems; identify methods of improvement to advance the SOA; increase the publicly available information related to automotive traction-drive thermal management systems; and help guide future electric drive technologies (EDT) research and development (R&D) efforts. The performance results, combined with component efficiency and heat generation information obtained by Oak Ridge National Laboratory (ORNL), may then be used to determine the operating temperatures for the EDT components under drive-cycle conditions. In FY15, the 2012 Nissan LEAF power electronics and electric motor thermal management systems were benchmarked. Testing of the 2014 Honda Accord Hybrid power electronics thermal management system started in FY15; however, due to time constraints it was not possible to include results for this system in this report. The focus of this project is to benchmark the thermal aspects of the systems. ORNL's reports on benchmarking electric and hybrid electric vehicle technology provide detailed descriptions of the electrical and packaging aspects of these automotive systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greiner, Miles
Radial hydride formation in high-burnup used fuel cladding has the potential to radically reduce its ductility and suitability for long-term storage and eventual transport. To avoid this formation, the maximum post-reactor temperature must remain sufficiently low to limit the cladding hoop stress, and so that hydrogen from the existing circumferential hydrides will not dissolve and become available to re-precipitate into radial hydrides under the slow cooling conditions during drying, transfer, and early dry-cask storage. The objective of this research is to develop and experimentally benchmark computational fluid dynamics simulations of heat transfer in post-pool-storage drying operations, when high-burnup fuel cladding is likely to experience its highest temperature. These benchmarked tools can play a key role in evaluating dry cask storage systems for extended storage of high-burnup fuels and post-storage transportation, including fuel retrievability. The benchmarked tools will be used to aid the design of efficient drying processes, as well as to estimate variations of surface temperatures as a means of inferring helium integrity inside the canister or cask. This work will be conducted effectively because the principal investigator has experience developing these types of simulations and has constructed a test facility that can be used to benchmark them.
Global Gridded Crop Model Evaluation: Benchmarking, Skills, Deficiencies and Implications.
NASA Technical Reports Server (NTRS)
Muller, Christoph; Elliott, Joshua; Chryssanthacopoulos, James; Arneth, Almut; Balkovic, Juraj; Ciais, Philippe; Deryng, Delphine; Folberth, Christian; Glotter, Michael; Hoek, Steven;
2017-01-01
Crop models are increasingly used to simulate crop yields at the global scale, but so far there is no general framework on how to assess model performance. Here we evaluate the simulation results of 14 global gridded crop modeling groups that have contributed historic crop yield simulations for maize, wheat, rice and soybean to the Global Gridded Crop Model Intercomparison (GGCMI) of the Agricultural Model Intercomparison and Improvement Project (AgMIP). Simulation results are compared to reference data at global, national and grid cell scales and we evaluate model performance with respect to time series correlation, spatial correlation and mean bias. We find that global gridded crop models (GGCMs) show mixed skill in reproducing time series correlations or spatial patterns at the different spatial scales. Generally, maize, wheat and soybean simulations of many GGCMs are capable of reproducing larger parts of observed temporal variability (time series correlation coefficients (r) of up to 0.888 for maize, 0.673 for wheat and 0.643 for soybean at the global scale) but rice yield variability cannot be well reproduced by most models. Yield variability can be well reproduced for most major producing countries by many GGCMs and for all countries by at least some. A comparison with gridded yield data and a statistical analysis of the effects of weather variability on yield variability shows that the ensemble of GGCMs can explain more of the yield variability than an ensemble of regression models for maize and soybean, but not for wheat and rice. We identify future research needs in global gridded crop modeling and for all individual crop modeling groups. In the absence of a purely observation-based benchmark for model evaluation, we propose that the best performing crop model per crop and region establishes the benchmark for all others, and modelers are encouraged to investigate how crop model performance can be increased. 
We make our evaluation system accessible to all crop modelers so that other modeling groups can also test their model performance against the reference data and the GGCMI benchmark.
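Time series correlation, one of the metrics used above, and the proposal that the best-performing model per crop and region sets the benchmark can be sketched together. This minimal Python example computes plain Pearson r on hypothetical yield series with made-up model names. It is not the GGCMI evaluation system: it does not detrend the series or aggregate across spatial scales, steps a full evaluation may need.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def benchmark_model(simulations, reference):
    """Return (name, r) for the model whose series correlates best with the reference."""
    scores = {name: pearson_r(sim, reference) for name, sim in simulations.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical annual yields (t/ha) for one crop in one region.
reference = [7.1, 7.4, 6.8, 7.9, 8.2]
simulations = {
    "model_A": [6.9, 7.3, 6.7, 7.6, 8.0],
    "model_B": [7.5, 7.2, 7.4, 7.3, 7.6],
}
print(benchmark_model(simulations, reference))
```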
DeltaSA tool for source apportionment benchmarking, description and sensitivity analysis
NASA Astrophysics Data System (ADS)
Pernigotti, D.; Belis, C. A.
2018-05-01
DeltaSA is an R package and a Java online tool developed at the EC-Joint Research Centre to assist and benchmark source apportionment applications. Its key functionalities support two critical tasks in such studies: the assignment of a factor to a source in factor analytical models (source identification) and model performance evaluation. The source identification is based on the similarity between a given factor and source chemical profiles from public databases. The model performance evaluation is based on statistical indicators used to compare model output with reference values generated in intercomparison exercises. The reference values are calculated as the ensemble average of the results reported by participants that have passed a set of testing criteria based on chemical profiles and time series similarity. In this study, a sensitivity analysis of the model performance criteria is carried out using the results of a synthetic dataset where "a priori" references are available. The consensus modulated standard deviation punc gives the best choice for model performance evaluation when a conservative approach is adopted.
Experimental unsteady pressures at flutter on the Supercritical Wing Benchmark Model
NASA Technical Reports Server (NTRS)
Dansberry, Bryan E.; Durham, Michael H.; Bennett, Robert M.; Rivera, Jose A.; Silva, Walter A.; Wieseman, Carol D.; Turnock, David L.
1993-01-01
This paper describes selected results from the flutter testing of the Supercritical Wing (SW) model. This model is a rigid semispan wing having a rectangular planform and a supercritical airfoil shape. The model was flutter tested in the Langley Transonic Dynamics Tunnel (TDT) as part of the Benchmark Models Program, a multi-year wind tunnel activity currently being conducted by the Structural Dynamics Division of NASA Langley Research Center. The primary objective of this program is to assist in the development and evaluation of aeroelastic computational fluid dynamics codes. The SW is the second of a series of three similar models which are designed to be flutter tested in the TDT on a flexible mount known as the Pitch and Plunge Apparatus. Data sets acquired with these models, including simultaneous unsteady surface pressures and model response data, are meant to be used for correlation with analytical codes. Presented in this report are experimental flutter boundaries and corresponding steady and unsteady pressure distribution data acquired over two model chords located at the 60 and 95 percent span stations.
ERIC Educational Resources Information Center
Olney, Cynthia A.; Chumley, Heidi; Parra, Juan M.
2004-01-01
A team designing a Web-enhanced third-year medical education didactic curriculum based their course planning and evaluation activities on the Institute for Higher Education Policy's (2000) 24 benchmarks for online distance learning. The authors present the team's blueprint for planning and evaluating the Web-enhanced curriculum, which incorporates…
Coreference Resolution With Reconcile
2010-07-01
evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile and present experimental ... scores vary wildly across data sets, evaluation metrics, and system configurations. We believe that one root cause of these disparities is the high ... resolution and empirical evaluation of coreference resolvers across a variety of benchmark data sets and standard scoring metrics. We describe Reconcile
Outcome Benchmarks for Adaptations of Research-Supported Treatments for Adult Traumatic Stress
ERIC Educational Resources Information Center
Rubin, Allen; Parrish, Danielle E.; Washburn, Micki
2016-01-01
This article provides benchmark data on within-group effect sizes from published randomized controlled trials (RCTs) that evaluated the efficacy of research-supported treatments (RSTs) for adult traumatic stress. Agencies can compare these benchmarks to their treatment group effect size to inform their decisions as to whether the way they are…
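An agency comparison of the kind described above needs a within-group effect size computed on the agency's own data. The sketch below uses one common definition, the standardized mean change (pretreatment mean minus posttreatment mean, divided by the pretreatment SD), for a symptom scale where lower scores are better. The scores and the benchmark value are hypothetical placeholders. Published benchmarks may use a different variant (for example, a small-sample bias correction), so the definition should be matched to the benchmark source before comparing.

```python
import statistics

def prepost_effect_size(pre, post):
    """Standardized mean change for a symptom scale where lower is better:
    (mean_pre - mean_post) / SD_pre, using the sample SD of pretreatment scores.
    Positive values indicate improvement."""
    return (statistics.mean(pre) - statistics.mean(post)) / statistics.stdev(pre)

# Hypothetical agency data: the same five clients assessed before and after treatment.
agency_pre = [24, 30, 27, 21, 33]
agency_post = [18, 25, 20, 17, 26]
d = prepost_effect_size(agency_pre, agency_post)

HYPOTHETICAL_BENCHMARK = 1.0  # placeholder; substitute a published benchmark value
print(f"d = {d:.2f}; meets benchmark: {d >= HYPOTHETICAL_BENCHMARK}")
```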
IT-benchmarking of clinical workflows: concept, implementation, and evaluation.
Thye, Johannes; Straede, Matthias-Christopher; Liebe, Jan-David; Hübner, Ursula
2014-01-01
Because of emerging evidence that health IT is both an opportunity and a risk for clinical workflows, health IT must undergo continuous measurement of its efficacy and efficiency. IT benchmarks are a proven means of providing this information. The aim of this study was to enhance the methodology of an existing benchmarking procedure by including, in particular, new indicators of clinical workflows and by proposing new types of visualisation. Drawing on the concept of information logistics, we propose four workflow descriptors that were applied to four clinical processes. General and specific indicators were derived from these descriptors and processes. In total, 199 chief information officers (CIOs) took part in the benchmarking. Their hospitals were assigned to reference groups of similar size and ownership from a total of 259 hospitals. Stepwise and comprehensive feedback was given to the CIOs. Most participants who evaluated the benchmark rated the procedure as very good, good, or rather good (98.4%). CIOs used the benchmark information to gain a general overview, advance IT, prepare negotiations with board members, and argue for a new IT project.
Benchmark Evaluation of the HTR-PROTEUS Absorber Rod Worths (Core 4)
DOE Office of Scientific and Technical Information (OSTI.GOV)
John D. Bess; Leland M. Montierth
2014-06-01
PROTEUS was a zero-power research reactor at the Paul Scherrer Institute (PSI) in Switzerland. The critical assembly was constructed from a large graphite annulus surrounding a central cylindrical cavity. Various experimental programs were investigated in PROTEUS; during the years 1992 through 1996, it was configured as a pebble-bed reactor and designated HTR-PROTEUS. Various critical configurations were assembled, each accompanied by an assortment of reactor physics experiments including differential and integral absorber rod measurements, kinetics, reaction rate distributions, water ingress effects, and small sample reactivity effects [1]. Four benchmark reports were previously prepared and included in the March 2013 edition of the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook) [2], evaluating eleven critical configurations. A summary of that effort was previously provided [3], and an analysis of absorber rod worth measurements for Cores 9 and 10 was performed prior to this analysis and included in PROTEUS-GCR-EXP-004 [4]. In the current benchmark effort, absorber rod worths measured for Core Configuration 4, which was the only core with a randomly packed pebble loading, have been evaluated for inclusion as a revision to the HTR-PROTEUS benchmark report PROTEUS-GCR-EXP-002.
Neil, Amanda; Pfeffer, Sally; Burnett, Leslie
2013-01-01
This paper details the development of a new type of pathology laboratory productivity unit, the benchmarking complexity unit (BCU). The BCU provides a comparative index of laboratory efficiency, regardless of test mix. It also enables estimation of a measure of how much complex pathology a laboratory performs, and the identification of peer organisations for the purposes of comparison and benchmarking. The BCU is based on the theory that wage rates reflect productivity at the margin. A weighting factor for the ratio of medical to technical staff time was dynamically calculated based on actual participant site data. Given this weighting, a complexity value for each test, at each site, was calculated. The median complexity value (number of BCUs) for that test across all participating sites was taken as its complexity value for the Benchmarking in Pathology Program. The BCU allowed implementation of an unbiased comparison unit and test listing that was found to be a robust indicator of the relative complexity for each test. Employing the BCU data, a number of Key Performance Indicators (KPIs) were developed, including three that address comparative organisational complexity, analytical depth and performance efficiency, respectively. Peer groups were also established using the BCU combined with simple organisational and environmental metrics. The BCU has enabled productivity statistics to be compared between organisations. The BCU corrects for differences in test mix and workload complexity of different organisations and also allows for objective stratification into peer groups.
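The aggregation step described above (a complexity value per test at each site, then the median across sites) can be sketched as follows. The per-site formula here is an assumption for illustration: technical staff minutes plus medical staff minutes scaled by a weight, with the weight taken as a hypothetical ratio of medical to technical wage rates. The source does not give the exact formula, and the site data are made up.

```python
import statistics

def site_complexity(technical_minutes, medical_minutes, weight):
    """Hypothetical per-site complexity value for one test: technical staff time
    plus medical staff time scaled by a weighting factor."""
    return technical_minutes + weight * medical_minutes

def complexity_value(site_times, weight):
    """Median of per-site complexity values for one test across participating sites.

    site_times is a list of (technical_minutes, medical_minutes) pairs, one per site."""
    values = [site_complexity(t, m, weight) for t, m in site_times]
    return statistics.median(values)

# Hypothetical weight from assumed hourly wage rates (medical / technical).
weight = 120.0 / 40.0
# Hypothetical staff minutes per test at three participating sites.
sites = [(10, 1), (12, 2), (8, 0)]
print(complexity_value(sites, weight))
```

Taking the median rather than the mean keeps a single site with unusual staffing from shifting the test's complexity value.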
Aeroelasticity Benchmark Assessment: Subsonic Fixed Wing Program
NASA Technical Reports Server (NTRS)
Florance, Jennifer P.; Chwalowski, Pawel; Wieseman, Carol D.
2010-01-01
The fundamental technical challenge in computational aeroelasticity is the accurate prediction of unsteady aerodynamic phenomena and the effect on the aeroelastic response of a vehicle. Currently, a benchmarking standard for use in validating the accuracy of computational aeroelasticity codes does not exist. Many aeroelastic data sets have been obtained in wind-tunnel and flight testing throughout the world; however, none have been globally presented or accepted as an ideal data set. There are numerous reasons for this. One reason is that often, such aeroelastic data sets focus on the aeroelastic phenomena alone (flutter, for example) and do not contain associated information such as unsteady pressures and time-correlated structural dynamic deflections. Other available data sets focus solely on the unsteady pressures and do not address the aeroelastic phenomena. Other discrepancies can include omission of relevant data, such as flutter frequency, and/or the acquisition of only qualitative deflection data. In addition to these content deficiencies, all of the available data sets present both experimental and computational technical challenges. Experimental issues include facility influences, nonlinearities beyond those being modeled, and data processing. From the computational perspective, technical challenges include modeling geometric complexities, coupling between the flow and the structure, grid issues, and boundary conditions. The Aeroelasticity Benchmark Assessment task seeks to examine the existing potential experimental data sets and ultimately choose the one that is viewed as the most suitable for computational benchmarking. An initial computational evaluation of that configuration will then be performed using the Langley-developed computational fluid dynamics (CFD) software FUN3D as part of its code validation process. In addition to the benchmarking activity, this task also includes an examination of future research directions.
Researchers within the Aeroelasticity Branch will examine other experimental efforts within the Subsonic Fixed Wing (SFW) program (such as testing of the NASA Common Research Model (CRM)) and other NASA programs and assess aeroelasticity issues and research topics.
Benchmarking for the Effective Use of Student Evaluation Data
ERIC Educational Resources Information Center
Smithson, John; Birks, Melanie; Harrison, Glenn; Nair, Chenicheri Sid; Hitchins, Marnie
2015-01-01
Purpose: The purpose of this paper is to examine current approaches to interpretation of student evaluation data and present an innovative approach to developing benchmark targets for the effective and efficient use of these data. Design/Methodology/Approach: This article discusses traditional approaches to gathering and using student feedback…
Benchmarks for the Dichotic Sentence Identification test in Brazilian Portuguese for ear and age.
Andrade, Adriana Neves de; Gil, Daniela; Iorio, Maria Cecilia Martinelli
2015-01-01
Dichotic listening tests should be used in local languages and adapted for the population. The objective was to standardize the Brazilian Portuguese version of the Dichotic Sentence Identification test in normal listeners, comparing performance by age and ear. This prospective study included 200 normal listeners divided into four groups according to age: 13-19 years (GI), 20-29 years (GII), 30-39 years (GIII), and 40-49 years (GIV). The Dichotic Sentence Identification test was applied in four stages: training, binaural integration, and directed listening to the right and to the left. Better results for the right ear were observed in the binaural integration stages in all assessed groups. There was a negative correlation between age and the percentage of correct responses in both ears for free report and training. The worst performance in all stages of the test was observed for the 40-49 years age group. Reference values for the Brazilian Portuguese version of the Dichotic Sentence Identification test in normal listeners aged 13-49 years were established according to age, ear, and test stage; they should be used as benchmarks when evaluating individuals with these characteristics. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Brown, Nicholas R.; Carlsen, Brett W.; Dixon, Brent W.; ...
2016-06-09
Dynamic fuel cycle simulation tools are intended to model holistic transient nuclear fuel cycle scenarios. As with all simulation tools, fuel cycle simulators require verification through unit tests, benchmark cases, and integral tests. Model validation is a vital aspect as well. Although comparative studies have been performed, there is no comprehensive unit test and benchmark library for fuel cycle simulator tools. The objective of this paper is to identify the must-test functionalities of a fuel cycle simulator tool within the context of specific problems of interest to the Fuel Cycle Options Campaign within the U.S. Department of Energy's Office of Nuclear Energy. The approach in this paper identifies the features needed to cover the range of promising fuel cycle options identified in the DOE-NE Fuel Cycle Evaluation and Screening (E&S) and categorizes these features to facilitate prioritization. Features were categorized as essential functions, integrating features, and exemplary capabilities. One objective of this paper is to propose a library of unit tests applicable to each of the essential functions. Another underlying motivation for this paper is to encourage an international dialog on the functionalities and standard test methods for fuel cycle simulator tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arnis Judzis
2002-10-01
This document details the progress to date on the OPTIMIZATION OF MUD HAMMER DRILLING PERFORMANCE -- A PROGRAM TO BENCHMARK THE VIABILITY OF ADVANCED MUD HAMMER DRILLING contract for the quarter starting July 2002 through September 2002. Even though we are awaiting the optimization portion of the testing program, accomplishments include the following: (1) Smith International agreed to participate in the DOE Mud Hammer program. (2) Smith International chromed collars for upcoming benchmark tests at TerraTek, now scheduled for 4Q 2002. (3) ConocoPhillips had a field trial of the Smith fluid hammer offshore Vietnam. The hammer functioned properly, though the well encountered hole conditions and reaming problems. ConocoPhillips plans another field trial as a result. (4) DOE/NETL extended the contract for the fluid hammer program to allow Novatek to "optimize" their much delayed tool to 2003 and to allow Smith International to add "benchmarking" tests in light of SDS Digger Tools' current financial inability to participate. (5) ConocoPhillips joined the Industry Advisors for the mud hammer program. (6) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to complete the optimizations.
Method and system for benchmarking computers
Gustafson, John L.
1993-09-14
A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
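The fixed-time idea in this abstract (every machine gets the same interval, and the rating reflects how far it gets through a scalable task set) can be illustrated with a short Python sketch. The task set (midpoint-rule estimates of π at doubling resolution), the names `fixed_time_benchmark` and `midpoint_pi`, and the rule that a task finishing after the deadline is not credited are all assumptions for illustration, not details taken from the patent.

```python
import time
from typing import Callable, Sequence


def midpoint_pi(n: int) -> float:
    """Estimate pi as the integral of 4/(1+x^2) on [0, 1] using n midpoint panels."""
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(n))


# Hypothetical scalable task set: each task recomputes the answer at twice the resolution.
TASKS = [lambda n=2 ** k: midpoint_pi(n) for k in range(1, 25)]


def fixed_time_benchmark(
    tasks: Sequence[Callable[[], object]],
    interval_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Run tasks in order within a fixed interval; return how many finished in time.

    The count of completed tasks is used as a simple progress rating (an assumption;
    the abstract does not give the rating formula).
    """
    start = clock()
    completed = 0
    for task in tasks:
        if clock() - start >= interval_s:
            break  # the interval is already used up
        task()
        if clock() - start > interval_s:
            break  # this task overran the deadline, so it is not credited
        completed += 1
    return completed


if __name__ == "__main__":
    rating = fixed_time_benchmark(TASKS, interval_s=0.5)
    print(f"completed {rating} of {len(TASKS)} resolution levels in 0.5 s")
```

Injecting the clock as a parameter keeps the timing logic testable with a fake clock, and a faster machine simply reaches a finer resolution in the same interval.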
Proposal of an innovative benchmark for comparison of the performance of contactless digitizers
NASA Astrophysics Data System (ADS)
Iuliano, Luca; Minetola, Paolo; Salmi, Alessandro
2010-10-01
Thanks to the improving performance of 3D optical scanners in terms of accuracy and repeatability, reverse engineering applications have extended from CAD model design or reconstruction to quality control. Today, contactless digitizing devices constitute a good alternative to coordinate measuring machines (CMMs) for the inspection of certain parts. The German guideline VDI/VDE 2634 is the only reference to evaluate whether 3D optical measuring systems comply with the declared or required performance specifications. Nevertheless, it is difficult to compare the performance of different scanners by referring to such a guideline. An adequate novel benchmark is proposed in this paper: focusing on the inspection of production tools (moulds), the innovative test piece was designed using common geometries and free-form surfaces. The reference part is intended to be employed for the evaluation of the performance of several contactless digitizing devices in computer-aided inspection, considering dimensional and geometrical tolerances as well as other quantitative and qualitative criteria.
Similarity indices of meteo-climatic gauging stations: definition and comparison.
Barca, Emanuele; Bruno, Delia Evelina; Passarella, Giuseppe
2016-07-01
Space-time dependencies among monitoring network stations have been investigated to detect and quantify similarity relationships among gauging stations. In this work, besides the well-known rank correlation index, two new similarity indices have been defined and applied to compute the similarity matrix for the Apulian meteo-climatic monitoring network. The similarity matrices can be applied to reliably address the issue of missing data in space-time series. In order to establish the effectiveness of the similarity indices, a simulation test was designed and performed with the aim of estimating missing monthly rainfall rates at a suitably selected gauging station. The results of the simulation allowed us to evaluate the effectiveness of the proposed similarity indices. Finally, the multiple imputation by chained equations method was used as a benchmark to provide an absolute yardstick for comparing the outcomes of the test. In conclusion, the newly proposed multiplicative similarity index proved at least as reliable as the selected benchmark.
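The abstract builds on the well-known rank correlation index to decide which stations can stand in for a station with missing data. A minimal Python sketch of that baseline step follows: Spearman's rank correlation (average ranks for ties) and a helper that picks the most similar station as a donor. The function names and data are hypothetical, the series are assumed to be aligned and gap-free over a common calibration period, and the paper's two new similarity indices are not reproduced here.

```python
from typing import Dict, List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant series: correlation undefined")
    return cov / (va * vb) ** 0.5


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned series of length >= 2")
    return pearson(average_ranks(a), average_ranks(b))


def most_similar(target: str, series: Dict[str, Sequence[float]]) -> str:
    """Return the station whose series is most rank-correlated with the target's."""
    rho = {name: spearman(series[target], s) for name, s in series.items() if name != target}
    return max(rho, key=lambda name: rho[name])


if __name__ == "__main__":
    # Hypothetical monthly rainfall (mm) at three stations over one year.
    stations = {
        "S1": [80, 65, 40, 22, 10, 5, 3, 8, 30, 60, 90, 100],
        "S2": [75, 70, 38, 20, 12, 4, 2, 9, 28, 58, 85, 110],
        "S3": [20, 25, 60, 70, 80, 30, 15, 10, 5, 40, 35, 50],
    }
    print(most_similar("S1", stations))
```

Because it works on ranks, this index captures monotone co-variation between stations regardless of their absolute rainfall amounts.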
High Density Aerial Image Matching: State-of-the-Art and Future Prospects
NASA Astrophysics Data System (ADS)
Haala, N.; Cavegn, S.
2016-06-01
Ongoing innovations in matching algorithms are continuously improving the quality of geometric surface representations generated automatically from aerial images. This development motivated the launch of the joint ISPRS/EuroSDR project "Benchmark on High Density Aerial Image Matching", which aims at the evaluation of photogrammetric 3D data capture in view of the current developments in dense multi-view stereo-image matching. Originally, the test aimed at image-based DSM computation from conventional aerial image flights for different land use and image block configurations. The second phase then put an additional focus on high-quality, high-resolution 3D geometric data capture in complex urban areas. This includes both the extension of the test scenario to oblique aerial image flights and the generation of filtered point clouds as additional output of the respective multi-view reconstruction. The paper uses the preliminary outcomes of the benchmark to demonstrate the state-of-the-art in airborne image matching with a special focus on high-quality geometric data capture in urban scenarios.
Space station operating system study
NASA Technical Reports Server (NTRS)
Horn, Albert E.; Harwell, Morris C.
1988-01-01
The current phase of the Space Station Operating System study is based on the analysis, evaluation, and comparison of the operating systems implemented on the computer systems and workstations in the software development laboratory. Primary emphasis has been placed on the DEC MicroVMS operating system as implemented on the MicroVax II computer, with comparative analysis of the SUN UNIX system on the SUN 3/260 workstation computer, and to a limited extent, the IBM PC/AT microcomputer running PC-DOS. Some benchmark development and testing was also done for the Motorola MC68010 (VM03 system) before the system was taken from the laboratory. These systems were studied with the objective of determining their capability to support Space Station software development requirements, specifically for multi-tasking and real-time applications. The methodology utilized consisted of development, execution, and analysis of benchmark programs and test software, and the experimentation and analysis of specific features of the system or compilers in the study.
Facility Energy Performance Benchmarking in a Data-Scarce Environment
2017-08-01
environment, and analyze occupant-, system-, and component-level faults contributing to energy inefficiency. A methodology for developing DoD-specific...Research, Development, Test, and Evaluation (RDTE) Program to develop an intelligent framework, encompassing methodology and modeling, that...energy performers by installation, climate zone, and other criteria. A methodology for creating the DoD-specific EUIs would be an important part of a
Shuttle Main Propulsion System LH2 Feed Line and Inducer Simulations
NASA Technical Reports Server (NTRS)
Dorney, Daniel J.; Rothermel, Jeffry
2002-01-01
This viewgraph presentation includes simulations of the unsteady flow field in the LH2 feed line, flow line, flow liner, backing cavity and inducer of Shuttle engine #1. It also evaluates aerodynamic forcing functions which may contribute to the formation of the cracks observed on the flow liner slots. The presentation lists the numerical methods used, and profiles a benchmark test case.
ERIC Educational Resources Information Center
Kurtz, Kenneth J.; Levering, Kimery R.; Stanton, Roger D.; Romero, Joshua; Morris, Steven N.
2013-01-01
The findings of Shepard, Hovland, and Jenkins (1961) on the relative ease of learning 6 elemental types of 2-way classifications have been deeply influential 2 times over: 1st, as a rebuke to pure stimulus generalization accounts, and again as the leading benchmark for evaluating formal models of human category learning. The litmus test for models…
A formative evaluation of CU-SeeMe
NASA Astrophysics Data System (ADS)
Bibeau, Michael
1995-02-01
CU-SeeMe is a video conferencing software package that was designed and programmed at Cornell University. The program works with the TCP/IP network protocol and allows two or more parties to conduct a real-time video conference with full audio support. In this paper we evaluate CU-SeeMe through the process of Formative Evaluation. We first perform a Critical Review of the software using a subset of the Smith and Mosier Guidelines for Human-Computer Interaction. Next, we empirically review the software interface through a series of benchmark tests that are derived directly from a set of scenarios. The scenarios attempt to model real-world situations that might be encountered by an individual in the target user class. Designing benchmark tasks becomes a natural and straightforward process when they are derived from the scenario set. Empirical measures are taken for each task, including completion times and error counts. These measures are accompanied by critical incident analysis [2, 7, 13], which serves to identify problems with the interface and the cognitive roots of those problems. The critical incidents reported by participants are accompanied by explanations of what caused the problem and why. This helps in formulating solutions for observed usability problems. All the testing results are combined in the Appendix in an illustrated partial redesign of the CU-SeeMe interface.
ERIC Educational Resources Information Center
Self-Brown, Shannon; Valente, Jessica R.; Wild, Robert C.; Whitaker, Daniel J.; Galanter, Rachel; Dorsey, Shannon; Stanley, Jenelle
2012-01-01
Benchmarking is a program evaluation approach that can be used to study whether the outcomes of parents/children who participate in an evidence-based program in the community approximate the outcomes found in randomized trials. This paper presents a case illustration using benchmarking methodology to examine a community implementation of…
Issues in Benchmark Metric Selection
NASA Astrophysics Data System (ADS)
Crolotte, Alain
It is true that a metric can influence a benchmark, but will esoteric metrics create more problems than they will solve? We answer this question affirmatively by examining the case of the TPC-D metric, which used the much-debated geometric mean for the single-stream test. We will show how a simple choice influenced the benchmark and its conduct and, to some extent, DBMS development. After examining other alternatives, our conclusion is that the "real" measure for a decision-support benchmark is the arithmetic mean.
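A toy calculation makes the difference between the two means concrete. The Python sketch below (Python 3.8+ for `statistics.fmean` and `statistics.geometric_mean`) uses made-up timings for one short and one long query; the scenario names and numbers are hypothetical and are not TPC-D results.

```python
from statistics import fmean, geometric_mean

# Hypothetical single-stream query times in seconds: one short query, one long query.
SCENARIOS = {
    "baseline": [1.0, 100.0],
    "halve short query": [0.5, 100.0],  # saves 0.5 s of total elapsed time
    "halve long query": [1.0, 50.0],    # saves 50 s of total elapsed time
}


def summarize(times):
    """Return (arithmetic mean, geometric mean) of a list of query times."""
    return fmean(times), geometric_mean(times)


if __name__ == "__main__":
    for label, times in SCENARIOS.items():
        arith, geo = summarize(times)
        print(f"{label:>18}: arithmetic = {arith:6.2f} s, geometric = {geo:5.2f} s")
```

Both tuning choices lower the geometric mean to the same value (about 7.07 s), because it rewards equal relative speed-ups equally regardless of how much elapsed time they save; only halving the long query moves the arithmetic mean substantially (from 50.5 s to 25.5 s).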
Lutz, Jesse J; Duan, Xiaofeng F; Ranasinghe, Duminda S; Jin, Yifan; Margraf, Johannes T; Perera, Ajith; Burggraf, Larry W; Bartlett, Rodney J
2018-05-07
Accurate optical characterization of the closo-Si₁₂C₁₂ molecule is important to guide experimental efforts toward the synthesis of nano-wires, cyclic nano-arrays, and related array structures, which are anticipated to be robust and efficient exciton materials for opto-electronic devices. Working toward calibrated methods for the description of closo-Si₁₂C₁₂ oligomers, various electronic structure approaches are evaluated for their ability to reproduce measured optical transitions of the SiC₂, Si₂Cₙ (n = 1-3), and Si₃Cₙ (n = 1, 2) clusters reported earlier by Steglich and Maier [Astrophys. J. 801, 119 (2015)]. Complete-basis-limit equation-of-motion coupled-cluster (EOMCC) results are presented and a comparison is made between perturbative and renormalized non-iterative triples corrections. The effect of adding a renormalized correction for quadruples is also tested. Benchmark test sets derived from both measurement and high-level EOMCC calculations are then used to evaluate the performance of a variety of density functionals within the time-dependent density functional theory (TD-DFT) framework. The best-performing functionals are subsequently applied to predict valence TD-DFT excitation energies for the lowest-energy isomers of SiₙC and Siₙ₋₁C₇₋ₙ (n = 4-6). TD-DFT approaches are then applied to the SiₙCₙ (n = 4-12) clusters and unique spectroscopic signatures of closo-Si₁₂C₁₂ are discussed. Finally, various long-range corrected density functionals, including those from the CAM-QTP family, are applied to a charge-transfer excitation in a cyclic (Si₄C₄)₄ oligomer. Approaches for gauging the extent of charge-transfer character are also tested and EOMCC results are used to benchmark functionals and make recommendations.
Benchmark Lisp And Ada Programs
NASA Technical Reports Server (NTRS)
Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.
1992-01-01
Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing the efficiency of computer processing via Lisp vs. Ada; comparing the efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests the efficiency with which a computer executes routines in each language. Available for computers equipped with a validated Ada compiler and/or a Common Lisp system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alan Black; Arnis Judzis
2005-09-30
This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2004 through September 2005. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark "best in class" diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit-fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. As of the report date, TerraTek has concluded all Phase 1 testing and is planning Phase 2 development.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates of how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database, has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic, estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
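The core of supervised cross-validation, as described in this abstract, is that test examples come from a known subtype that is absent from the training set. A minimal Python sketch of that principle is a leave-one-subtype-out splitter. It is a simplification of the paper's scheme (which also uses the classification hierarchy, a graph-theoretic distance, and random sampling to build reduced datasets); the function name and the example labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subtype_out(
    subtypes: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subtype, train_indices, test_indices) for each subtype.

    Each test fold contains exactly one subtype, and that subtype never appears
    in the matching training indices, so the split measures generalization to an
    unseen (but related) subtype rather than to close relatives of training items.
    """
    members: Dict[str, List[int]] = defaultdict(list)
    for idx, subtype in enumerate(subtypes):
        members[subtype].append(idx)
    for held_out, test_idx in members.items():
        train_idx = [i for i, s in enumerate(subtypes) if s != held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical proteins labelled "class.subtype", e.g. taken from a classification tree.
    labels = ["a.1", "a.1", "a.2", "a.2", "b.1", "b.2"]
    for held_out, train, test in leave_one_subtype_out(labels):
        print(f"test subtype {held_out}: train={train} test={test}")
```

A random k-fold split of the same labels would usually place members of every subtype on both sides, which is one reason such schemes can give higher, less realistic performance estimates.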
Assessment of composite motif discovery methods.
Klepper, Kjetil; Sandve, Geir K; Abul, Osman; Johansen, Jostein; Drablos, Finn
2008-02-26
Computational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery - discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery. We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked both to predict the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise. Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. 
The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
FDA Benchmark Medical Device Flow Models for CFD Validation.
Malinauskas, Richard A; Hariharan, Prasanna; Day, Steven W; Herbertson, Luke H; Buesen, Martin; Steinseifer, Ulrich; Aycock, Kenneth I; Good, Bryan C; Deutsch, Steven; Manning, Keefe B; Craven, Brent A
Computational fluid dynamics (CFD) is increasingly being used to develop blood-contacting medical devices. However, the lack of standardized methods for validating CFD simulations and blood damage predictions limits its use in the safety evaluation of devices. Through a U.S. Food and Drug Administration (FDA) initiative, two benchmark models of typical device flow geometries (nozzle and centrifugal blood pump) were tested in multiple laboratories to provide experimental velocities, pressures, and hemolysis data to support CFD validation. In addition, computational simulations were performed by more than 20 independent groups to assess current CFD techniques. The primary goal of this article is to summarize the FDA initiative and to report recent findings from the benchmark blood pump model study. Discrepancies between CFD predicted velocities and those measured using particle image velocimetry most often occurred in regions of flow separation (e.g., downstream of the nozzle throat, and in the pump exit diffuser). For the six pump test conditions, 57% of the CFD predictions of pressure head were within one standard deviation of the mean measured values. Notably, only 37% of all CFD submissions contained hemolysis predictions. This project aided in the development of an FDA Guidance Document on factors to consider when reporting computational studies in medical device regulatory submissions. There is an accompanying podcast available for this article. Please visit the journal's Web site (www.asaiojournal.com) to listen.
Piloting a Process Maturity Model as an e-Learning Benchmarking Method
ERIC Educational Resources Information Center
Petch, Jim; Calverley, Gayle; Dexter, Hilary; Cappelli, Tim
2007-01-01
As part of a national e-learning benchmarking initiative of the UK Higher Education Academy, the University of Manchester is carrying out a pilot study of a method to benchmark e-learning in an institution. The pilot was designed to evaluate the operational viability of a method based on the e-Learning Maturity Model developed at the University of…
Developing a benchmark for emotional analysis of music
Aljanaki, Anna; Yang, Yi-Hsuan; Soleymani, Mohammad
2017-01-01
The field of music emotion recognition (MER) has expanded rapidly in the last decade. Many new methods and new audio features have been developed to improve the performance of MER algorithms. However, it is very difficult to compare the performance of the new methods because of the diversity of data representations and the scarcity of publicly available data. In this paper, we address these problems by creating a data set and a benchmark for MER. The data set that we release, a MediaEval Database for Emotional Analysis in Music (DEAM), is the largest available data set of dynamic annotations (valence and arousal annotations for 1,802 songs and song excerpts licensed under Creative Commons, at 2 Hz time resolution). Using DEAM, we organized the 'Emotion in Music' task at the MediaEval Multimedia Evaluation Campaign from 2013 to 2015. The benchmark attracted, in total, 21 active teams to participate in the challenge. We analyze the results of the benchmark: the winning algorithms and feature sets. We also describe the design of the benchmark, the evaluation procedures, and the data cleaning and transformations that we suggest. The results from the benchmark suggest that approaches based on recurrent neural networks combined with large feature sets work best for dynamic MER. PMID:28282400
Decoys Selection in Benchmarking Datasets: Overview and Perspectives
Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu
2018-01-01
Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. To this end, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and second, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
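The review describes evaluating VS methods by how well they retrieve actives from a mix of actives and decoys, without naming a metric. One widely used choice is the enrichment factor, which compares the hit rate in the top-ranked fraction of the library with the hit rate of the whole library. The Python sketch below computes it; the function name and the toy data are hypothetical, and it is meant only to make the evaluation step concrete, not to reproduce any particular benchmarking database.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], fraction: float) -> float:
    """Enrichment factor at the top `fraction` of a score-ranked library.

    EF = (actives in selection / selection size) / (all actives / library size).
    Higher scores are assumed to mean "more likely active"; ties keep input order.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and labels must be non-empty and aligned")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("benchmark set contains no actives")
    n_selected = max(1, int(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(is_active[i] for i in ranked[:n_selected])
    return (hits / n_selected) / (total_actives / n)


if __name__ == "__main__":
    # Hypothetical screen of 10 compounds: 2 actives mixed with 8 decoys.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    actives = [True, False, False, False, True, False, False, False, False, False]
    for f in (0.1, 0.5):
        print(f"EF at {f:.0%}: {enrichment_factor(scores, actives, f):.1f}")
```

A random ranking gives an enrichment factor of about 1 on average, so values above 1 indicate that the method pulls actives ahead of the decoys; how meaningful that is depends on how well the decoys resemble the actives, which is the concern the review addresses.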
Simplified Numerical Analysis of ECT Probe - Eddy Current Benchmark Problem 3
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sikora, R.; Chady, T.; Gratkowski, S.
2005-04-09
In this paper, the third eddy current benchmark problem is considered. The objective of the benchmark is to determine the optimal operating frequency and size of a pancake coil designed for testing tubes made of Inconel. This can be achieved by maximizing the change in impedance of the coil due to a flaw. Approximation functions of the probe (coil) characteristic were developed and used to reduce the number of required calculations, which significantly speeds up the optimization process. The optimal testing frequency and probe size were obtained as the final result of the calculation.
Benchmarking Using Basic DBMS Operations
NASA Astrophysics Data System (ADS)
Crolotte, Alain; Ghazal, Ahmad
The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used this benchmark to show the superiority and competitive edge of their products. However, over time, TPC-H became less representative of industry trends as vendors kept tuning their databases to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and to compare two systems. Finally, we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Kalpathy-Cramer, Jayashree; de Herrera, Alba García Seco; Demner-Fushman, Dina; Antani, Sameer; Bedrick, Steven; Müller, Henning
2014-01-01
Medical image retrieval and classification have been extremely active research topics over the past 15 years. With the ImageCLEF benchmark in medical image retrieval and classification, a standard test bed was created that allows researchers to compare their approaches and ideas on increasingly large and varied data sets, including generated ground truth. This article describes the lessons learned in ten evaluation campaigns. A detailed analysis of the data also highlights the value of the resources created. PMID:24746250
Benchmarks for target tracking
NASA Astrophysics Data System (ADS)
Dunham, Darin T.; West, Philip D.
2011-09-01
The term benchmark originates from the chiseled horizontal marks that surveyors made, into which an angle-iron could be placed to bracket ("bench") a leveling rod, thus ensuring that the leveling rod can be repositioned in exactly the same place in the future. A benchmark in computer terms is the result of running a computer program, or a set of programs, in order to assess the relative performance of an object by running a number of standard tests and trials against it. This paper will discuss the history of simulation benchmarks that are being used by multiple branches of the military and agencies of the US government. These benchmarks range from missile defense applications to chemical biological situations. Typically, a benchmark is used with Monte Carlo runs in order to tease out how algorithms deal with variability and the range of possible inputs. We will also describe problems that can be solved by a benchmark.
DE-NE0008277_PROTEUS final technical report 2018
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enqvist, Andreas
This project details re-evaluations of experiments on gas-cooled fast reactor (GCFR) core designs performed in the 1970s at the PROTEUS reactor and the creation of a series of International Reactor Physics Experiment Evaluation Project (IRPhEP) benchmarks. Currently there are no GCFR experiments available in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook). These experiments are excellent candidates for reanalysis and development of multiple benchmarks because they provide high-quality integral nuclear data relevant to the validation and refinement of thorium, neptunium, uranium, plutonium, iron, and graphite cross sections. It would be cost-prohibitive to reproduce such a comprehensive suite of experimental data to support any future GCFR endeavors.
A benchmark for fault tolerant flight control evaluation
NASA Astrophysics Data System (ADS)
Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.
2013-12-01
A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.
Rapid Model Fabrication and Testing for Aerospace Vehicles
NASA Technical Reports Server (NTRS)
Buck, Gregory M.
2000-01-01
Advanced methods for rapid fabrication and instrumentation of hypersonic wind tunnel models are being developed and evaluated at NASA Langley Research Center. Rapid aeroheating model fabrication and measurement techniques using investment casting of ceramic test models and thermographic phosphors are reviewed. More accurate model casting techniques for fabrication of benchmark metal and ceramic test models are being developed using a combination of rapid prototype patterns and investment casting. White light optical scanning is used for coordinate measurements to evaluate the fabrication process and verify model accuracy to +/- 0.002 inches. Higher-temperature (<210C) luminescent coatings are also being developed for simultaneous pressure and temperature mapping, providing global pressure as well as global aeroheating measurements. Together these techniques will provide a more rapid and complete experimental aerodynamic and aerothermodynamic database for future aerospace vehicles.
Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon
2014-05-27
Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus has been placed on structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date, ready-to-apply data sets for LBVS are fairly limited, and the direct use of benchmarking sets designed for SBVS could introduce biases into the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. Specifically, our method can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCR benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we examined the ratio of decoys per ligand and found that values between 30 and 100 do not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745
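The evaluation metric can be sketched as follows, under stated assumptions: each ligand in turn serves as the query (leave-one-out), the remaining ligands and all decoys are scored by similarity to it, and the ROC AUC is computed as the probability that a randomly chosen ligand outranks a randomly chosen decoy. Tanimoto similarity on sets of fingerprint bits is a placeholder for whichever LBVS method is being assessed; the fingerprints are hypothetical, and this is not the authors' code.

```python
from itertools import product
from statistics import mean


def roc_auc(active_scores, decoy_scores):
    """AUC as P(random active scores higher than random decoy); ties count 1/2."""
    wins = 0.0
    for a, d in product(active_scores, decoy_scores):
        if a > d:
            wins += 1.0
        elif a == d:
            wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def tanimoto(fp1, fp2):
    """Tanimoto coefficient on fingerprints represented as sets of 'on' bits."""
    union = len(fp1 | fp2)
    return len(fp1 & fp2) / union if union else 0.0


def loo_average_auc(ligands, decoys, similarity=tanimoto):
    """Use each ligand once as the query; average the per-query AUCs."""
    aucs = []
    for i, query in enumerate(ligands):
        others = ligands[:i] + ligands[i + 1:]  # hold the query out
        active_scores = [similarity(query, fp) for fp in others]
        decoy_scores = [similarity(query, fp) for fp in decoys]
        aucs.append(roc_auc(active_scores, decoy_scores))
    return mean(aucs)
```

An average AUC near 1.0 means decoys are trivially separable from ligands, which is one symptom of the "artificial enrichment" that the unbiased decoy set is designed to reduce.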
Generation of openEHR Test Datasets for Benchmarking.
El Helou, Samar; Karvonen, Tuukka; Yamamoto, Goshiro; Kume, Naoto; Kobayashi, Shinji; Kondo, Eiji; Hiragi, Shusuke; Okamoto, Kazuya; Tamura, Hiroshi; Kuroda, Tomohiro
2017-01-01
openEHR is a widely used EHR specification. Given its technology-independent nature, different approaches for implementing openEHR data repositories exist. Public openEHR datasets are needed to conduct benchmark analyses over different implementations. To address their current unavailability, we propose a method for generating openEHR test datasets that can be publicly shared and used.
Is Higher Better? Determinants and Comparisons of Performance on the Major Field Test in Business
ERIC Educational Resources Information Center
Bielinska-Kwapisz, Agnieszka; Brown, F. William; Semenik, Richard
2012-01-01
Student performance on the Major Field Achievement Test in Business is an important benchmark for college of business programs. The authors' results indicate that such benchmarking can only be meaningful if certain student characteristics are taken into account. The differences in achievement between cohorts are explored in detail by separating…
But What Do You Do with the Data?
ERIC Educational Resources Information Center
Matthews, Jan; Trimble, Susan; Gay, Anne
2007-01-01
Using data to redesign instruction is a means of increasing student achievement. Educators in Camden County (Georgia) Schools have used data from benchmark testing since 1999. They hired a commercial vendor to design a benchmark test that is administered four times a year and use the data to generate subject-area reports that can be further…
Experimental Data from the Benchmark SuperCritical Wing Wind Tunnel Test on an Oscillating Turntable
NASA Technical Reports Server (NTRS)
Heeg, Jennifer; Piatak, David J.
2013-01-01
The Benchmark SuperCritical Wing (BSCW) wind tunnel model served as a semi-blind testcase for the 2012 AIAA Aeroelastic Prediction Workshop (AePW). The BSCW was chosen as a testcase due to its geometric simplicity and flow physics complexity. The data sets examined include unforced system information and forced pitching oscillations. The aerodynamic challenges presented by this AePW testcase include a strong shock that was observed to be unsteady for even the unforced system cases, shock-induced separation and trailing edge separation. The current paper quantifies these characteristics at the AePW test condition and at a suggested benchmarking test condition. General characteristics of the model's behavior are examined for the entire available data set.
Performance Evaluation and Benchmarking of Intelligent Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madhavan, Raj; Messina, Elena; Tunstel, Edward
To design and develop capable, dependable, and affordable intelligent systems, their performance must be measurable. Scientific methodologies for standardization and benchmarking are crucial for quantitatively evaluating the performance of emerging robotic and intelligent systems technologies. There is currently no accepted standard for quantitatively measuring the performance of these systems against user-defined requirements; and furthermore, there is no consensus on what objective evaluation procedures need to be followed to understand the performance of these systems. The lack of reproducible and repeatable test methods has precluded researchers working towards a common goal from exchanging and communicating results, inter-comparing system performance, and leveraging previous work that could otherwise avoid duplication and expedite technology transfer. Currently, this lack of cohesion in the community hinders progress in many domains, such as manufacturing, service, healthcare, and security. By providing the research community with access to standardized tools, reference data sets, and open source libraries of solutions, researchers and consumers will be able to evaluate the cost and benefits associated with intelligent systems and associated technologies. In this vein, the edited book volume addresses performance evaluation and metrics for intelligent systems, in general, while emphasizing the need and solutions for standardized methods. To the knowledge of the editors, there is not a single book on the market that is solely dedicated to the subject of performance evaluation and benchmarking of intelligent systems. Even books that address this topic do so only marginally or are out of date. The research work presented in this volume fills this void by drawing from the experiences and insights of experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. 
The book presents a detailed and coherent picture of state-of-the-art, recent developments, and further research areas in intelligent systems.
Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal
The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing systems. OpenCL extends the C programming language to support portable code across platforms such as CPUs, graphics processing units (GPUs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to replace the complex FPGA-based development flow with a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes FPGA-based development more accessible to software developers as the need for hybrid computing using CPUs and FPGAs increases. It can also significantly reduce hardware development time, as users can evaluate different ideas in a high-level language without deep FPGA domain knowledge. Benchmarking an OpenCL-based framework is an effective way to analyze system performance by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using the Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that can be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and a Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to better understand the resource usage and performance of the kernel implementations on Arria-10 FPGA devices compared to Stratix-5 FPGA devices. 
In addition, we gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark application.
Liebe, J D; Hübner, U
2013-01-01
Continuous improvements of IT performance in healthcare organisations require actionable performance indicators, regularly conducted, independent measurements, and meaningful and scalable reference groups. Existing IT-benchmarking initiatives have focussed on the development of reliable and valid indicators, but less on how to implement an environment for conducting easily repeatable and scalable IT benchmarks. This study aims at developing and trialling a procedure that meets the aforementioned requirements. We chose a well-established, regularly conducted (inter-)national IT survey of healthcare organisations (IT-Report Healthcare) as the environment and offered the participants of the 2011 survey (CIOs of hospitals) the opportunity to enter a benchmark. The 61 structural and functional performance indicators covered, among others, the implementation status and integration of IT systems and functions, global user satisfaction, and the resources of the IT department. Healthcare organisations were grouped by size and ownership. The benchmark results were made available electronically, and feedback on the use of these results was requested after several months. Fifty-nine hospitals participated in the benchmarking. Reference groups consisted of up to 141 members depending on the number of beds (size) and the ownership (public vs. private). A total of 122 charts showing single-indicator frequency views were sent to each participant. The evaluation showed that 94.1% of the CIOs who participated in the evaluation considered this benchmarking beneficial and reported that they would enter again. Based on the feedback of the participants, we developed two additional views that provide a more consolidated picture. The results demonstrate that establishing an independent, easily repeatable, and scalable IT-benchmarking procedure is possible and was deemed desirable. 
Based on these encouraging results, a new benchmarking round that includes process indicators is currently being conducted.
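The grouping step can be illustrated with a small, hypothetical sketch: each hospital is placed in a reference group keyed by bed-count band and ownership, and a frequency view of one indicator is returned together with the share of the group at or below the participant's value. The band cut-offs and field names are assumptions; the abstract does not give the study's actual grouping rules.

```python
from collections import Counter, defaultdict


def size_band(beds):
    """Hypothetical bed-count bands; the study's real cut-offs are not stated."""
    if beds < 200:
        return "<200"
    if beds < 500:
        return "200-499"
    return ">=500"


def reference_groups(hospitals):
    """Group hospitals by (size band, ownership)."""
    groups = defaultdict(list)
    for h in hospitals:
        groups[(size_band(h["beds"]), h["ownership"])].append(h)
    return groups


def frequency_view(hospital, hospitals, indicator):
    """Distribution of one indicator in the hospital's reference group,
    plus the fraction of the group scoring at or below this hospital."""
    key = (size_band(hospital["beds"]), hospital["ownership"])
    group = reference_groups(hospitals)[key]
    counts = Counter(h[indicator] for h in group)
    at_or_below = sum(1 for h in group if h[indicator] <= hospital[indicator])
    return counts, at_or_below / len(group)
```

Because the group is recomputed from whatever participants are present, the same procedure scales from one survey round to the next without manual re-grouping.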
PMLB: a large benchmark suite for machine learning evaluation and comparison.
Olson, Randal S; La Cava, William; Orzechowski, Patryk; Urbanowicz, Ryan J; Moore, Jason H
2017-01-01
The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered. This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
Highly Enriched Uranium Metal Cylinders Surrounded by Various Reflector Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernard Jones; J. Blair Briggs; Leland Monteirth
A series of experiments was performed at Los Alamos Scientific Laboratory in 1958 to determine critical masses of cylinders of Oralloy (Oy) reflected by a number of materials. The experiments were all performed on the Comet Universal Critical Assembly Machine, and consisted of discs of highly enriched uranium (93.3 wt.% 235U) reflected by half-inch and one-inch-thick cylindrical shells of various reflector materials. The experiments were performed by members of Group N-2, particularly K. W. Gallup, G. E. Hansen, H. C. Paxton, and R. H. White. This experiment was intended to ascertain critical masses for criticality safety purposes, as well as to compare neutron transport cross sections to those obtained from danger coefficient measurements with the Topsy Oralloy-Tuballoy reflected and Godiva unreflected critical assemblies. The reflector materials examined in this series of experiments are as follows: magnesium, titanium, aluminum, graphite, mild steel, nickel, copper, cobalt, molybdenum, natural uranium, tungsten, beryllium, aluminum oxide, molybdenum carbide, and polythene (polyethylene). Also included are two special configurations of composite beryllium and iron reflectors. Analyses were performed in which uncertainty associated with six different parameters was evaluated; namely, extrapolation to the uranium critical mass, uranium density, 235U enrichment, reflector density, reflector thickness, and reflector impurities. In addition to the idealizations made by the experimenters (removal of the platen and diaphragm), two simplifications were also made to the benchmark models that resulted in a small bias and additional uncertainty. First, since impurities in core and reflector materials are only estimated, they are not included in the benchmark models. Second, the room, support structure, and other possible surrounding equipment were not included in the model. 
Bias values that result from these two simplifications were determined, and the associated uncertainty in the bias values was included in the overall uncertainty in benchmark keff values. Bias values were very small, ranging from 0.0004 Δk low to 0.0007 Δk low. Overall uncertainties range from ± 0.0018 to ± 0.0030. Major contributors to the overall uncertainty include uncertainty in the extrapolation to the uranium critical mass and the uranium density. Results are summarized in Figure 1 (Experimental, Benchmark-Model, and MCNP/KENO Calculated Results). The 32 configurations described and evaluated under ICSBEP Identifier HEU-MET-FAST-084 are judged to be acceptable for use as criticality safety benchmark experiments and should be valuable integral benchmarks for nuclear data testing of the various reflector materials. Details of the benchmark models, uncertainty analyses, and final results are given in this paper.
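As a sketch of the bookkeeping described above, assuming the individual 1σ contributions are independent and combined in quadrature (root-sum-square), the following Python computes an overall uncertainty and applies a simplification bias to an experimental keff. All numbers in the usage block are hypothetical placeholders, not values from HEU-MET-FAST-084.

```python
import math


def combined_uncertainty(components):
    """Root-sum-square of independent 1-sigma contributions (delta-k units)."""
    return math.sqrt(sum(u ** 2 for u in components.values()))


def benchmark_keff(experimental_keff, bias, experimental_unc, bias_unc):
    """Apply a model-simplification bias and fold its uncertainty into the total.

    A bias reported as '0.0004 delta-k low' is passed as bias=-0.0004.
    """
    keff = experimental_keff + bias
    total_unc = math.sqrt(experimental_unc ** 2 + bias_unc ** 2)
    return keff, total_unc


if __name__ == "__main__":
    # Hypothetical 1-sigma contributions for one configuration.
    components = {
        "critical-mass extrapolation": 0.0015,
        "uranium density": 0.0010,
        "235U enrichment": 0.0002,
        "reflector density": 0.0004,
        "reflector thickness": 0.0003,
        "reflector impurities": 0.0001,
    }
    exp_unc = combined_uncertainty(components)
    keff, unc = benchmark_keff(1.0000, -0.0004, exp_unc, 0.0001)
    print(f"benchmark keff = {keff:.4f} +/- {unc:.4f}")
```

Quadrature combination means the largest contributors dominate the total, which is consistent with the abstract singling out the critical-mass extrapolation and uranium density.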
Scale/TSUNAMI Sensitivity Data for ICSBEP Evaluations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rearden, Bradley T; Reed, Davis Allan; Lefebvre, Robert A
2011-01-01
The Tools for Sensitivity and Uncertainty Analysis Methodology Implementation (TSUNAMI) software, developed at Oak Ridge National Laboratory (ORNL) as part of the Scale code system, provides unique methods for code validation, gap analysis, and experiment design. For TSUNAMI analysis, sensitivity data are generated for each application and each existing or proposed experiment used in the assessment. The validation of diverse sets of applications requires potentially thousands of data files to be maintained and organized by the user, and a growing number of these files are available through the International Handbook of Evaluated Criticality Safety Benchmark Experiments (IHECSBE) distributed through the International Criticality Safety Benchmark Evaluation Program (ICSBEP). To facilitate the use of the IHECSBE benchmarks in rigorous TSUNAMI validation and gap analysis techniques, ORNL generated SCALE/TSUNAMI sensitivity data files (SDFs) for several hundred benchmarks for distribution with the IHECSBE. For the 2010 edition of IHECSBE, the sensitivity data were generated using 238-group cross-section data based on ENDF/B-VII.0 for 494 benchmark experiments. Additionally, ORNL has developed a quality assurance procedure to guide the generation of Scale inputs and sensitivity data, as well as a graphical user interface to facilitate the use of sensitivity data in identifying experiments and applying them in validation studies.
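One common way sensitivity data are used to identify relevant experiments is the c_k similarity index: the correlation between the nuclear-data-induced keff uncertainties of an application and an experiment, c_k = (S_a C S_e^T) / sqrt((S_a C S_a^T)(S_e C S_e^T)), where S_a and S_e are sensitivity vectors and C is the cross-section covariance matrix. The sketch below evaluates that formula for toy vectors; it does not read real SDF files, and flattening all nuclide-reaction-group sensitivities into one plain list is a simplifying assumption.

```python
import math


def _sandwich(a, cov, b):
    """Compute a^T C b for plain-list vectors and a square covariance matrix."""
    return sum(a[i] * cov[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def ck_similarity(s_app, s_exp, cov):
    """Correlation of data-induced uncertainties between an application and an experiment."""
    shared = _sandwich(s_app, cov, s_exp)  # covariance of the two keff uncertainties
    var_app = _sandwich(s_app, cov, s_app)  # application keff variance from data
    var_exp = _sandwich(s_exp, cov, s_exp)  # experiment keff variance from data
    return shared / math.sqrt(var_app * var_exp)


if __name__ == "__main__":
    cov = [[0.04, 0.01], [0.01, 0.09]]  # toy 2x2 relative covariance matrix
    print(ck_similarity([0.3, 0.1], [0.25, 0.12], cov))
```

Values near 1 indicate that an experiment shares the application's dominant sources of data uncertainty, which is what makes it useful for validating that application.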
WWTP dynamic disturbance modelling--an essential module for long-term benchmarking development.
Gernaey, K V; Rosen, C; Jeppsson, U
2006-01-01
Intensive use of the benchmark simulation model No. 1 (BSM1), a protocol for objective comparison of the effectiveness of control strategies in biological nitrogen removal activated sludge plants, has also revealed a number of limitations. Preliminary definitions of the long-term benchmark simulation model No. 1 (BSM1_LT) and the benchmark simulation model No. 2 (BSM2) have been made to extend BSM1 for evaluation of process monitoring methods and plant-wide control strategies, respectively. Influent-related disturbances for BSM1_LT/BSM2 are to be generated with a model, and this paper provides a general overview of the modelling methods used. Typical influent dynamic phenomena generated with the BSM1_LT/BSM2 influent disturbance model, including diurnal, weekend, seasonal and holiday effects, as well as rainfall, are illustrated with simulation results. As a result of the work described in this paper, a proposed influent model/file has been released to the benchmark developers for evaluation purposes. Pending this evaluation, a final BSM1_LT/BSM2 influent disturbance model definition is foreseen. Preliminary simulations with dynamic influent data generated by the influent disturbance model indicate that default BSM1 activated sludge plant control strategies will need extensions for BSM1_LT/BSM2 to efficiently handle 1 year of influent dynamics.
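The kinds of dynamics the influent model reproduces can be illustrated with a deliberately simple, hypothetical generator: a base flow multiplied by a diurnal cycle, a weekend reduction, and a slow seasonal term, sampled every 15 minutes for one year. This is not the BSM1_LT/BSM2 influent model, which is considerably more detailed; all parameter values are assumptions, and holiday and rainfall effects are omitted.

```python
import math


def influent_flow(t_days, base=20000.0, diurnal_amp=0.3, weekend_factor=0.9, seasonal_amp=0.1):
    """Hypothetical dry-weather influent flow (m3/d) at time t_days (day 0 = Monday 00:00)."""
    # Diurnal cycle: minimum at midnight, maximum at midday.
    diurnal = 1.0 + diurnal_amp * math.sin(2 * math.pi * (t_days % 1.0) - math.pi / 2)
    # Weekend effect: days 5 and 6 of each week carry a reduced load.
    weekly = weekend_factor if int(t_days) % 7 >= 5 else 1.0
    # Slow seasonal variation with a one-year period.
    seasonal = 1.0 + seasonal_amp * math.sin(2 * math.pi * t_days / 365.0)
    return base * diurnal * weekly * seasonal


def influent_profile(days=365, samples_per_day=96):
    """Sample the flow every 15 minutes (96 samples/day) for the requested period."""
    return [influent_flow(k / samples_per_day) for k in range(days * samples_per_day)]
```

Keeping each effect as a separate multiplicative factor makes it easy to switch one off when testing whether a control strategy is sensitive to, for example, the weekend pattern alone.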
Cereda, Carlo W; Christensen, Søren; Campbell, Bruce Cv; Mishra, Nishant K; Mlynash, Michael; Levi, Christopher; Straka, Matus; Wintermark, Max; Bammer, Roland; Albers, Gregory W; Parsons, Mark W; Lansberg, Maarten G
2016-10-01
Differences in research methodology have hampered the optimization of Computed Tomography Perfusion (CTP) for identification of the ischemic core. We aim to optimize CTP core identification using a novel benchmarking tool. The benchmarking tool consists of an imaging library and a statistical analysis algorithm to evaluate the performance of CTP. The tool was used to optimize and evaluate an in-house developed CTP-software algorithm. Imaging data of 103 acute stroke patients were included in the benchmarking tool. Median time from stroke onset to CT was 185 min (IQR 180-238), and the median time between completion of CT and start of MRI was 36 min (IQR 25-79). Volumetric accuracy of the CTP-ROIs was optimal at an rCBF threshold of <38%; at this threshold, the mean difference was 0.3 ml (SD 19.8 ml), the mean absolute difference was 14.3 ml (SD 13.7 ml), and CTP was 67% sensitive and 87% specific for identification of DWI-positive tissue voxels. The benchmarking tool can play an important role in optimizing CTP software as it provides investigators with a novel method to directly compare the performance of alternative CTP software packages. © The Author(s) 2015.
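The voxel-level comparison behind the reported numbers can be sketched as follows, assuming co-registered voxels, an rCBF map expressed as a fraction of normal tissue (so <38% becomes < 0.38), and a binary DWI lesion mask as the reference. The function names and flat-list representation are illustrative assumptions, not the authors' benchmarking tool.

```python
def core_mask(rcbf, threshold=0.38):
    """Label a voxel as ischemic core when relative CBF falls below the threshold."""
    return [value < threshold for value in rcbf]


def voxel_agreement(predicted, reference, voxel_volume_ml):
    """Voxel-wise sensitivity/specificity and signed volume difference vs. a reference mask."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # Predicted core volume minus reference (DWI) volume.
        "volume_difference_ml": ((tp + fp) - (tp + fn)) * voxel_volume_ml,
    }
```

Sweeping `threshold` over a range and recording these metrics per patient is one simple way to reproduce the kind of threshold optimization the abstract describes.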
NASA Technical Reports Server (NTRS)
Padovan, J.; Adams, M.; Lam, P.; Fertis, D.; Zeid, I.
1982-01-01
Second-year efforts within a three-year study to develop and extend finite element (FE) methodology to efficiently handle the transient/steady-state response of rotor-bearing-stator structures associated with gas turbine engines are outlined. The two main areas aim at (1) implanting the squeeze film damper element into a general-purpose FE code for testing and evaluation; and (2) determining the numerical characteristics of the FE-generated rotor-bearing-stator simulation scheme. The governing FE field equations are set out and the solution methodology is presented. The choice of ADINA as the general-purpose FE code is explained, and the numerical operational characteristics of the direct integration approach for FE-generated rotor-bearing-stator simulations are determined, including benchmarking, comparison of explicit vs. implicit methodologies of direct integration, and demonstration problems.
Performance Evaluation and Benchmarking of Next Intelligent Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
del Pobil, Angel; Madhavan, Raj; Bonsignorio, Fabio
Performance Evaluation and Benchmarking of Intelligent Systems presents research dedicated to the subject of performance evaluation and benchmarking of intelligent systems by drawing from the experiences and insights of leading experts gained both through theoretical development and practical implementation of intelligent systems in a variety of diverse application domains. This contributed volume offers a detailed and coherent picture of state-of-the-art, recent developments, and further research areas in intelligent systems. The chapters cover a broad range of applications, such as assistive robotics, planetary surveying, urban search and rescue, and line tracking for automotive assembly. Subsystems or components described in this book include human-robot interaction, multi-robot coordination, communications, perception, and mapping. Chapters are also devoted to simulation support and open source software for cognitive platforms, providing examples of the type of enabling underlying technologies that can help intelligent systems to propagate and increase in capabilities. Performance Evaluation and Benchmarking of Intelligent Systems serves as a professional reference for researchers and practitioners in the field. This book is also applicable to advanced courses for graduate level students and robotics professionals in a wide range of engineering and related disciplines including computer science, automotive, healthcare, manufacturing, and service robotics.
Metric Evaluation Pipeline for 3d Modeling of Urban Scenes
NASA Astrophysics Data System (ADS)
Bosch, M.; Leichtman, A.; Chilcott, D.; Goldberg, H.; Brown, M.
2017-05-01
Publicly available benchmark data and metric evaluation approaches have been instrumental in enabling research to advance state-of-the-art methods for remote sensing applications in urban 3D modeling. Most publicly available benchmark datasets have consisted of high-resolution airborne imagery and lidar suitable for 3D modeling on a relatively modest scale. To enable research in larger-scale 3D mapping, we have recently released a public benchmark dataset with multi-view commercial satellite imagery and metrics to compare 3D point clouds with lidar ground truth. We now define a more complete metric evaluation pipeline, developed as publicly available open source software, to assess semantically labeled 3D models of complex urban scenes derived from multi-view commercial satellite imagery. Evaluation metrics in our pipeline include horizontal and vertical accuracy and completeness, volumetric completeness and correctness, perceptual quality, and model simplicity. Sources of ground truth include airborne lidar and overhead imagery, and we demonstrate a semi-automated process for producing accurate ground truth shape files to characterize building footprints. We validate our current metric evaluation pipeline using 3D models produced using open source multi-view stereo methods. Data and software are made publicly available to enable further research and planned benchmarking activities.
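Completeness and correctness are commonly defined with a distance tolerance: completeness is the fraction of ground-truth points that have a model point within the tolerance, and correctness is the fraction of model points that have a ground-truth point within it. The brute-force sketch below illustrates those generic definitions on 3D points; the pipeline described above may compute its metrics differently (for example on rasterized height maps or labeled volumes), and the tolerance value is an assumption.

```python
import math


def _fraction_within(points, targets, tol):
    """Fraction of `points` lying within `tol` of at least one point in `targets`."""
    if not points:
        return 0.0
    hits = sum(1 for p in points if any(math.dist(p, q) <= tol for q in targets))
    return hits / len(points)


def completeness(model_pts, truth_pts, tol=1.0):
    """How much of the ground truth the model recovers."""
    return _fraction_within(truth_pts, model_pts, tol)


def correctness(model_pts, truth_pts, tol=1.0):
    """How much of the model is supported by the ground truth."""
    return _fraction_within(model_pts, truth_pts, tol)
```

The nested search is O(n·m) and only suitable for small examples; for satellite-scale point clouds a spatial index such as a k-d tree would be the natural replacement.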
SP2Bench: A SPARQL Performance Benchmark
NASA Astrophysics Data System (ADS)
Schmidt, Michael; Hornung, Thomas; Meier, Michael; Pinkel, Christoph; Lausen, Georg
A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is set in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries, and performance metrics.
van Lent, Wineke A M; de Beer, Relinde D; van Harten, Wim H
2010-08-31
Benchmarking is one of the methods used in business that is applied to hospitals to improve the management of their operations. International comparison between hospitals can explain performance differences. As there is a trend towards specialization of hospitals, this study examines the benchmarking process and the success factors of benchmarking in international specialized cancer centres. Three independent international benchmarking studies on operations management in cancer centres were conducted. The first study included three comprehensive cancer centres (CCC), three chemotherapy day units (CDU) were involved in the second study, and four radiotherapy departments were included in the final study. For each multiple case study, a research protocol was used to structure the benchmarking process. After reviewing the multiple case studies, the resulting description was used to study the research objectives. We adapted and evaluated existing benchmarking processes by formalizing stakeholder involvement and verifying the comparability of the partners. We also devised a framework to structure the indicators to produce a coherent indicator set and better improvement suggestions. Evaluating the feasibility of benchmarking as a tool to improve hospital processes led to mixed results. Case study 1 resulted in general recommendations for the organizations involved. In case study 2, the combination of benchmarking and lean management led in one CDU to a 24% increase in bed utilization and a 12% increase in productivity. Three radiotherapy departments in case study 3 were considering implementing the recommendations. Additionally, several success factors were identified: a well-defined and small project scope, partner selection based on clear criteria, stakeholder involvement, simple and well-structured indicators, analysis of both the process and its results, and adaptation of the identified better working methods to one's own setting.
The improved benchmarking process and the success factors can produce relevant input to improve the operations management of specialty hospitals. PMID:20807408
ERIC Educational Resources Information Center
Lin, Sheau-Wen; Liu, Yu; Chen, Shin-Feng; Wang, Jing-Ru; Kao, Huey-Lien
2016-01-01
The purpose of this study was to develop a computer-based measure of elementary students' science talk and to report students' benchmarks. The development procedure had three steps: defining the framework of the test, collecting and identifying key reference sets of science talk, and developing and verifying the science talk instrument. The…
Kocha, Shyam S.; Shinozaki, Kazuma; Zack, Jason W.; ...
2017-05-02
Thin-film-rotating disk electrodes (TF-RDEs) are the half-cell electrochemical system of choice for rapid screening of oxygen reduction reaction (ORR) activity of novel Pt supported on carbon black supports (Pt/C) electrocatalysts. It has been shown that the magnitude of the measured ORR activity and reproducibility are highly dependent on the system cleanliness, evaluation protocols, and operating conditions as well as ink formulation, composition, film drying, and the resultant film thickness and uniformity. Accurate benchmarks of baseline Pt/C catalysts evaluated using standardized protocols and best practices are necessary to expedite ultra-low-platinum group metal (PGM) catalyst development that is crucial for the imminent commercialization of fuel cell vehicles. We report results of evaluation in three independent laboratories of Pt/C electrocatalysts provided by commercial fuel cell catalyst manufacturers (Johnson Matthey, Umicore, Tanaka Kikinzoku Kogyo - TKK). The studies were conducted using identical evaluation protocols/ink formulation/film fabrication albeit employing unique electrochemical cell designs specific to each laboratory. Furthermore, the ORR activities reported in this work provide a baseline and criteria for selection and scale-up of novel high activity ORR electrocatalysts for implementation in proton exchange membrane fuel cells (PEMFCs).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kocha, Shyam S.; Shinozaki, Kazuma; Zack, Jason W.
Thin-film-rotating disk electrodes (TF-RDEs) are the half-cell electrochemical system of choice for rapid screening of oxygen reduction reaction (ORR) activity of novel Pt supported on carbon black supports (Pt/C) electrocatalysts. It has been shown that the magnitude of the measured ORR activity and reproducibility are highly dependent on the system cleanliness, evaluation protocols, and operating conditions as well as ink formulation, composition, film drying, and the resultant film thickness and uniformity. Accurate benchmarks of baseline Pt/C catalysts evaluated using standardized protocols and best practices are necessary to expedite ultra-low-platinum group metal (PGM) catalyst development that is crucial for the imminent commercialization of fuel cell vehicles. We report results of evaluation in three independent laboratories of Pt/C electrocatalysts provided by commercial fuel cell catalyst manufacturers (Johnson Matthey, Umicore, Tanaka Kikinzoku Kogyo - TKK). The studies were conducted using identical evaluation protocols/ink formulation/film fabrication albeit employing unique electrochemical cell designs specific to each laboratory. The ORR activities reported in this work provide a baseline and criteria for selection and scale-up of novel high activity ORR electrocatalysts for implementation in proton exchange membrane fuel cells (PEMFCs).
Bauer, Matthias R; Ibrahim, Tamer M; Vogel, Simon M; Boeckler, Frank M
2013-06-24
The application of molecular benchmarking sets helps to assess the actual performance of virtual screening (VS) workflows. To improve the efficiency of structure-based VS approaches, the selection and optimization of various parameters can be guided by benchmarking. With the DEKOIS 2.0 library, we aim to further extend and complement the collection of publicly available decoy sets. Based on BindingDB bioactivity data, we provide 81 new and structurally diverse benchmark sets for a wide variety of target classes. To ensure a meaningful selection of ligands, we address several issues that can be found in bioactivity data. We have improved our previously introduced DEKOIS methodology with enhanced physicochemical matching, now including the consideration of molecular charges, as well as a more sophisticated elimination of latent actives in the decoy set (LADS). We evaluate the docking performance of Glide, GOLD, and AutoDock Vina with our data sets and highlight existing challenges for VS tools. All DEKOIS 2.0 benchmark sets will be made accessible at http://www.dekois.com.
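Because each DEKOIS set pairs actives with property-matched decoys, evaluating a docking program on it reduces to asking how well its scores separate the two groups. A minimal sketch, assuming lower (more negative) docking scores are better and that all compounds have already been scored, computes the ROC AUC as the probability that a randomly chosen active outranks a randomly chosen decoy. The function name and example scores are illustrative only and are not part of DEKOIS or any docking package.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """Fraction of (active, decoy) pairs in which the active ranks better.

    Assumes lower docking scores are better; ties count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical docking scores (kcal/mol-like units) for 3 actives and 4 decoys.
actives = [-10.2, -9.8, -7.1]
decoys = [-8.0, -6.5, -6.0, -5.2]
print(round(roc_auc(actives, decoys), 3))  # prints 0.917
```

The pairwise loop is quadratic, which is fine for a sketch; for large decoy sets a rank-based (Mann-Whitney) computation gives the same value faster.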
Providing Nuclear Criticality Safety Analysis Education through Benchmark Experiment Evaluation
DOE Office of Scientific and Technical Information (OSTI.GOV)
John D. Bess; J. Blair Briggs; David W. Nigg
2009-11-01
One of the challenges facing today's new workforce of nuclear criticality safety engineers is that they must assess nuclear systems and establish safety guidelines without having received significant experience or hands-on training before graduation. Participation in the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and/or the International Reactor Physics Experiment Evaluation Project (IRPhEP) provides students and young professionals the opportunity to gain experience and enhance critical engineering skills.
Propulsion Diagnostic Method Evaluation Strategy (ProDiMES) User's Guide
NASA Technical Reports Server (NTRS)
Simon, Donald L.
2010-01-01
This report is a User's Guide for the Propulsion Diagnostic Method Evaluation Strategy (ProDiMES). ProDiMES is a standard benchmarking problem and a set of evaluation metrics that enable the comparison of candidate aircraft engine gas path diagnostic methods. This MATLAB-based (The MathWorks, Inc.) software tool enables users to independently develop and evaluate diagnostic methods. A set of blind test case data is also distributed as part of the software, enabling side-by-side comparison of diagnostic approaches developed by multiple users. The User's Guide describes the various components of ProDiMES and provides instructions for the installation and operation of the tool.
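The abstract does not list the ProDiMES metrics themselves, so the following is only a hypothetical illustration of how diagnoses on labelled blind test cases are commonly scored: detection rate on faulty cases, false-alarm rate on fault-free cases, correct fault isolation, and a confusion matrix. The label `"none"`, the fault names, and the metric names are assumptions, not the ProDiMES definitions, and the sketch is in Python rather than the tool's MATLAB.

```python
from collections import Counter

NO_FAULT = "none"  # hypothetical label for a fault-free engine case


def diagnostic_metrics(truth, diagnosed):
    """Score diagnoses against ground-truth labels for a set of test cases.

    Assumes both faulty and fault-free cases are present; otherwise one of
    the denominators below is zero.
    """
    pairs = list(zip(truth, diagnosed))
    faulty = [(t, d) for t, d in pairs if t != NO_FAULT]
    healthy = [(t, d) for t, d in pairs if t == NO_FAULT]
    detected = sum(1 for _, d in faulty if d != NO_FAULT)  # some fault reported
    isolated = sum(1 for t, d in faulty if d == t)  # the correct fault named
    false_alarms = sum(1 for _, d in healthy if d != NO_FAULT)
    return {
        "tpr": detected / len(faulty),
        "fpr": false_alarms / len(healthy),
        "isolation_rate": isolated / len(faulty),
        "confusion": Counter(pairs),  # (true label, diagnosed label) -> count
    }


# Hypothetical blind-test outcome with two made-up fault classes.
m = diagnostic_metrics(
    ["none", "none", "fan", "hpc", "hpc"],
    ["none", "fan", "fan", "hpc", "fan"],
)
print(m["tpr"], m["fpr"])  # prints 1.0 0.5
```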
Evaluation of the ACEC Benchmark Suite for Real-Time Applications
1990-07-23
1.0 benchmark suite was analyzed with respect to its measuring of Ada real-time features such as tasking, memory management, input/output, scheduling...and delay statement, Chapter 13 features, pragmas, interrupt handling, subprogram overhead, numeric computations etc. For most of the features that...meant for programming real-time systems. The ACEC benchmarks have been analyzed extensively with respect to their measuring of Ada real-time features
An automated protocol for performance benchmarking a widefield fluorescence microscope.
Halter, Michael; Bier, Elianna; DeRose, Paul C; Cooksey, Gregory A; Choquette, Steven J; Plant, Anne L; Elliott, John T
2014-11-01
Widefield fluorescence microscopy is a widely used tool for visually assessing biological samples and for quantifying cell responses. Despite its widespread use in high content analysis and other imaging applications, few published methods exist for evaluating and benchmarking the analytical performance of a microscope. Easy-to-use benchmarking methods would facilitate the use of fluorescence imaging as a quantitative analytical tool in research applications, and would aid instrumental method validation for commercial product development applications. We describe and evaluate an automated method to characterize a fluorescence imaging system's performance by benchmarking the detection threshold, saturation, and linear dynamic range to a reference material. The benchmarking procedure is demonstrated using two different reference materials, uranyl-ion-doped glass and Schott 475 GG filter glass. Both are suitable candidate reference materials that are homogeneously fluorescent and highly photostable, and the Schott 475 GG filter glass is currently commercially available. In addition to benchmarking the analytical performance, we also demonstrate that the reference materials provide for accurate day-to-day intensity calibration. Published 2014 Wiley Periodicals Inc. This article is a US government work and, as such, is in the public domain in the United States of America.
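To illustrate how the three benchmarked quantities relate, here is a minimal sketch using a common convention rather than the authors' automated protocol. It assumes the detection threshold is k times the dark-image noise (k = 3), saturation is the full scale of a 12-bit digitizer, and the linear dynamic range is their ratio. The bit depth, k, and the dark-pixel values are assumptions; the published method benchmarks these quantities against a fluorescent reference material instead.

```python
import statistics


def dynamic_range(dark_pixels, bit_depth=12, k=3.0):
    """Return (detection_threshold, saturation_signal, ldr) in counts above offset.

    dark_pixels: pixel values from an image taken with no illumination.
    bit_depth:   camera digitizer depth (12 bits is an assumption).
    k:           multiple of dark noise treated as "just detectable".
    """
    offset = statistics.mean(dark_pixels)
    noise = statistics.stdev(dark_pixels)  # sample standard deviation
    detection_threshold = k * noise  # smallest signal distinguishable from noise
    saturation_signal = (2 ** bit_depth - 1) - offset  # largest signal before clipping
    return detection_threshold, saturation_signal, saturation_signal / detection_threshold


thr, sat, ldr = dynamic_range([100, 102, 98, 101, 99])
print(round(thr, 2), sat, round(ldr))  # prints 4.74 3995 842
```

A real linearity check would also fit signal against exposure time on the reference material and stop the range where the residuals depart from the fit; this ratio only bounds the range from above.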
Connor, Jean A; Larson, Carol; Baird, Jennifer; Hickey, Patricia A
2016-01-01
The evidence linking nursing care and patient outcomes has been globally demonstrated. Thus, it is time for translation and application of this evidence to robust measurement that uniquely demonstrates the value of nursing care and the characteristics of the nursing workforce that contribute to optimal patient outcomes. The aim of this study was to identify and develop standardized measures representative of pediatric nursing care of the cardiovascular patient for benchmarking within freestanding children's hospitals. Using a consensus-based approach, the Consortium of Congenital Cardiac Care-Measurement of Nursing Practice (C4-MNP) members developed quality measures within working groups and then individually critiqued all drafted measures. Final draft measures were then independently reviewed and critiqued by an external nursing quality measurement committee. The final quality measures were also made available to a national parent support group for feedback. The development process used by C4-MNP resulted in 10 measures eligible for testing across freestanding children's hospitals. Employing a collaborative consensus-based method, implementing the criteria of the National Quality Forum, and adding an external vetting period provided a strong framework for the development and evaluation of standardized measures. The Consortium will continue with implementation and testing of each measure in 9 of our 28 collaborating centers. This activity will support initial development of benchmarks and evaluation of the association of the measures with patient outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.
Benchmarking an operational procedure for rapid flood mapping and risk assessment in Europe
NASA Astrophysics Data System (ADS)
Dottori, Francesco; Salamon, Peter; Kalas, Milan; Bianchi, Alessandra; Feyen, Luc
2016-04-01
The development of real-time methods for rapid flood mapping and risk assessment is crucial to improve emergency response and mitigate flood impacts. This work describes the benchmarking of an operational procedure for rapid flood risk assessment based on the flood predictions issued by the European Flood Awareness System (EFAS). The daily forecasts produced for the major European river networks are translated into event-based flood hazard maps using a large map catalogue derived from high-resolution hydrodynamic simulations, based on the hydro-meteorological dataset of EFAS. Flood hazard maps are then combined with exposure and vulnerability information, and the impacts of the forecasted flood events are evaluated in near real-time in terms of flood-prone areas, potential economic damage, affected population, infrastructure and cities. An extensive test of the operational procedure is carried out using the catastrophic floods of May 2014 in Bosnia-Herzegovina, Croatia and Serbia. The reliability of the flood mapping methodology is tested against satellite-derived flood footprints, while ground-based estimates of economic damage and affected population are compared against modelled estimates. We evaluated the skill of flood hazard and risk estimates derived from EFAS flood forecasts with different lead times and combinations. The assessment includes a comparison of several alternative approaches to producing and presenting the information content, in order to meet the requests of EFAS users. The tests provided good results and showed the potential of the developed real-time operational procedure in helping emergency response and management.
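Testing a modelled flood map against a satellite-derived footprint is usually done cell by cell with a contingency table. A minimal sketch, assuming both maps are already on the same grid as binary masks (1 = flooded), computes the hit rate, false alarm ratio, and critical success index (CSI). These are standard categorical scores; the abstract does not say which scores the EFAS study used.

```python
def flood_skill(modelled, observed):
    """Contingency-table scores for two same-shaped binary flood masks."""
    hits = misses = false_alarms = 0
    for row_m, row_o in zip(modelled, observed):
        for m, o in zip(row_m, row_o):
            if m and o:
                hits += 1  # flooded in both
            elif o:
                misses += 1  # observed flooded, not modelled
            elif m:
                false_alarms += 1  # modelled flooded, not observed
    return {
        "hit_rate": hits / (hits + misses),
        "false_alarm_ratio": false_alarms / (hits + false_alarms),
        "csi": hits / (hits + misses + false_alarms),
    }


# Toy 2 x 3 grids (hypothetical).
print(flood_skill([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [1, 1, 0]])["csi"])  # prints 0.5
```

Cells that are dry in both maps are ignored by CSI, which keeps large dry areas from inflating the score.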
Willemse, Elias J; Joubert, Johan W
2016-09-01
In this article we present benchmark datasets for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities (MCARPTIF). The problem is a generalisation of the Capacitated Arc Routing Problem (CARP), and closely represents waste collection routing. Four different test sets are presented, each consisting of multiple instance files, which can be used to benchmark different solution approaches for the MCARPTIF. An in-depth description of the datasets can be found in "Constructive heuristics for the Mixed Capacity Arc Routing Problem under Time Restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [2] and "Splitting procedures for the Mixed Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, in press) [4]. The datasets are publicly available from "Library of benchmark test sets for variants of the Capacitated Arc Routing Problem under Time restrictions with Intermediate Facilities" (Willemse and Joubert, 2016) [3].
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neary, Vincent Sinclair; Yang, Zhaoqing; Wang, Taiping
A wave model test bed is established to benchmark, test and evaluate spectral wave models and modeling methodologies (i.e., best practices) for predicting the wave energy resource parameters recommended by the International Electrotechnical Commission, IEC TS 62600-101Ed. 1.0 ©2015. Among other benefits, the model test bed can be used to investigate the suitability of different models, specifically what source terms should be included in spectral wave models under different wave climate conditions and for different classes of resource assessment. The overarching goal is to use these investigations to provide industry guidance for model selection and modeling best practices depending on the wave site conditions and desired class of resource assessment. Modeling best practices are reviewed, and limitations and knowledge gaps in predicting wave energy resource parameters are identified.
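Several of the resource parameters that wave models are benchmarked on are spectral moments of the predicted sea state, for example significant wave height Hm0 = 4 sqrt(m0), energy period Te = m_{-1}/m0, and omnidirectional wave power J. A minimal sketch, assuming a 1-D variance density spectrum on uniformly spaced frequency bins, rectangle-rule integration, deep-water group velocity c_g = g/(4 pi f), seawater density 1025 kg/m^3 and g = 9.81 m/s^2, is shown below. Under the deep-water assumption J = rho g^2 m_{-1} / (4 pi), which equals rho g^2 Hm0^2 Te / (64 pi). Finite-depth sites need the full dispersion relation instead.

```python
import math


def resource_parameters(freqs, spectrum, rho=1025.0, g=9.81):
    """Return (Hm0 [m], Te [s], J [W/m]) from a 1-D variance density spectrum.

    freqs:    uniformly spaced frequencies in Hz (all > 0).
    spectrum: variance density S(f) in m^2/Hz at each frequency.
    Wave power uses the deep-water group velocity (an assumption).
    """
    df = freqs[1] - freqs[0]
    m0 = sum(s * df for s in spectrum)  # zeroth spectral moment
    m_minus1 = sum(s / f * df for f, s in zip(freqs, spectrum))  # inverse first moment
    hm0 = 4.0 * math.sqrt(m0)
    te = m_minus1 / m0
    power = rho * g ** 2 / (4.0 * math.pi) * m_minus1
    return hm0, te, power


# Toy spectrum with all energy in the 0.1 Hz bin (hypothetical).
hm0, te, j = resource_parameters([0.1, 0.2], [1.0, 0.0])
print(round(hm0, 3), round(te, 1), round(j))  # Hm0, Te, J
```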
Benchmarking hypercube hardware and software
NASA Technical Reports Server (NTRS)
Grunwald, Dirk C.; Reed, Daniel A.
1986-01-01
It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns.
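Message transmission benchmarks are often summarized with a two-parameter model, t(n) = alpha + beta * n, where alpha is the start-up latency and 1/beta is the asymptotic bandwidth. A minimal sketch, assuming timings for point-to-point messages of several sizes between neighbouring nodes, fits that model by ordinary least squares. The model choice and the example timings are illustrative assumptions, not the methodology reported for these hypercubes.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of t(n) = alpha + beta * n.

    sizes: message lengths (e.g. bytes); times: measured transfer times
    (e.g. microseconds). Returns (alpha, beta, bandwidth = 1 / beta).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    beta = sxy / sxx  # per-byte transfer cost
    alpha = mean_y - beta * mean_x  # start-up latency
    return alpha, beta, 1.0 / beta


# Hypothetical timings: 1000 us start-up plus 1 us per byte.
print(fit_latency_bandwidth([0, 1024, 2048, 4096], [1000.0, 2024.0, 3048.0, 5096.0]))
```

Repeating the fit per communication pattern (nearest neighbour, multi-hop, broadcast) makes the effect of the pattern visible as a change in alpha, beta, or both.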
Stress Testing of the Philips 60W Replacement Lamp L Prize Entry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poplawski, Michael E.; Ledbetter, Marc R.; Smith, Mark
2012-04-24
The Pacific Northwest National Laboratory, operated by Battelle for the U.S. Department of Energy, worked with Intertek to develop a procedure for stress testing medium screw-base light sources. This procedure, composed of alternating stress cycles and performance evaluation, was used to qualitatively compare and contrast the durability and reliability of the Philips 60W replacement lamp L Prize entry with market-proven compact fluorescent lamps (CFLs) with comparable light output and functionality. The stress cycles applied simultaneous combinations of electrical, thermal, vibration, and humidity stresses of increasing magnitude. Performance evaluations measured relative illuminance, x chromaticity and y chromaticity shifts after each stress cycle. The Philips L Prize entry lamps appear to be appreciably more durable than the incumbent energy-efficient technology, as represented by the evaluated CFLs, and with respect to the applied stresses. Through the course of testing, all 15 CFL samples permanently ceased to function as a result of the applied stresses, while only 1 Philips L Prize entry lamp exhibited a failure, the nature of which was minor, non-destructive, and a consequence of a known (and resolved) subcontractor issue. Given that current CFL technology appears to be moderately mature and no Philips L Prize entry failures could be produced within the stress envelope causing 100 percent failure of the benchmark CFLs, it seems that, in this particular implementation, light-emitting diode (LED) technology would be much more durable in the field than current CFL technology. However, the Philips L Prize entry lamps used for testing were carefully designed and built for the competition, while the benchmark CFLs were mass produced for retail sale, a distinction that should be taken into consideration.
Further reliability testing on final production samples would be necessary to judge the extent to which the results of this analysis apply to production versions of the Philips L Prize entry.
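The performance evaluation between stress cycles amounts to comparing each lamp against its own pre-stress baseline. A minimal sketch, assuming each measurement is an (illuminance, x, y) triple, reports relative illuminance and the x and y chromaticity shifts after each cycle. The combined Euclidean shift in (x, y) is an illustrative addition, not a metric stated in the report, and the example values are hypothetical.

```python
import math


def stress_cycle_shifts(baseline, measurements):
    """Compare post-cycle measurements with a lamp's baseline.

    baseline:     (illuminance, x, y) before any stress cycle.
    measurements: list of (illuminance, x, y), one per completed cycle.
    """
    e0, x0, y0 = baseline
    results = []
    for e, x, y in measurements:
        results.append({
            "relative_illuminance": e / e0,
            "dx": x - x0,
            "dy": y - y0,
            "dxy": math.hypot(x - x0, y - y0),  # combined shift (illustrative)
        })
    return results


# Hypothetical lamp: baseline, then one stress cycle.
for row in stress_cycle_shifts((1000.0, 0.4500, 0.4100), [(950.0, 0.4530, 0.4140)]):
    print({k: round(v, 4) for k, v in row.items()})
```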
NASA Astrophysics Data System (ADS)
Rohrer, Brandon
2010-12-01
Measuring progress in the field of Artificial General Intelligence (AGI) can be difficult without commonly accepted methods of evaluation. An AGI benchmark would allow evaluation and comparison of the many computational intelligence algorithms that have been developed. In this paper I propose that a benchmark for natural world interaction would possess seven key characteristics: fitness, breadth, specificity, low cost, simplicity, range, and task focus. I also outline two benchmark examples that meet most of these criteria. In the first, the direction task, a human coach directs a machine to perform a novel task in an unfamiliar environment. The direction task is extremely broad, but may be idealistic. In the second, the AGI battery, AGI candidates are evaluated based on their performance on a collection of more specific tasks. The AGI battery is designed to be appropriate to the capabilities of currently existing systems. Both the direction task and the AGI battery would require further definition before implementing. The paper concludes with a description of a task that might be included in the AGI battery: the search and retrieve task.
Rethinking the reference collection: exploring benchmarks and e-book availability.
Husted, Jeffrey T; Czechowski, Leslie J
2012-01-01
Librarians in the Health Sciences Library System at the University of Pittsburgh explored the possibility of developing an electronic reference collection that would replace the print reference collection, thus providing access to these valuable materials to a widely dispersed user population. The librarians evaluated the print reference collection and standard collection development lists as potential benchmarks for the electronic collection, and they determined which books were available in electronic format. They decided that the low availability of electronic versions of titles in each benchmark group rendered the creation of an electronic reference collection using either benchmark impractical.
CFD validation experiments for hypersonic flows
NASA Technical Reports Server (NTRS)
Marvin, Joseph G.
1992-01-01
A roadmap for CFD code validation is introduced. The elements of the roadmap are consistent with air-breathing vehicle design requirements and related to the important flow path components: forebody, inlet, combustor, and nozzle. Building block and benchmark validation experiments are identified along with their test conditions and measurements. Based on evaluation criteria, recommendations for an initial CFD validation database are given and gaps are identified where future experiments could provide new validation data.
Land, Sander; Gurev, Viatcheslav; Arens, Sander; Augustin, Christoph M; Baron, Lukas; Blake, Robert; Bradley, Chris; Castro, Sebastian; Crozier, Andrew; Favino, Marco; Fastl, Thomas E; Fritz, Thomas; Gao, Hao; Gizzi, Alessio; Griffith, Boyce E; Hurtado, Daniel E; Krause, Rolf; Luo, Xiaoyu; Nash, Martyn P; Pezzuto, Simone; Plank, Gernot; Rossi, Simone; Ruprecht, Daniel; Seemann, Gunnar; Smith, Nicolas P; Sundnes, Joakim; Rice, J Jeremy; Trayanova, Natalia; Wang, Dafang; Jenny Wang, Zhinuo; Niederer, Steven A
2015-12-08
Models of cardiac mechanics are increasingly used to investigate cardiac physiology. These models are characterized by a high level of complexity, including the particular anisotropic material properties of biological tissue and the actively contracting material. A large number of independent simulation codes have been developed, but a consistent way of verifying the accuracy and replicability of simulations is lacking. To aid in the verification of current and future cardiac mechanics solvers, this study provides three benchmark problems for cardiac mechanics. These benchmark problems test the ability to accurately simulate pressure-type forces that depend on the deformed object's geometry, anisotropic and spatially varying material properties similar to those seen in the left ventricle, and active contractile forces. The benchmark was solved by 11 different groups to generate consensus solutions, with typical differences of approximately 0.5% between higher-resolution solutions, and consistent results between linear, quadratic and cubic finite elements as well as different approaches to simulating incompressible materials. Online tools and solutions are made available to allow these tests to be effectively used in verification of future cardiac mechanics software.
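A statement such as "typical differences of approximately 0.5%" implies some relative error norm between solutions sampled at common points. A minimal sketch, assuming each group's solution is a list of values (for example displacements) at the same sample points, uses a relative L2 difference and a simple mean-of-all-groups consensus. Both the norm and the consensus definition are illustrative assumptions, not the procedure used in the study.

```python
import math


def relative_l2_difference(a, b):
    """||a - b|| / ||b|| for two solutions sampled at the same points."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ref = math.sqrt(sum(y ** 2 for y in b))
    return diff / ref


def max_deviation_from_consensus(solutions):
    """Largest relative deviation of any solution from the pointwise mean."""
    n = len(solutions)
    consensus = [sum(vals) / n for vals in zip(*solutions)]
    return max(relative_l2_difference(s, consensus) for s in solutions)


# Hypothetical sampled values from two solvers; the first is 0.5% larger everywhere.
print(round(relative_l2_difference([10.05, 20.1], [10.0, 20.0]), 6))  # prints 0.005
```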
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We provide a paper-and-pencil specification of a benchmark suite for computational grids. It is based on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks (NPB) and is called the NAS Grid Benchmarks (NGB). NGB problems are presented as data flow graphs encapsulating an instance of a slightly modified NPB task in each graph node, which communicates with other nodes by sending/receiving initialization data. Like NPB, NGB specifies several different classes (problem sizes). In this report we describe classes S, W, and A, and provide verification values for each. The implementor has the freedom to choose any language, grid environment, security model, fault tolerance/error correction mechanism, etc., as long as the resulting implementation passes the verification test and reports the turnaround time of the benchmark.
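The NGB structure, a data flow graph whose nodes run tasks fed by their predecessors, with a verification check and a reported turnaround time, can be sketched in a few lines. The sketch below is a toy, assuming each node's task is a plain Python function of its predecessors' outputs and that verification compares final values with reference values within a relative tolerance. The node names, toy arithmetic tasks, and tolerance are hypothetical stand-ins for NPB tasks; `graphlib` requires Python 3.9 or later.

```python
import time
from graphlib import TopologicalSorter


def run_graph(tasks, deps, reference, rtol=1e-8):
    """Execute a data flow graph in dependency order and verify its outputs.

    tasks:     node -> function taking the list of predecessor outputs.
    deps:      node -> list of predecessor nodes.
    reference: node -> expected value used for verification.
    Returns (results, turnaround_seconds, verified).
    """
    start = time.perf_counter()
    results = {}
    for node in TopologicalSorter(deps).static_order():
        inputs = [results[p] for p in deps.get(node, ())]
        results[node] = tasks[node](inputs)
    turnaround = time.perf_counter() - start
    verified = all(abs(results[k] - v) <= rtol * abs(v) for k, v in reference.items())
    return results, turnaround, verified


# Toy diamond-shaped graph: A feeds B and C, which both feed D.
tasks = {
    "A": lambda xs: 2.0,
    "B": lambda xs: xs[0] * 3,
    "C": lambda xs: xs[0] + 1,
    "D": lambda xs: sum(xs),
}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
results, elapsed, ok = run_graph(tasks, deps, {"D": 9.0})
print(results["D"], ok)  # prints 9.0 True
```

A grid implementation would dispatch each node to a remote resource instead of calling it in-process, but the ordering, verification, and timing logic stay the same.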
Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R
2014-01-01
Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly-measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. PMID:25977770
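The idea of correcting inter-batch variation with QC spectra can be shown with the simplest possible scheme, which is deliberately not the batch-correction algorithm this dataset was designed to test. For one spectral feature, each batch is rescaled so that its QC median matches the QC median over all batches. The function name and toy intensities are hypothetical.

```python
import statistics


def qc_median_correct(intensities, batches, is_qc):
    """Rescale one feature so every batch's QC median equals the global QC median.

    intensities: one value per injection (QC and biological samples alike).
    batches:     batch label for each injection.
    is_qc:       True where the injection is a QC sample.
    """
    global_qc = statistics.median(x for x, q in zip(intensities, is_qc) if q)
    factors = {}
    for b in set(batches):
        qc_vals = [x for x, bb, q in zip(intensities, batches, is_qc) if q and bb == b]
        factors[b] = global_qc / statistics.median(qc_vals)
    return [x * factors[b] for x, b in zip(intensities, batches)]


# Toy data: batch 2 reads about twice as high as batch 1 for the same material.
corrected = qc_median_correct(
    [100.0, 102.0, 50.0, 200.0, 204.0, 100.0],
    [1, 1, 1, 2, 2, 2],
    [True, True, False, True, True, False],
)
print([round(v, 2) for v in corrected])
```

Because the same biological material was measured in every batch, the spread of its corrected values across batches is an independent check of how well any such correction works, which is the design strength the abstract highlights.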
Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
2014-01-01
Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. 
The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. PMID:24708189
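The stricter notion of a correctly mapped read, which checks the end position and the numbers of indels and substitutions as well as the start, can be written as a predicate comparing a simulated read's true placement with the mapper's reported placement. A minimal sketch follows. The record fields, the contig check, and the optional positional `tolerance` are illustrative assumptions, not the exact rule implemented in CuReSimEval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a read aligns (or was simulated from)."""

    contig: str
    start: int  # leftmost reference position
    end: int  # rightmost reference position
    indels: int
    substitutions: int


def is_correctly_mapped(expected: Placement, reported: Placement, tolerance: int = 0) -> bool:
    """Correct only if the contig, both ends and both edit counts agree."""
    return (
        expected.contig == reported.contig
        and abs(expected.start - reported.start) <= tolerance
        and abs(expected.end - reported.end) <= tolerance
        and expected.indels == reported.indels
        and expected.substitutions == reported.substitutions
    )


truth = Placement("chr1", 1000, 1099, indels=1, substitutions=2)
# Same start, but the reported end differs: a start-only check would accept this.
print(is_correctly_mapped(truth, Placement("chr1", 1000, 1105, 1, 2)))  # prints False
```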
Pasquini, Marcelo C; Logan, Brent; Jones, Richard J; Alousi, Amin M; Appelbaum, Frederick R; Bolaños-Meade, Javier; Flowers, Mary E D; Giralt, Sergio; Horowitz, Mary M; Jacobsohn, David; Koreth, John; Levine, John E; Luznik, Leo; Maziarz, Richard; Mendizabal, Adam; Pavletic, Steven; Perales, Miguel-Angel; Porter, David; Reshef, Ran; Weisdorf, Daniel; Antin, Joseph H
2018-06-01
Graft-versus-host disease (GVHD) is a common complication after hematopoietic cell transplantation (HCT) and is associated with significant morbidity and mortality. Preventing GVHD without chronic therapy or increasing relapse is a desired goal. Here we report a benchmark analysis to evaluate the performance of 6 GVHD prevention strategies tested at single institutions compared with a large multicenter outcomes database as a control. Each intervention was compared with the control for the incidence of acute and chronic GVHD and overall survival and against novel composite endpoints: GVHD-free, relapse-free survival (GRFS) and chronic GVHD-free, relapse-free survival (CRFS). Modeling GRFS and CRFS using the benchmark analysis further informed the design of 2 clinical trials testing GVHD prophylaxis interventions. This study demonstrates the potential benefit of using an outcomes database to select promising interventions for multicenter clinical trials and proposes novel composite endpoints for use in GVHD prevention trials. Copyright © 2018 The American Society for Blood and Marrow Transplantation. Published by Elsevier Inc. All rights reserved.
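Composite endpoints such as GRFS and CRFS are analysed as time to the first of several component events, with censoring at last follow-up if none occurs. A minimal sketch, assuming each component event time is given in days (or None if the event did not occur), is shown below. Which component events and grades enter GRFS or CRFS is not specified here. The ones named in the example comment are assumptions, and the paper's own definitions should be used.

```python
from typing import Optional, Tuple


def composite_endpoint(follow_up: float, *event_times: Optional[float]) -> Tuple[float, int]:
    """Time-to-first-event for a composite endpoint.

    event_times: day each component event occurred, or None if it did not.
    Returns (time, 1) at the earliest observed event, or (follow_up, 0)
    if the patient is censored without any component event.
    """
    observed = [t for t in event_times if t is not None and t <= follow_up]
    if observed:
        return min(observed), 1
    return follow_up, 0


# Hypothetical patient, components in the order (acute GVHD, chronic GVHD, relapse, death):
# acute GVHD on day 45, relapse on day 200, alive at last follow-up on day 365.
print(composite_endpoint(365, 45, None, 200, None))  # prints (45, 1)
```

The resulting (time, event) pairs per patient are what a Kaplan-Meier estimator or Cox model would take as input when comparing an intervention with the database control.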
Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database
Butkiewicz, Mariusz; Lowe, Edward W.; Mueller, Ralf; Mendenhall, Jeffrey L.; Teixeira, Pedro L.; Weaver, C. David; Meiler, Jens
2013-01-01
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed. PMID:23299552
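Enrichment at a true-positive-rate cutoff can be read as: rank all compounds by predicted activity, move down the list until the stated fraction of confirmed actives has been recovered, and divide the precision at that point by the overall fraction of actives. A minimal sketch under that reading follows. The definition is an assumption consistent with the abstract's phrasing and should be checked against the paper, and the toy screen is hypothetical.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment once `tpr_cutoff` of the actives have been recovered.

    scores: predicted activity, higher = more likely active.
    labels: 1 for confirmed actives, 0 for inactives.
    Enrichment = precision at the cutoff / overall fraction of actives.
    """
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("need at least one active")
    base_rate = n_active / len(labels)
    needed = math.ceil(tpr_cutoff * n_active)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for screened, (_, is_active) in enumerate(ranked, start=1):
        true_positives += is_active
        if true_positives >= needed:
            return (true_positives / screened) / base_rate
    raise AssertionError("unreachable: every active is eventually recovered")


# Toy screen: 100 compounds, 4 actives ranked 1st, 3rd, 50th and 90th.
labels = [1 if i in (0, 2, 49, 89) else 0 for i in range(100)]
scores = list(range(100, 0, -1))
print(round(enrichment_at_tpr(scores, labels), 6))  # prints 25.0
```

Because the baseline is the fraction of actives, the same precision yields much larger enrichments on screens with very few actives, which is typical of HTS data.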
The Earthquake Source Inversion Validation (SIV) - Project: Summary, Status, Outlook
NASA Astrophysics Data System (ADS)
Mai, P. M.
2017-12-01
Finite-fault earthquake source inversions infer the (time-dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, this kinematic source inversion is ill-posed and returns non-unique solutions, as seen for instance in multiple source models for the same earthquake, obtained by different research teams, that often exhibit remarkable dissimilarities. To address the uncertainties in earthquake-source inversions and to understand strengths and weaknesses of various methods, the Source Inversion Validation (SIV) project developed a set of forward-modeling exercises and inversion benchmarks. Several research teams then used these validation exercises to test their codes and methods, but also to develop and benchmark new approaches. In this presentation I will summarize the SIV strategy, the existing benchmark exercises and corresponding results. Using various waveform-misfit criteria and newly developed statistical comparison tools to quantify source-model (dis)similarities, the SIV platform is able to rank solutions and identify particularly promising source inversion approaches. Existing SIV exercises (with related data and descriptions) and all computational tools remain available via the open online collaboration platform; additional exercises and benchmark tests will be uploaded once they are fully developed. I encourage source modelers to use the SIV benchmarks for developing and testing new methods. The SIV efforts have already led to several promising new techniques for tackling the earthquake-source imaging problem. I expect that future SIV benchmarks will provide further innovations and insights into earthquake source kinematics that will ultimately help to better understand the dynamics of the rupture process.
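One common waveform-misfit criterion is a normalized L2 misfit between observed (or reference) seismograms and the synthetics predicted by a candidate source model. Ranking candidate inversions by it is straightforward. A minimal sketch, assuming all traces are already aligned and sampled identically, is shown below. This particular criterion is an illustrative choice, not necessarily one of the criteria used on the SIV platform, and the model names and traces are hypothetical.

```python
def l2_misfit(observed, synthetic):
    """Normalized L2 misfit: sum of squared residuals over observed energy."""
    num = sum((o - s) ** 2 for o, s in zip(observed, synthetic))
    den = sum(o ** 2 for o in observed)
    return num / den


def rank_models(observed, candidates):
    """Return candidate names ordered from best (lowest misfit) to worst."""
    return sorted(candidates, key=lambda name: l2_misfit(observed, candidates[name]))


observed = [0.0, 1.0, 0.0, -1.0]
candidates = {
    "model_B": [0.0, 0.5, 0.0, -0.5],
    "model_A": [0.0, 0.9, 0.0, -1.1],
}
print(rank_models(observed, candidates))  # prints ['model_A', 'model_B']
```

Waveform fit alone cannot resolve the non-uniqueness described above, which is why it is paired with direct comparisons between the inverted and the target source models.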
MacDonald, Donald D.; Ingersoll, Christopher G.; Smorong, Dawn E.; Sinclair, Jesse A.; Lindskoog, Rebekka; Wang, Ning; Severn, Corrine; Gouguet, Ron; Meyer, John; Field, Jay
2011-01-01
Three sets of effects-based sediment-quality guidelines (SQGs) were evaluated to support the selection of sediment-quality benchmarks for assessing risks to benthic invertebrates in the Calcasieu Estuary, Louisiana. These SQGs included probable effect concentrations (PECs), effects range median values (ERMs), and logistic regression model (LRM)-based T50 values. The results of this investigation indicate that all three sets of SQGs tend to underestimate sediment toxicity in the Calcasieu Estuary (i.e., relative to the national data sets), as evaluated using the results of 10-day toxicity tests with the amphipods Hyalella azteca or Ampelisca abdita, and 28-day whole-sediment toxicity tests with H. azteca. These results emphasize the importance of deriving site-specific toxicity thresholds for assessing risks to benthic invertebrates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Grace L.; Department of Health Services Research, The University of Texas MD Anderson Cancer Center, Houston, Texas; Jiang, Jing
Purpose: High-quality treatment for intact cervical cancer requires external radiation therapy, brachytherapy, and chemotherapy, carefully sequenced and completed without delays. We sought to determine how frequently current treatment meets quality benchmarks and whether new technologies have influenced patterns of care. Methods and Materials: By searching diagnosis and procedure claims in MarketScan, an employment-based health care claims database, we identified 1508 patients with nonmetastatic, intact cervical cancer treated from 1999 to 2011, who were <65 years of age and received >10 fractions of radiation. Treatments received were identified using procedure codes and compared with 3 quality benchmarks: receipt of brachytherapy, receipt of chemotherapy, and radiation treatment duration not exceeding 63 days. The Cochran-Armitage test was used to evaluate temporal trends. Results: Seventy-eight percent of patients (n=1182) received brachytherapy, with brachytherapy receipt stable over time (Cochran-Armitage P_trend=.15). Among patients who received brachytherapy, 66% had high-dose-rate and 34% had low-dose-rate treatment, although use of high-dose-rate brachytherapy steadily increased to 75% by 2011 (P_trend<.001). Eighteen percent of patients (n=278) received intensity-modulated radiation therapy (IMRT), and IMRT receipt increased to 37% by 2011 (P_trend<.001). Only 2.5% of patients (n=38) received IMRT in the setting of brachytherapy omission. Overall, 79% of patients (n=1185) received chemotherapy, and chemotherapy receipt increased to 84% by 2011 (P_trend<.001). Median radiation treatment duration was 56 days (interquartile range, 47-65 days); however, duration exceeded 63 days in 36% of patients (n=543). Although 98% of patients received at least 1 benchmark treatment, only 44% received treatment that met all 3 benchmarks.
With more stringent indicators (brachytherapy, ≥4 chemotherapy cycles, and duration not exceeding 56 days), only 25% of patients received treatment that met all benchmarks. Conclusion: In this cohort, most cervical cancer patients received treatment that did not comply with all 3 benchmarks for quality treatment. In contrast to increasing receipt of newer radiation technologies, there was little improvement in receipt of essential treatment benchmarks.
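The Cochran-Armitage test for trend in proportions, used in this study to evaluate temporal trends, can be computed directly from per-year counts. The sketch below implements the standard form (binomial variance under the null hypothesis, no continuity correction, default scores 0, 1, 2, ...). The counts in the example are invented for illustration and are not MarketScan data.

```python
import math
from typing import Optional, Sequence, Tuple


def cochran_armitage(
    successes: Sequence[int],
    totals: Sequence[int],
    scores: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for a linear trend in proportions."""
    if len(successes) != len(totals):
        raise ValueError("successes and totals must have the same length")
    if scores is None:
        scores = list(range(len(totals)))  # e.g. consecutive calendar years
    n = sum(totals)
    p_bar = sum(successes) / n  # pooled proportion under the null hypothesis

    # Score-weighted deviation of observed from expected successes.
    t_stat = sum(t * (r - m * p_bar) for t, r, m in zip(scores, successes, totals))
    sum_tn = sum(t * m for t, m in zip(scores, totals))
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_t2n - sum_tn ** 2 / n)
    if variance <= 0.0:
        raise ValueError("trend statistic has zero variance")

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: 10%, 20%, 30% of 100 patients per period.
z, p = cochran_armitage([10, 20, 30], [100, 100, 100])
print(f"z = {z:.3f}, p = {p:.2g}")  # z = 3.536
```

A positive z indicates proportions rising with the score; statistical packages may add a continuity correction or use exact methods, so small differences from published P values are expected.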
Benchmarking of HEU Metal Annuli Critical Assemblies with Internally Reflected Graphite Cylinder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiaobo, Liu; Bess, John D.; Marshall, Margaret A.
Three experimental configurations of critical assemblies, performed in 1963 at the Oak Ridge Critical Experiment Facility and assembled from HEU metal annuli of three different diameters (15-9 inches, 15-7 inches and 13-7 inches) with an internally reflected graphite cylinder, are evaluated and benchmarked. The experimental uncertainties (0.00055, 0.00055 and 0.00055, respectively) and the biases of the detailed benchmark models (-0.00179, -0.00189 and -0.00114, respectively) were determined, and the experimental benchmark keff results were obtained for both the detailed and simplified models. The calculated results for both models using MCNP6-1.0 and ENDF/B-VII.1 agree well with the benchmark experimental results, with differences of less than 0.2%. These are acceptable benchmark experiments for inclusion in the ICSBEP Handbook.
A new numerical benchmark of a freshwater lens
NASA Astrophysics Data System (ADS)
Stoeckl, L.; Walther, M.; Graf, T.
2016-04-01
A numerical benchmark for 2-D variable-density flow and solute transport in a freshwater lens is presented. The benchmark is based on results of laboratory experiments conducted by Stoeckl and Houben (2012) using a sand tank on the meter scale. This benchmark describes the formation and degradation of a freshwater lens over time, as can be found beneath real-world islands. An error analysis gave the appropriate spatial and temporal discretization of 1 mm and 8.64 s, respectively. The calibrated parameter set was obtained using the parameter estimation tool PEST. Comparing density-coupled and density-uncoupled results showed that the freshwater-saltwater interface position is strongly dependent on density differences. A benchmark that adequately represents saltwater intrusion and that includes realistic features of coastal aquifers or freshwater lenses was lacking. This new benchmark was thus developed and is demonstrated to be suitable to test variable-density groundwater models applied to saltwater intrusion investigations.
How to benchmark methods for structure-based virtual screening of large compound libraries.
Christofferson, Andrew J; Huang, Niu
2012-01-01
Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the design of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
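Enrichment of annotated ligands among decoys is often summarized as an enrichment factor (EF) at a given fraction of the ranked library. The sketch below assumes the usual EF definition (fraction of ligands recovered divided by fraction of the library screened) and assumes that lower docking scores are better, as with energy-like scores. The function name and data are hypothetical and do not reproduce the authors' protocol.

```python
import math
from typing import Sequence


def enrichment_factor(
    docking_scores: Sequence[float], is_ligand: Sequence[int], top_fraction: float = 0.01
) -> float:
    """EF at the top fraction of a ranked library (lower docking score = better, assumed)."""
    if len(docking_scores) != len(is_ligand):
        raise ValueError("inputs must have the same length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(docking_scores)
    n_ligands = sum(is_ligand)
    if n_ligands == 0:
        raise ValueError("at least one annotated ligand is required")

    n_top = max(1, math.ceil(top_fraction * n_total))
    ranked = sorted(zip(docking_scores, is_ligand), key=lambda pair: pair[0])
    ligands_found = sum(flag for _, flag in ranked[:n_top])
    # Fraction of ligands recovered divided by fraction of the library screened.
    return (ligands_found / n_ligands) / (n_top / n_total)


scores = [-11.2, -10.5, -9.8, -9.1, -8.7, -8.0, -7.4, -6.9, -6.1, -5.0]
ligands = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, ligands, top_fraction=0.5))  # 2.0
```

An EF of 1 corresponds to random selection; the maximum possible value depends on the ratio of ligands to decoys, which is one reason a benchmark data set needs a carefully chosen decoy background.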
ERIC Educational Resources Information Center
Fenton, Ray
This study examined the relative efficacy of the Anchorage (Alaska) Pre-Algebra Test and the State of Alaska Benchmark in 2 Math examination as tools used in the process of recommending grade 6 students for grade 7 Pre-Algebra placement. The consequential validity of the tests is explored in the context of class placements and grades earned. The…
NASA Astrophysics Data System (ADS)
Kostrzewa, Daniel; Josiński, Henryk
2016-06-01
The expanded Invasive Weed Optimization algorithm (exIWO) is an optimization metaheuristic modelled on the original IWO version, which is inspired by the dynamic growth of a weed colony. The authors of the present paper have modified the exIWO algorithm by introducing a set of both deterministic and non-deterministic strategies for selecting individuals. The goal of the project was to evaluate the modified exIWO by testing its usefulness for optimizing multidimensional numerical functions. The optimized functions (Griewank, Rastrigin, and Rosenbrock) are frequently used as benchmarks because of their characteristics: Griewank and Rastrigin are highly multimodal, while Rosenbrock has its minimum in a narrow, curved valley.
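For reference, the three test functions named above can be written in their standard textbook forms. This is a minimal sketch for minimization problems; search-domain bounds, which vary between studies, are not included, and the code is not the exIWO implementation itself.

```python
import math
from typing import Sequence


def griewank(x: Sequence[float]) -> float:
    """Griewank function; global minimum 0 at the origin."""
    sum_term = sum(xi * xi for xi in x) / 4000.0
    prod_term = 1.0
    for i, xi in enumerate(x, start=1):
        prod_term *= math.cos(xi / math.sqrt(i))
    return 1.0 + sum_term - prod_term


def rastrigin(x: Sequence[float]) -> float:
    """Rastrigin function; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def rosenbrock(x: Sequence[float]) -> float:
    """Rosenbrock function; global minimum 0 at (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2 for i in range(len(x) - 1)
    )


print(griewank([0.0] * 10), rastrigin([0.0] * 10), rosenbrock([1.0] * 10))  # 0.0 0.0 0.0
```

Any optimizer under test, exIWO included, would be scored by how close it gets to these known minima within a fixed budget of function evaluations.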
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suter, G.W. II; Mabrey, J.B.
1994-07-01
This report presents potential screening benchmarks for protection of aquatic life from contaminants in water. Because there is no guidance for screening benchmarks, a set of alternative benchmarks is presented herein. The alternative benchmarks are based on different conceptual approaches to estimating concentrations causing significant effects. For the upper screening benchmark, there are the acute National Ambient Water Quality Criteria (NAWQC) and the Secondary Acute Values (SAV). The SAV concentrations are values estimated with 80% confidence not to exceed the unknown acute NAWQC for those chemicals with no NAWQC. The alternative chronic benchmarks are the chronic NAWQC, the Secondary Chronic Value (SCV), the lowest chronic values for fish and daphnids from chronic toxicity tests, the estimated EC20 for a sensitive species, and the concentration estimated to cause a 20% reduction in the recruit abundance of largemouth bass. It is recommended that ambient chemical concentrations be compared to all of these benchmarks. If NAWQC are exceeded, the chemicals must be contaminants of concern because the NAWQC are applicable or relevant and appropriate requirements (ARARs). If NAWQC are not exceeded, but other benchmarks are, contaminants should be selected on the basis of the number of benchmarks exceeded and the conservatism of the particular benchmark values, as discussed in the text. To the extent that toxicity data are available, this report presents the alternative benchmarks for chemicals that have been detected on the Oak Ridge Reservation. It also presents the data used to calculate benchmarks and the sources of the data. It compares the benchmarks and discusses their relative conservatism and utility.
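The selection rule described above can be sketched as a small screening function. This is an illustrative sketch only: the benchmark names and values in the example are hypothetical, all values are assumed to share the concentration's units, and the final weighing of "number exceeded and conservatism" is left as a flag for expert judgement rather than automated.

```python
from typing import Dict, Iterable, List, Tuple


def screen_chemical(
    concentration: float,
    benchmarks: Dict[str, float],
    arar_names: Iterable[str] = ("acute NAWQC", "chronic NAWQC"),
) -> Tuple[str, List[str]]:
    """Compare one ambient concentration with every available benchmark.

    Returns a screening status and the sorted names of exceeded benchmarks.
    """
    arar = set(arar_names)
    exceeded = sorted(name for name, value in benchmarks.items() if concentration > value)
    if arar.intersection(exceeded):
        # NAWQC are ARARs, so any exceedance makes the chemical a contaminant of concern.
        return "contaminant of concern", exceeded
    if exceeded:
        # Judgement call: weigh how many benchmarks are exceeded and how conservative they are.
        return "candidate for selection", exceeded
    return "not selected", exceeded


# Hypothetical benchmark values (same units as the concentration).
benchmarks = {"acute NAWQC": 10.0, "chronic NAWQC": 5.0, "SCV": 2.0, "EC20": 3.0}
print(screen_chemical(4.0, benchmarks))  # ('candidate for selection', ['EC20', 'SCV'])
```

Returning the list of exceeded benchmarks, not just a status, keeps the count and identity of exceedances available for the judgement step the report describes.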
Performance of a Lexical and POS Tagger for Sanskrit
NASA Astrophysics Data System (ADS)
Hellwig, Oliver
Due to the phonetic, morphological, and lexical complexity of Sanskrit, the automatic analysis of this language is a real challenge in the area of natural language processing. The paper describes a series of tests that were performed to assess the accuracy of the tagging program SanskritTagger. To our knowledge, it offers the first reliable benchmark data for evaluating the quality of taggers for Sanskrit using an unrestricted dictionary and texts from different domains. Based on a detailed analysis of the test results, the paper points out possible directions for future improvements of statistical tagging procedures for Sanskrit.
A benchmark study of the sea-level equation in GIA modelling
NASA Astrophysics Data System (ADS)
Martinec, Zdenek; Klemann, Volker; van der Wal, Wouter; Riva, Riccardo; Spada, Giorgio; Simon, Karen; Blank, Bas; Sun, Yu; Melini, Daniele; James, Tom; Bradley, Sarah
2017-04-01
The sea-level load in glacial isostatic adjustment (GIA) is described by the so-called sea-level equation (SLE), which represents the mass redistribution between ice sheets and oceans on a deforming earth. Various levels of complexity of SLE have been proposed in the past, ranging from a simple mean global sea level (the so-called eustatic sea level) to the load with a deforming ocean bottom, migrating coastlines and a changing shape of the geoid. Several approaches to solve the SLE have been derived, from purely analytical formulations to fully numerical methods. Despite various teams independently investigating GIA, there has been no systematic intercomparison amongst the solvers through which the methods may be validated. The goal of this paper is to present a series of benchmark experiments designed for testing and comparing numerical implementations of the SLE. Our approach starts with simple load cases even though the benchmark will not result in GIA predictions for a realistic loading scenario. In the longer term we aim for a benchmark with a realistic loading scenario, and also for benchmark solutions with rotational feedback. The current benchmark uses an earth model for which Love numbers have been computed and benchmarked in Spada et al. (2011). In spite of the significant differences in the numerical methods employed, the test computations performed so far show a satisfactory agreement between the results provided by the participants. The differences found can often be attributed to the different approximations inherent to the various algorithms. Literature: G. Spada, V. R. Barletta, V. Klemann, R. E. M. Riva, Z. Martinec, P. Gasperini, B. Lund, D. Wolf, L. L. A. Vermeersen, and M. A. King, 2011. A benchmark study for glacial isostatic adjustment codes. Geophys. J. Int. 185: 106-132 doi:10.1111/j.1365-
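The simplest level of the SLE mentioned above, the eustatic sea level, spreads meltwater uniformly over a fixed ocean area. The sketch below shows only that end member; the ocean area and meltwater density are assumed round values, and none of the deformation, geoid, coastline-migration or rotational terms that the benchmark addresses are included.

```python
OCEAN_AREA_M2 = 3.618e14    # approximate present-day ocean surface area (assumed)
MELTWATER_DENSITY = 1000.0  # kg/m^3, fresh meltwater (assumed)


def eustatic_sea_level_rise(
    ice_mass_loss_kg: float,
    ocean_area_m2: float = OCEAN_AREA_M2,
    water_density: float = MELTWATER_DENSITY,
) -> float:
    """Uniform (eustatic) sea-level rise in metres from ice mass transferred to the ocean.

    Ignores earth deformation, geoid change, migrating coastlines and rotation,
    i.e. everything the full sea-level equation adds on top of this.
    """
    return ice_mass_loss_kg / (water_density * ocean_area_m2)


# 361.8 Gt of melted ice spread over 3.618e14 m^2 of ocean gives about 1 mm.
print(round(eustatic_sea_level_rise(361.8e12) * 1000, 3))  # 1.0
```

A full SLE solver replaces this single number with a spatially varying field that depends on the solid-earth response and the gravitational attraction of the redistributed mass.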
Benchmarking: A strategic overview of a key management tool
Chris Leclair
1999-01-01
Benchmarking is a continuous, systematic process for evaluating the products, services, and work processes of organizations in an effort to identify best practices for possible adoption in support of the objectives of enhanced activity service delivery and organizational effectiveness.
Sequoia Messaging Rate Benchmark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedley, Andrew
2008-01-22
The purpose of this benchmark is to measure the maximal message rate of a single compute node. The first num_cores ranks are expected to reside on the 'core' compute node for which message rate is being tested. After that, the next num_nbors ranks are neighbors for the first core rank, the next set of num_nbors ranks are neighbors for the second core rank, and so on. For example, testing an 8-core node (num_cores = 8) with 4 neighbors (num_nbors = 4) requires 8 + 8 * 4 = 40 ranks. The first 8 of those 40 ranks are expected to be on the 'core' node being benchmarked, while the rest of the ranks are on separate nodes.
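The rank layout described above can be expressed as a few helper functions. These helpers are hypothetical and are not taken from the benchmark's source code; they only reproduce the layout arithmetic, which can be useful when building a host file or rank mapping for a job launcher.

```python
def total_ranks(num_cores: int, num_nbors: int) -> int:
    """Core ranks plus num_nbors neighbor ranks for each core rank."""
    return num_cores + num_cores * num_nbors


def neighbor_ranks(core_rank: int, num_cores: int, num_nbors: int) -> range:
    """Ranks that act as neighbors of one core rank."""
    if not 0 <= core_rank < num_cores:
        raise ValueError("core_rank must be one of the first num_cores ranks")
    start = num_cores + core_rank * num_nbors
    return range(start, start + num_nbors)


def role_of(rank: int, num_cores: int, num_nbors: int) -> str:
    """Describe where a rank sits in the layout."""
    if not 0 <= rank < total_ranks(num_cores, num_nbors):
        raise ValueError("rank out of range")
    if rank < num_cores:
        return f"core rank {rank} (on the node under test)"
    owner = (rank - num_cores) // num_nbors
    return f"neighbor of core rank {owner} (on another node)"


print(total_ranks(8, 4))              # 40
print(list(neighbor_ranks(1, 8, 4)))  # [12, 13, 14, 15]
```

For the 8-core, 4-neighbor example, core rank 0 talks to ranks 8-11 and core rank 7 to ranks 36-39, which accounts for all 40 ranks.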
Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka
2016-01-01
Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296
Data Race Benchmark Collection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua
2017-03-21
This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.
An open-source framework for stress-testing non-invasive foetal ECG extraction algorithms.
Andreotti, Fernando; Behar, Joachim; Zaunseder, Sebastian; Oster, Julien; Clifford, Gari D
2016-05-01
Over the past decades, many studies have been published on the extraction of non-invasive foetal electrocardiogram (NI-FECG) from abdominal recordings. Most of these contributions claim to obtain excellent results in detecting foetal QRS (FQRS) complexes in terms of location. A small subset of authors have investigated the extraction of morphological features from the NI-FECG. However, due to the shortage of available public databases, the large variety of performance measures employed and the lack of open-source reference algorithms, most contributions cannot be meaningfully assessed. This article attempts to address these issues by presenting a standardised methodology for stress testing NI-FECG algorithms, including absolute data, as well as extraction and evaluation routines. To that end, a large database of realistic artificial signals was created, totaling 145.8 h of multichannel data and over one million FQRS complexes. An important characteristic of this dataset is the inclusion of several non-stationary events (e.g. foetal movements, uterine contractions and heart rate fluctuations) that are critical for evaluating extraction routines. To demonstrate our testing methodology, three classes of NI-FECG extraction algorithms were evaluated: blind source separation (BSS), template subtraction (TS) and adaptive methods (AM). Experiments were conducted to benchmark the performance of eight NI-FECG extraction algorithms on the artificial database focusing on: FQRS detection and morphological analysis (foetal QT and T/QRS ratio). The overall median FQRS detection accuracies (i.e. considering all non-stationary events) for the best performing methods in each group were 99.9% for BSS, 97.9% for AM and 96.0% for TS. Both FQRS detections and morphological parameters were shown to heavily depend on the extraction techniques and signal-to-noise ratio. 
Particularly, it is shown that their evaluation in the source domain, obtained after using a BSS technique, should be avoided. Data, extraction algorithms and evaluation routines were released as part of the fecgsyn toolbox on Physionet under a GNU GPL open-source license. This contribution provides a standard framework for benchmarking and regulatory testing of NI-FECG extraction algorithms.
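FQRS detection performance is commonly scored by matching detections to reference annotations within an acceptance window and then computing an F1 measure. The sketch below assumes a 50 ms window and a simple greedy matching of sorted beat times; both are illustrative assumptions, and the fecgsyn evaluation routines may differ in detail. The function names and beat times are hypothetical.

```python
from typing import Sequence, Tuple


def match_fqrs(
    reference: Sequence[float], detected: Sequence[float], tolerance_s: float = 0.05
) -> Tuple[int, int, int]:
    """Greedily match detected FQRS times (s) to reference annotations.

    Returns (true positives, false positives, false negatives).
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance_s:
            tp += 1   # detection falls inside the acceptance window
            i += 1
            j += 1
        elif diff < 0:
            j += 1    # detection with no reference beat: false positive
        else:
            i += 1    # reference beat with no detection: false negative
    return tp, len(det) - tp, len(ref) - tp


def fqrs_f1(
    reference: Sequence[float], detected: Sequence[float], tolerance_s: float = 0.05
) -> float:
    """F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = match_fqrs(reference, detected, tolerance_s)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


reference = [0.50, 1.00, 1.50, 2.00]
detected = [0.51, 1.02, 1.70, 2.00]
print(match_fqrs(reference, detected))  # (3, 1, 1)
print(fqrs_f1(reference, detected))     # 0.75
```

Greedy matching works well when beats are far apart relative to the window; for very close events an optimal one-to-one assignment would be more robust.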
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
It has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and thereby minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as an MPI-intensive application (the HPL Linpack benchmark).
An evaluation of the accuracy and speed of metagenome analysis tools
Lindgreen, Stinus; Adair, Karen L.; Gardner, Paul P.
2016-01-01
Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html PMID:26778510
High Temperature Dynamic Pressure Measurements Using Silicon Carbide Pressure Sensors
NASA Technical Reports Server (NTRS)
Okojie, Robert S.; Meredith, Roger D.; Chang, Clarence T.; Savrun, Ender
2014-01-01
Un-cooled, MEMS-based silicon carbide (SiC) static pressure sensors were used for the first time to measure pressure perturbations at temperatures as high as 600 °C during laboratory characterization, and were subsequently evaluated in a combustor rig operated under various engine conditions to extract the frequencies associated with thermoacoustic instabilities. One SiC sensor was placed directly in the flow stream of the combustor rig, while a benchmark commercial water-cooled piezoceramic dynamic pressure transducer was co-located axially but kept some distance away from the hot flow stream. In the combustor rig test, the SiC sensor detected thermoacoustic instabilities across a range of engine operating conditions, with amplitudes as low as 0.5 psi at 585 °C, in good agreement with the benchmark piezoceramic sensor. The SiC sensor had a low signal-to-noise ratio at higher temperatures, primarily because it was a static sensor with low sensitivity.
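Extracting instability frequencies from a pressure record typically comes down to locating peaks in its spectrum. The NumPy sketch below assumes a uniformly sampled signal, removes the static (mean) pressure, applies a Hann window to limit leakage, and returns the strongest local spectral maxima. The sample rate and tone frequencies in the example are invented for illustration and are not data from the combustor rig.

```python
import numpy as np


def dominant_frequencies(signal, sample_rate_hz: float, n_peaks: int = 3) -> np.ndarray:
    """Frequencies (Hz) of the strongest spectral peaks, strongest first."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the static (mean) pressure
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))  # Hann window limits leakage
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate_hz)
    # Keep only local maxima so one peak is not counted several times.
    k = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[k] > spectrum[k - 1]) & (spectrum[k] >= spectrum[k + 1])
    peaks = k[is_peak]
    strongest = peaks[np.argsort(spectrum[peaks])[::-1][:n_peaks]]
    return freqs[strongest]


# Hypothetical 1 s record: two tones on top of the sensor's static output.
fs = 10_000.0
t = np.arange(10_000) / fs
pressure = 2.0 * np.sin(2 * np.pi * 420 * t) + 0.6 * np.sin(2 * np.pi * 1150 * t)
print(dominant_frequencies(pressure, fs, n_peaks=2))
```

With real sensor data, a low signal-to-noise ratio raises the noise floor, so in practice one would also require peaks to exceed the floor by some margin before reporting them.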
Tuppurainen, Kari; Viisas, Marja; Laatikainen, Reino; Peräkylä, Mikael
2002-01-01
A novel electronic eigenvalue (EEVA) descriptor of molecular structure for use in the derivation of predictive QSAR/QSPR models is described. Like other spectroscopic QSAR/QSPR descriptors, EEVA is also invariant with respect to the alignment of the structures concerned. Its performance was tested with respect to the CBG (corticosteroid binding globulin) affinity of 31 benchmark steroids. It appeared that the electronic structure of the steroids, i.e., the "spectra" derived from molecular orbital energies, is directly related to the CBG binding affinities. The predictive ability of EEVA is compared to other QSAR approaches, and its performance is discussed in the context of the Hammett equation. The good performance of EEVA is an indication of the essential quantum mechanical nature of QSAR. The EEVA method is a supplement to conventional 3D QSAR methods, which employ fields or surface properties derived from Coulombic and van der Waals interactions.
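The idea of a "spectrum" derived from molecular orbital energies can be sketched as Gaussian broadening of the energies onto a fixed energy grid, giving a fixed-length vector usable as QSAR input. The grid bounds, number of points and Gaussian width below are illustrative assumptions, not the published EEVA parameters. Because only orbital energies enter, never atomic coordinates, the vector cannot depend on molecular alignment.

```python
import numpy as np


def eeva_descriptor(
    orbital_energies_ev,
    grid_min_ev: float = -45.0,
    grid_max_ev: float = 10.0,
    n_points: int = 200,
    sigma_ev: float = 0.5,
) -> np.ndarray:
    """Gaussian-broadened 'spectrum' of orbital energies sampled on a fixed grid.

    Grid bounds, resolution and width are illustrative assumptions.
    """
    grid = np.linspace(grid_min_ev, grid_max_ev, n_points)
    energies = np.asarray(orbital_energies_ev, dtype=float)
    diffs = grid[:, None] - energies[None, :]  # shape (n_points, n_orbitals)
    return np.exp(-(diffs ** 2) / (2.0 * sigma_ev ** 2)).sum(axis=1)


# Two hypothetical lists of orbital energies (eV) given in different orders.
a = eeva_descriptor([-12.1, -10.4, -9.8, -1.2])
b = eeva_descriptor([-9.8, -1.2, -12.1, -10.4])
print(a.shape, np.allclose(a, b))  # (200,) True
```

The resulting vectors, one per molecule, would then feed a regression method such as PLS against measured affinities.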
Using a fuzzy comprehensive evaluation method to determine product usability: A test case
Zhou, Ronggang; Chan, Alan H. S.
2016-01-01
BACKGROUND: In order to take into account the inherent uncertainties during product usability evaluation, Zhou and Chan [1] proposed a comprehensive method of usability evaluation for products by combining the analytic hierarchy process (AHP) and fuzzy evaluation methods for synthesizing performance data and subjective response data. This method was designed to provide an integrated framework combining the inevitable vague judgments from the multiple stages of the product evaluation process. OBJECTIVE AND METHODS: In order to illustrate the effectiveness of the model, this study used a summative usability test case to assess the application and strength of the general fuzzy usability framework. To test the proposed fuzzy usability evaluation framework [1], a standard summative usability test was conducted to benchmark the overall usability of a specific network management software product. Based on the test data, the fuzzy method was applied to incorporate both the usability scores and uncertainties involved in the multiple components of the evaluation. Then, with Monte Carlo simulation procedures, confidence intervals were used to compare the reliabilities of the fuzzy approach and two typical conventional methods combining metrics based on percentages. RESULTS AND CONCLUSIONS: This case study showed that the fuzzy evaluation technique can be applied successfully for combining summative usability testing data to achieve an overall usability quality for the network software evaluated. Larger differences in confidence-interval width between the equally weighted percentage-averaging method and the weighted evaluation methods, including the weighted percentage-averaging method, verified the strength of the fuzzy method. PMID:28035942
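The combination of AHP and fuzzy evaluation can be sketched in two steps: derive criterion weights from a pairwise-comparison matrix via its principal eigenvector, then compose those weights with a membership matrix (criteria by rating grades) and convert the result to a crisp score. The sketch assumes the weighted-average composition operator, which is only one of several operators used in fuzzy comprehensive evaluation; the criteria, matrices and grade scores are hypothetical, and the Monte Carlo confidence-interval comparison is not reproduced.

```python
import numpy as np


def ahp_weights(pairwise) -> np.ndarray:
    """Criterion weights from an AHP pairwise-comparison matrix (principal eigenvector)."""
    a = np.asarray(pairwise, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(a)
    principal = np.argmax(eigenvalues.real)
    w = np.abs(eigenvectors[:, principal].real)
    return w / w.sum()


def fuzzy_comprehensive_evaluation(weights, membership, grade_scores):
    """Weighted-average composition B = W . R, then a crisp score from grade values.

    membership[i][j]: degree to which criterion i is rated at grade j.
    """
    b = np.asarray(weights, dtype=float) @ np.asarray(membership, dtype=float)
    b = b / b.sum()
    return b, float(b @ np.asarray(grade_scores, dtype=float))


# Hypothetical criteria: effectiveness, efficiency, satisfaction.
w = ahp_weights([[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]])
r = [[0.6, 0.3, 0.1],  # effectiveness rated good / fair / poor
     [0.4, 0.4, 0.2],  # efficiency
     [0.2, 0.5, 0.3]]  # satisfaction
grades = [90.0, 70.0, 40.0]
vector, score = fuzzy_comprehensive_evaluation(w, r, grades)
print(w.round(3), vector.round(3), round(score, 1))
```

The example comparison matrix is perfectly consistent, so the weights are exactly 4/7, 2/7 and 1/7; with real judgements one would also check the AHP consistency ratio before using the weights.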
Using a fuzzy comprehensive evaluation method to determine product usability: A test case.
Zhou, Ronggang; Chan, Alan H S
2017-01-01
In order to take into account the inherent uncertainties during product usability evaluation, Zhou and Chan [1] proposed a comprehensive method of usability evaluation for products by combining the analytic hierarchy process (AHP) and fuzzy evaluation methods for synthesizing performance data and subjective response data. This method was designed to provide an integrated framework combining the inevitable vague judgments from the multiple stages of the product evaluation process. In order to illustrate the effectiveness of the model, this study used a summative usability test case to assess the application and strength of the general fuzzy usability framework. To test the proposed fuzzy usability evaluation framework [1], a standard summative usability test was conducted to benchmark the overall usability of a specific network management software product. Based on the test data, the fuzzy method was applied to incorporate both the usability scores and uncertainties involved in the multiple components of the evaluation. Then, with Monte Carlo simulation procedures, confidence intervals were used to compare the reliabilities of the fuzzy approach and two typical conventional methods combining metrics based on percentages. This case study showed that the fuzzy evaluation technique can be applied successfully for combining summative usability testing data to achieve an overall usability quality for the network software evaluated. Larger differences in confidence-interval width between the equally weighted percentage-averaging method and the weighted evaluation methods, including the weighted percentage-averaging method, verified the strength of the fuzzy method.
Educating Next Generation Nuclear Criticality Safety Engineers at the Idaho National Laboratory
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. D. Bess; J. B. Briggs; A. S. Garcia
2011-09-01
One of the challenges in educating our next generation of nuclear safety engineers is the limitation of opportunities to receive significant experience or hands-on training prior to graduation. Such training is generally restricted to on-the-job training before this new engineering workforce can adequately provide assessment of nuclear systems and establish safety guidelines. Participation in the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) can provide students and young professionals the opportunity to gain experience and enhance critical engineering skills. The ICSBEP and IRPhEP publish annual handbooks that contain evaluations of experiments along with summarized experimental data and peer-reviewed benchmark specifications to support the validation of neutronics codes, nuclear cross-section data, and the validation of reactor designs. Participation in the benchmark process not only benefits those who use these Handbooks within the international community, but provides the individual with opportunities for professional development, networking with an international community of experts, and valuable experience to be used in future employment. Traditionally students have participated in benchmarking activities via internships at national laboratories, universities, or companies involved with the ICSBEP and IRPhEP programs. Additional programs have been developed to facilitate the nuclear education of students while participating in the benchmark projects. These programs include coordination with the Center for Space Nuclear Research (CSNR) Next Degree Program, the collaboration with the Department of Energy Idaho Operations Office to train nuclear and criticality safety engineers, and student evaluations as the basis for their Master's thesis in nuclear engineering.
TRUST. I. A 3D externally illuminated slab benchmark for dust radiative transfer
NASA Astrophysics Data System (ADS)
Gordon, K. D.; Baes, M.; Bianchi, S.; Camps, P.; Juvela, M.; Kuiper, R.; Lunttila, T.; Misselt, K. A.; Natale, G.; Robitaille, T.; Steinacker, J.
2017-07-01
Context. The radiative transport of photons through arbitrary three-dimensional (3D) structures of dust is a challenging problem due to the anisotropic scattering of dust grains and strong coupling between different spatial regions. The radiative transfer problem in 3D is solved using Monte Carlo or Ray Tracing techniques as no full analytic solution exists for the true 3D structures. Aims: We provide the first 3D dust radiative transfer benchmark composed of a slab of dust with uniform density externally illuminated by a star. This simple 3D benchmark is explicitly formulated to provide tests of the different components of the radiative transfer problem including dust absorption, scattering, and emission. Methods: The details of the external star, the slab itself, and the dust properties are provided. This benchmark includes models with a range of dust optical depths fully probing cases that are optically thin at all wavelengths to optically thick at most wavelengths. The dust properties adopted are characteristic of the diffuse Milky Way interstellar medium. This benchmark includes solutions for the full dust emission including single photon (stochastic) heating as well as two simplifying approximations: One where all grains are considered in equilibrium with the radiation field and one where the emission is from a single effective grain with size-distribution-averaged properties. A total of six Monte Carlo codes and one Ray Tracing code provide solutions to this benchmark. Results: The solution to this benchmark is given as global spectral energy distributions (SEDs) and images at select diagnostic wavelengths from the ultraviolet through the infrared. Comparison of the results revealed that the global SEDs are consistent on average to a few percent for all but the scattered stellar flux at very high optical depths. The image results are consistent within 10%, again except for the stellar scattered flux at very high optical depths. 
The lack of agreement between different codes of the scattered flux at high optical depths is quantified for the first time. Convergence tests using one of the Monte Carlo codes illustrate the sensitivity of the solutions to various model parameters. Conclusions: We provide the first 3D dust radiative transfer benchmark and validate the accuracy of this benchmark through comparisons between multiple independent codes and detailed convergence tests.
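The Monte Carlo approach used by most of the participating codes can be illustrated in a much simpler setting. The sketch below follows photon packets through a 1D plane-parallel slab under stated assumptions (normal incidence, a grey medium, isotropic scattering with a single albedo); it is not the 3D TRUST geometry, uses no anisotropic phase function, and ignores dust emission. For a pure absorber the transmitted fraction should approach exp(-tau), which gives a built-in check.

```python
import math
import random


def slab_transmission(
    tau_slab: float, albedo: float = 0.0, n_photons: int = 100_000, seed: int = 1
) -> float:
    """Fraction of photon packets leaving the far side of a plane-parallel slab.

    Assumes normal incidence, a grey medium and isotropic scattering.
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_photons):
        depth = 0.0  # optical depth measured from the illuminated face
        mu = 1.0     # direction cosine relative to the slab normal
        while True:
            # Sample the optical path to the next interaction: tau = -ln(U).
            step = -math.log(1.0 - rng.random())
            depth += mu * step
            if depth >= tau_slab:
                transmitted += 1  # escaped through the far face
                break
            if depth < 0.0:
                break  # escaped back through the illuminated face
            if rng.random() >= albedo:
                break  # absorbed
            mu = 2.0 * rng.random() - 1.0  # isotropic scattering: mu uniform in [-1, 1]
    return transmitted / n_photons


print(slab_transmission(1.0), math.exp(-1.0))  # pure absorber: should agree closely
```

Adding scattering (albedo above zero) raises the transmitted fraction because scattered packets can still leave through the far face; at high optical depth few packets escape, so the estimate becomes noisy, a small-scale version of the difficulty the benchmark quantifies for scattered stellar flux.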
A CFD validation roadmap for hypersonic flows
NASA Technical Reports Server (NTRS)
Marvin, Joseph G.
1992-01-01
A roadmap for computational fluid dynamics (CFD) code validation is developed. The elements of the roadmap are consistent with air-breathing vehicle design requirements and related to the important flow path components: forebody, inlet, combustor, and nozzle. Building block and benchmark validation experiments are identified along with their test conditions and measurements. Based on a set of evaluation criteria, recommendations for an initial CFD validation data base are given and gaps identified where future experiments would provide the needed validation data.
A CFD validation roadmap for hypersonic flows
NASA Technical Reports Server (NTRS)
Marvin, Joseph G.
1993-01-01
A roadmap for computational fluid dynamics (CFD) code validation is developed. The elements of the roadmap are consistent with air-breathing vehicle design requirements and related to the important flow path components: forebody, inlet, combustor, and nozzle. Building block and benchmark validation experiments are identified along with their test conditions and measurements. Based on a set of evaluation criteria, recommendations for an initial CFD validation data base are given and gaps identified where future experiments would provide the needed validation data.
NASA Technical Reports Server (NTRS)
Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)
1997-01-01
A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point-to-point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point-to-point communication on an unloaded network, the hub and 1-hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub-based network for 16-processor systems. Finally, the NPB2 benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases where the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub-based network.
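Because bandwidth falls and latency rises with each additional route hop on the torus, it helps to know how many destinations sit at each hop distance in a 4x4 torus. The sketch below counts them, assuming minimal routing that may use the wraparound links in either dimension; it is an illustration, not the Whitney Project's routing code, and the function names are hypothetical.

```python
from collections import Counter


def torus_hops(src: tuple, dst: tuple, k: int) -> int:
    """Minimal hop count between two nodes of a k x k 2D torus (assumed minimal routing)."""
    hops = 0
    for a, b in zip(src, dst):
        d = abs(a - b)
        hops += min(d, k - d)  # going the other way around the ring may be shorter
    return hops


def hop_histogram(k: int) -> Counter:
    """Number of destinations at each hop distance from one node.

    A torus looks the same from every node, so node (0, 0) is representative.
    """
    return Counter(torus_hops((0, 0), (x, y), k) for x in range(k) for y in range(k))


if __name__ == "__main__":
    hist = hop_histogram(4)
    print(sorted(hist.items()))  # [(0, 1), (1, 4), (2, 6), (3, 4), (4, 1)]
    others = 4 * 4 - 1
    mean_hops = sum(h * n for h, n in hist.items()) / others
    print(f"mean hops to the other {others} nodes: {mean_hops:.2f}")
```

Under these assumptions only 4 of the 15 other nodes are a single hop away and the farthest are 4 hops away, which is where the per-hop bandwidth and latency penalties would accumulate.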
Error Rates in Users of Automatic Face Recognition Software
White, David; Dunn, James D.; Schmid, Alexandra C.; Kemp, Richard I.
2015-01-01
In recent years, wide deployment of automatic face recognition systems has been accompanied by substantial gains in algorithm performance. However, benchmarking tests designed to evaluate these systems do not account for the errors of human operators, who are often an integral part of face recognition solutions in forensic and security settings. This causes a mismatch between evaluation tests and operational accuracy. We address this by measuring user performance in a face recognition system used to screen passport applications for identity fraud. Experiment 1 measured target detection accuracy in algorithm-generated ‘candidate lists’ selected from a large database of passport images. Accuracy was notably poorer than in previous studies of unfamiliar face matching: participants made over 50% errors for adult target faces, and over 60% when matching images of children. Experiment 2 then compared performance of student participants to trained passport officers, who use the system in their daily work, and found equivalent performance in these groups. Encouragingly, a group of highly trained and experienced “facial examiners” outperformed these groups by 20 percentage points. We conclude that human performance curtails accuracy of face recognition systems, potentially reducing benchmark estimates by 50% in operational settings. Mere practice does not attenuate these limits, but superior performance of trained examiners suggests that recruitment and selection of human operators, in combination with effective training and mentorship, can improve the operational accuracy of face recognition systems. PMID:26465631
Benchmarking Strategies for Measuring the Quality of Healthcare: Problems and Prospects
Lovaglio, Pietro Giorgio
2012-01-01
Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. In particular, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed. PMID:22666140
Benchmarking strategies for measuring the quality of healthcare: problems and prospects.
Lovaglio, Pietro Giorgio
2012-01-01
Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. In particular, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed.
Automated benchmarking of peptide-MHC class I binding predictions.
Trolle, Thomas; Metushi, Imir G; Greenbaum, Jason A; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten
2015-07-01
Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. Contact: mniel@cbs.dtu.dk or bpeters@liai.org. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Pasquali, Sara K; Wallace, Amelia S; Gaynor, J William; Jacobs, Marshall L; O'Brien, Sean M; Hill, Kevin D; Gaies, Michael G; Romano, Jennifer C; Shahian, David M; Mayer, John E; Jacobs, Jeffrey P
2016-11-01
Performance assessment in congenital heart surgery is challenging due to the wide heterogeneity of disease. We describe current case mix across centers, evaluate methodology inclusive of all cardiac operations versus the more homogeneous subset of Society of Thoracic Surgeons benchmark operations, and describe implications regarding performance assessment. Centers (n = 119) participating in the Society of Thoracic Surgeons Congenital Heart Surgery Database (2010 through 2014) were included. Index operation type and frequency across centers were described. Center performance (risk-adjusted operative mortality) was evaluated and classified when including the benchmark versus all eligible operations. Overall, 207 types of operations were performed during the study period (112,140 total cases). Few operations were performed across all centers; only 25% were performed at least once by 75% or more of centers. There was 7.9-fold variation across centers in the proportion of total cases comprising high-complexity cases (STAT 5). In contrast, the benchmark operations made up 36% of cases, and all but 2 were performed by at least 90% of centers. When evaluating performance based on benchmark versus all operations, 15% of centers changed performance classification; 85% remained unchanged. Benchmark versus all operation methodology was associated with lower power, with 35% versus 78% of centers meeting sample size thresholds. There is wide variation in congenital heart surgery case mix across centers. Metrics based on benchmark versus all operations are associated with strengths (less heterogeneity) and weaknesses (lower power), and lead to differing performance classification for some centers. These findings have implications for ongoing efforts to optimize performance assessment, including choice of target population and appropriate interpretation of reported metrics. Copyright © 2016 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
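Risk-adjusted operative mortality can be computed in several ways, and the STS methodology itself is not described here. As a generic illustration only, the sketch below uses indirect standardization: a center's observed deaths are divided by the deaths expected from patient-level predicted risks, and the resulting O/E ratio is rescaled by an overall mortality rate. The predicted risks, the function name, and the numbers are hypothetical.

```python
def risk_adjusted_mortality(observed, predicted, overall_rate):
    """Indirectly standardized mortality for one center (generic sketch).

    observed:     list of 0/1 outcomes (1 = operative death) for the center's cases
    predicted:    model-predicted death probabilities for the same cases
    overall_rate: mortality rate across all centers, used to rescale O/E
    Returns (O/E ratio, risk-adjusted mortality rate).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need matching, non-empty outcome and risk lists")
    o = sum(observed)   # observed deaths
    e = sum(predicted)  # expected deaths given the center's case mix
    oe = o / e
    return oe, oe * overall_rate


if __name__ == "__main__":
    # Hypothetical center with four cases of differing predicted risk.
    oe, adjusted = risk_adjusted_mortality([0, 1, 0, 1], [0.1, 0.2, 0.3, 0.4], 0.05)
    print(f"O/E = {oe:.2f}, risk-adjusted mortality = {adjusted:.1%}")
```

With only a handful of cases such a ratio is very noisy, which mirrors the power limitation the abstract reports for metrics restricted to the benchmark operations.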
Toward multimodal signal detection of adverse drug reactions.
Harpaz, Rave; DuMouchel, William; Schuemie, Martijn; Bodenreider, Olivier; Friedman, Carol; Horvitz, Eric; Ripple, Anna; Sorbello, Alfred; White, Ryen W; Winnenburg, Rainer; Shah, Nigam H
2017-12-01
Improving mechanisms to detect adverse drug reactions (ADRs) is key to strengthening post-marketing drug safety surveillance. Signal detection is presently unimodal, relying on a single information source. Multimodal signal detection is based on jointly analyzing multiple information sources. Building on and expanding the work done in prior studies, the aim of the article is to further research on multimodal signal detection, explore its potential benefits, and propose methods for its construction and evaluation. Four data sources are investigated: FDA's adverse event reporting system, insurance claims, the MEDLINE citation database, and the logs of major Web search engines. Published methods are used to generate and combine signals from each data source. Two distinct reference benchmarks corresponding to well-established and recently labeled ADRs, respectively, are used to evaluate the performance of multimodal signal detection in terms of area under the ROC curve (AUC) and lead-time-to-detection, with the latter relative to labeling revision dates. Limited to our reference benchmarks, multimodal signal detection provides AUC improvements ranging from 0.04 to 0.09 based on a widely used evaluation benchmark, and a comparative added lead-time of 7-22 months relative to labeling revision dates from a time-indexed benchmark. The results support the notion that utilizing and jointly analyzing multiple data sources may lead to improved signal detection. Given certain data and benchmark limitations, the early stage of development, and the complexity of ADRs, it is currently not possible to make definitive statements about the ultimate utility of the concept. Continued development of multimodal signal detection requires a deeper understanding of the data sources used, additional benchmarks, and further research on methods to generate and synthesize signals. Copyright © 2017 Elsevier Inc. All rights reserved.
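The AUC used to score signal detection against a reference benchmark equals the probability that a randomly chosen positive control (a true ADR) receives a higher signal score than a randomly chosen negative control. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, counting ties as one half. The scores are hypothetical, and this O(n·m) version is meant for clarity rather than for large reference sets; it is not the evaluation code used in the study.

```python
def auc(positive_scores, negative_scores):
    """ROC AUC via the Mann-Whitney pairwise definition (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative control")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical signal scores for known ADRs (positives) and non-ADRs (negatives).
    score = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
    print(f"AUC = {score:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

An "AUC improvement" of the kind reported above is simply the difference between this quantity computed for the combined signal and for a single-source signal on the same reference set.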
Automated benchmarking of peptide-MHC class I binding predictions
Trolle, Thomas; Metushi, Imir G.; Greenbaum, Jason A.; Kim, Yohan; Sidney, John; Lund, Ole; Sette, Alessandro; Peters, Bjoern; Nielsen, Morten
2015-01-01
Motivation: Numerous in silico methods predicting peptide binding to major histocompatibility complex (MHC) class I molecules have been developed over the last decades. However, the multitude of available prediction tools makes it non-trivial for the end-user to select which tool to use for a given task. To provide a solid basis on which to compare different prediction tools, we here describe a framework for the automated benchmarking of peptide-MHC class I binding prediction tools. The framework runs weekly benchmarks on data that are newly entered into the Immune Epitope Database (IEDB), giving the public access to frequent, up-to-date performance evaluations of all participating tools. To overcome potential selection bias in the data included in the IEDB, a strategy was implemented that suggests a set of peptides for which different prediction methods give divergent predictions as to their binding capability. Upon experimental binding validation, these peptides entered the benchmark study. Results: The benchmark has run for 15 weeks and includes evaluation of 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC binding measurements. Inspection of the results allows the end-user to make educated selections between participating tools. Of the four participating servers, NetMHCpan performed the best, followed by ANN, SMM and finally ARB. Availability and implementation: Up-to-date performance evaluations of each server can be found online at http://tools.iedb.org/auto_bench/mhci/weekly. All prediction tool developers are invited to participate in the benchmark. Sign-up instructions are available at http://tools.iedb.org/auto_bench/mhci/join. Contact: mniel@cbs.dtu.dk or bpeters@liai.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25717196
Tiao, J; Moore, L; Porgo, T V; Belcaid, A
2016-06-01
To assess whether the definition of an isolated hip fracture (IHF) used as an exclusion criterion influences the results of trauma center benchmarking. We conducted a multicenter retrospective cohort study with data from an integrated Canadian trauma system. The study population included all patients admitted between 1999 and 2010 to any of the 57 adult trauma centers. Seven definitions of IHF based on diagnostic codes, age, mechanism of injury, and secondary injuries, identified in a systematic review, were used. Trauma centers were benchmarked using risk-adjusted mortality estimates generated using the Trauma Risk Adjustment Model. The agreement between benchmarking results generated under different IHF definitions was evaluated with correlation coefficients on adjusted mortality estimates. Correlation coefficients >0.95 were considered to convey acceptable agreement. The study population consisted of 172,872 patients before exclusion of IHF and between 128,094 and 139,588 patients after exclusion. Correlation coefficients between risk-adjusted mortality estimates generated in populations including and excluding IHF varied between 0.86 and 0.90. Correlation coefficients of estimates generated under different definitions of IHF varied between 0.97 and 0.99, even when analyses were restricted to patients aged ≥65 years. Although the exclusion of patients with IHF has an influence on the results of trauma center benchmarking based on mortality, the definition of IHF in terms of diagnostic codes, age, mechanism of injury and secondary injury has no significant impact on benchmarking results. Results suggest that there is no need to obtain formal consensus on the definition of IHF for benchmarking activities.
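The agreement criterion above (a correlation coefficient above 0.95 between two sets of risk-adjusted mortality estimates) can be sketched as follows. The abstract does not say which correlation coefficient was used; this sketch assumes Pearson's r, and the center estimates and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def acceptable_agreement(x, y, threshold=0.95):
    """True if two benchmarking results agree under the r > threshold rule."""
    return pearson_r(x, y) > threshold


if __name__ == "__main__":
    # Hypothetical risk-adjusted mortality (%) for five centers under two IHF definitions.
    definition_a = [4.1, 5.0, 3.2, 6.8, 5.5]
    definition_b = [4.3, 5.1, 3.0, 6.5, 5.9]
    print(round(pearson_r(definition_a, definition_b), 3),
          acceptable_agreement(definition_a, definition_b))
```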
FFTF Passive Safety Test Data for Benchmarks for New LMR Designs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wootan, David W.; Casella, Andrew M.
Liquid Metal Reactors (LMRs) continue to be considered as an attractive concept for advanced reactor design. Software packages such as SASSYS are being used to improve new LMR designs and operating characteristics. Significant cost and safety improvements can be realized in advanced liquid metal reactor designs by emphasizing inherent or passive safety through crediting the beneficial reactivity feedbacks associated with core and structural movement. This passive safety approach was adopted for the Fast Flux Test Facility (FFTF), and an experimental program was conducted to characterize the structural reactivity feedback. The FFTF passive safety testing program was developed to examine how specific design elements influenced dynamic reactivity feedback in response to a reactivity input and to demonstrate the scalability of reactivity feedback results to reactors of current interest. The U.S. Department of Energy, Office of Nuclear Energy Advanced Reactor Technology program is in the process of preserving, protecting, securing, and placing in electronic format information and data from the FFTF, including the core configurations and data collected during the passive safety tests. Benchmarks based on empirical data gathered during operation of the Fast Flux Test Facility (FFTF) as well as design documents and post-irradiation examination will aid in the validation of these software packages and the models and calculations they produce. Evaluation of these actual test data could provide insight to improve analytical methods which may be used to support future licensing applications for LMRs.
NASA Technical Reports Server (NTRS)
McGalliard, James
2008-01-01
This viewgraph presentation details the science and systems environments that the NASA High-End Computing program serves. It includes a discussion of the Global Climate Modeling workload. The Goddard Earth Observing System Model, Version 5 (GEOS-5) is a system of models integrated using the Earth System Modeling Framework (ESMF). The GEOS-5 system was used for the benchmark tests, and the results of the tests are shown and discussed. Tests were also run for the Cubed Sphere system, and results for these tests are also shown.
Benchmarking the Collocation Stand-Alone Library and Toolkit (CSALT)
NASA Technical Reports Server (NTRS)
Hughes, Steven; Knittel, Jeremy; Shoan, Wendy; Kim, Youngkwang; Conway, Claire; Conway, Darrel J.
2017-01-01
This paper describes the processes and results of Verification and Validation (V&V) efforts for the Collocation Stand Alone Library and Toolkit (CSALT). We describe the test program and environments, the tools used for independent test data, and comparison results. The V&V effort employs classical problems with known analytic solutions, solutions from other available software tools, and comparisons to benchmarking data available in the public literature. Presenting all test results is beyond the scope of a single paper. Here we present high-level test results for a broad range of problems, and detailed comparisons for selected problems.
Benchmarking the Collocation Stand-Alone Library and Toolkit (CSALT)
NASA Technical Reports Server (NTRS)
Hughes, Steven; Knittel, Jeremy; Shoan, Wendy (Compiler); Kim, Youngkwang; Conway, Claire (Compiler); Conway, Darrel
2017-01-01
This paper describes the processes and results of Verification and Validation (V&V) efforts for the Collocation Stand Alone Library and Toolkit (CSALT). We describe the test program and environments, the tools used for independent test data, and comparison results. The V&V effort employs classical problems with known analytic solutions, solutions from other available software tools, and comparisons to benchmarking data available in the public literature. Presenting all test results is beyond the scope of a single paper. Here we present high-level test results for a broad range of problems, and detailed comparisons for selected problems.
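The V&V idea of checking against classical problems with known analytic solutions can be illustrated without CSALT itself. The sketch below is hypothetical and does not use CSALT's API: it applies trapezoidal collocation, which for the linear test equation x' = -x reduces to the implicit trapezoidal rule, and compares the result at t = 1 with the exact value exp(-1). Halving the step should cut the error by roughly a factor of four, confirming second-order accuracy.

```python
import math


def trapezoid_decay(n_steps: int, t_end: float = 1.0, x0: float = 1.0) -> float:
    """Solve x' = -x on [0, t_end] with trapezoidal collocation.

    Each defect constraint x[k+1] - x[k] - h/2 * (f[k] + f[k+1]) = 0 with f = -x
    is linear, so it can be solved in closed form for x[k+1].
    """
    h = t_end / n_steps
    growth = (1.0 - h / 2.0) / (1.0 + h / 2.0)
    x = x0
    for _ in range(n_steps):
        x *= growth
    return x


def observed_order(n_coarse: int, t_end: float = 1.0) -> float:
    """Estimate the convergence order from errors at n and 2n steps."""
    exact = math.exp(-t_end)
    e1 = abs(trapezoid_decay(n_coarse, t_end) - exact)
    e2 = abs(trapezoid_decay(2 * n_coarse, t_end) - exact)
    return math.log2(e1 / e2)


if __name__ == "__main__":
    for n in (10, 20, 40):
        err = abs(trapezoid_decay(n) - math.exp(-1.0))
        print(f"n={n:3d}  error={err:.2e}")
    print(f"observed order ~ {observed_order(10):.2f}")
```

Checking the observed order, not just the error magnitude, catches implementation mistakes that still happen to give small errors on a single grid.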
Frias, Patricio A; Oster, Matthew; Daley, Patricia A; Boris, Jeffrey R
2016-03-01
We sought to benchmark the utilisation of echocardiography in the outpatient evaluation of heart murmurs by evaluating two large paediatric cardiology centres. Although criteria exist for appropriate use of echocardiography, there are no benchmarking data demonstrating its utilisation. We performed a retrospective cohort study of outpatients aged between 0 and 18 years at the Sibley Heart Center Cardiology and the Children's Hospital of Philadelphia Division of Cardiology, given a sole diagnosis of "innocent murmur" from 1 July, 2007 to 31 October, 2010. Using internal claims data, we compared the utilisation of echocardiography according to centre, patient age, and physician years of service. Of 23,114 eligible patients (Sibley Heart Center Cardiology: 12,815, Children's Hospital of Philadelphia Division of Cardiology: 10,299), 43.1% (Sibley Heart Center Cardiology: 45.2%, Children's Hospital of Philadelphia Division of Cardiology: 40.4%; p1-5 years had the lowest utilisation (32.7%). In two large paediatric cardiology practices, the overall utilisation of echocardiography by physicians with a sole diagnosis of innocent murmur was similar. There was significant and similar variability in utilisation by provider at both centres. Although these data serve as initial benchmarking, the variability in utilisation highlights the importance of appropriate use criteria.
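The overall utilisation figure can be checked against the per-centre figures: a pooled proportion is the patient-weighted average of the centre proportions. The short computation below uses the patient counts and percentages quoted in the abstract; because those percentages are rounded, the result is only a consistency check, and the helper name is illustrative.

```python
def pooled_rate(groups):
    """Patient-weighted average of per-group proportions.

    groups: iterable of (number_of_patients, proportion) pairs.
    """
    groups = list(groups)
    total = sum(n for n, _ in groups)
    return sum(n * p for n, p in groups) / total


if __name__ == "__main__":
    centres = [(12815, 0.452), (10299, 0.404)]  # counts and rates from the abstract
    print(sum(n for n, _ in centres))           # 23114 eligible patients
    print(f"{pooled_rate(centres):.1%}")        # 43.1%, matching the reported overall rate
```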
Dervaux, Benoît; Baseilhac, Eric; Fagon, Jean-Yves; Biot, Claire; Blachier, Corinne; Braun, Eric; Debroucker, Frédérique; Detournay, Bruno; Ferretti, Carine; Granger, Muriel; Jouan-Flahault, Chrystel; Lussier, Marie-Dominique; Meyer, Arlette; Muller, Sophie; Pigeon, Martine; De Sahb, Rima; Sannié, Thomas; Sapède, Claudine; Vray, Muriel
2014-01-01
Decree No. 2012-1116 of 2 October 2012 on medico-economic assignments of the French National Authority for Health (Haute autorité de santé, HAS) significantly alters the conditions for accessing the health products market in France. This paper presents a theoretical framework for interpreting the results of the economic evaluation of health technologies and summarises the facts available in France for developing benchmarks that will be used to interpret incremental cost-effectiveness ratios. This literature review shows that it is difficult to determine a threshold value but it is also difficult to interpret the incremental cost-effectiveness ratio (ICER) results without a threshold value. In this context, round table participants favour a pragmatic approach based on "benchmarks" as opposed to a threshold value, based on an interpretative and normative perspective, i.e. benchmarks that can change over time based on feedback. © 2014 Société Française de Pharmacologie et de Thérapeutique.
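For reference, the incremental cost-effectiveness ratio is the difference in cost between a new health technology and its comparator divided by the difference in effect (for example, QALYs gained). The sketch below computes it and flags the dominance cases in which no ratio needs interpreting at all. It deliberately encodes no benchmark values, since the abstract argues these should remain interpretative rather than a fixed threshold; the function name and numbers are illustrative.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Return (label, ratio) comparing a new technology with its comparator.

    label is "dominant" (no more costly and more effective), "dominated"
    (no less costly and less effective), or "trade-off", in which case ratio
    is the ICER = (cost_new - cost_ref) / (effect_new - effect_ref).
    """
    d_cost = cost_new - cost_ref
    d_effect = effect_new - effect_ref
    if d_effect > 0 and d_cost <= 0:
        return "dominant", None
    if d_effect < 0 and d_cost >= 0:
        return "dominated", None
    if d_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return "trade-off", d_cost / d_effect


if __name__ == "__main__":
    # Hypothetical: new technology costs 5,000 more and yields 0.5 QALY more.
    print(icer(15000, 2.5, 10000, 2.0))  # ('trade-off', 10000.0)
```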
User-centered virtual environment assessment and design for cognitive rehabilitation applications
NASA Astrophysics Data System (ADS)
Fidopiastis, Cali Michael
Virtual environment (VE) design for cognitive rehabilitation necessitates a new methodology to ensure the validity of the resulting rehabilitation assessment. We propose that benchmarking the VE system technology utilizing a user-centered approach should precede the VE construction. Further, user performance baselines should be measured throughout testing as a control for adaptive effects that may confound the metrics chosen to evaluate the rehabilitation treatment. To support these claims we present data obtained from two modules of a user-centered head-mounted display (HMD) assessment battery, specifically resolution visual acuity and stereoacuity. Resolution visual acuity and stereoacuity assessments provide information about the image quality achieved by an HMD based upon its unique system parameters. When applying a user-centered approach, we were able to quantify limitations in the VE system components (e.g., low microdisplay resolution) and separately point to user characteristics (e.g., changes in dark focus) that may introduce error in the evaluation of VE based rehabilitation protocols. Based on these results, we provide guidelines for calibrating and benchmarking HMDs. In addition, we discuss potential extensions of the assessment to address higher level usability issues. We intend to test the proposed framework within the Human Experience Modeler (HEM), a testbed created at the University of Central Florida to evaluate technologies that may enhance cognitive rehabilitation effectiveness. Preliminary results of a feasibility pilot study conducted with a memory impaired participant showed that the HEM provides the control and repeatability needed to conduct such technology comparisons. Further, the HEM affords the opportunity to integrate new brain imaging technologies (i.e., functional Near Infrared Imaging) to evaluate brain plasticity associated with VE based cognitive rehabilitation.
NASA Astrophysics Data System (ADS)
Cowdery, E.; Dietze, M.
2016-12-01
As atmospheric carbon dioxide levels continue to increase, it is critical that terrestrial ecosystem models can accurately predict ecological responses to the changing environment. Current predictions of net primary productivity (NPP) in response to elevated atmospheric CO2 concentration are highly variable and contain a considerable amount of uncertainty. The Predictive Ecosystem Analyzer (PEcAn) is an informatics toolbox that wraps around an ecosystem model and can be used to help identify which factors drive uncertainty. We tested a suite of models (LPJ-GUESS, MAESPA, GDAY, CLM5, DALEC, ED2), which represent a range from low to high structural complexity, across a range of Free-Air CO2 Enrichment (FACE) experiments: the Kennedy Space Center Open Top Chamber Experiment, the Rhinelander FACE experiment, the Duke Forest FACE experiment and the Oak Ridge Experiment on CO2 Enrichment. These tests were implemented in a novel benchmarking workflow that is automated, repeatable, and generalized to incorporate different sites and ecological models. Observational data from the FACE experiments represent a first test of this flexible, extensible approach aimed at providing repeatable tests of model process representation. To identify and evaluate the assumptions causing inter-model differences we used PEcAn to perform model sensitivity and uncertainty analysis, not only to assess the components of NPP, but also to examine system processes such as nutrient uptake and water use. Combining the observed patterns of uncertainty between multiple models with results of the recent FACE-model data synthesis project (FACE-MDS) can help identify which processes need further study and additional data constraints. These findings can be used to inform future experimental design and in turn can provide an informative starting point for data assimilation.
Elsworth, Gerald R; Osborne, Richard H
2017-01-01
Objective: Participant self-report data play an essential role in the evaluation of health education activities, programmes and policies. When questionnaire items do not have a clear mapping to a performance-based continuum, percentile norms are useful for communicating individual test results to users. Similarly, when assessing programme impact, the comparison of effect sizes for group differences or baseline to follow-up change with effect sizes observed in relevant normative data provides more directly useful information compared with statistical tests of mean differences and the evaluation of effect sizes for substantive significance using universal rules of thumb such as those for Cohen’s d. This article aims to assist managers, programme staff and clinicians of healthcare organisations who use the Health Education Impact Questionnaire to interpret their results using percentile norms for individual baseline and follow-up scores together with group effect sizes for change across the duration of a typical chronic disease self-management and support programme. Methods: Percentile norms for individual Health Education Impact Questionnaire scale scores and effect sizes for group change were calculated using freely available software for each of the eight Health Education Impact Questionnaire scales. Data used were archived responses of 2157 participants of chronic disease self-management programmes conducted by a wide range of organisations in Australia between July 2007 and March 2013. Results: Tables of percentile norms and three possible effect size benchmarks for baseline to follow-up change are provided together with two worked examples to assist interpretation.
Conclusion: While the norms and benchmarks presented will be particularly relevant for Australian organisations and others using the English-language version of the Health Education Impact Questionnaire, they will also be useful for translated versions as a guide to the sensitivity of the scales and the extent of the changes that might be anticipated from attendance at a typical chronic disease self-management or health education programme. PMID:28560039
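The two quantities this abstract relies on can be sketched in a few lines: a percentile rank that places an individual scale score within a normative sample, and an effect size for group change from baseline to follow-up. Definitions vary, and the article's three effect-size benchmarks are not reproduced here; this sketch assumes the mid-rank percentile convention and the mean change divided by the baseline standard deviation. Function names and data are hypothetical.

```python
import statistics


def percentile_rank(score, norm_scores):
    """Percentage of the normative sample below score, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def change_effect_size(baseline, follow_up):
    """Mean baseline-to-follow-up change divided by the baseline SD (one common convention)."""
    mean_change = statistics.mean(follow_up) - statistics.mean(baseline)
    return mean_change / statistics.stdev(baseline)


if __name__ == "__main__":
    norms = [2.1, 2.5, 2.8, 3.0, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0]  # hypothetical scale scores
    print(percentile_rank(3.0, norms))  # 3 below + half of 2 ties -> 40.0
    print(round(change_effect_size([2.6, 3.0, 3.1, 2.8], [3.0, 3.3, 3.2, 3.1]), 2))
```

Dividing by the baseline SD keeps the effect size on the same scale as the normative spread; other conventions (for example the SD of change scores) give different values, which is why the benchmark used must be stated alongside any reported effect size.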
NASA Astrophysics Data System (ADS)
Bohn, Meyer; Hopkins, David; Steele, Dean; Tuscherer, Sheldon
2017-04-01
The benchmark Barnes soil series is an extensive upland Hapludoll of the northern Great Plains that is both economically and ecologically vital to the region. Effects of tillage erosion coupled with wind and water erosion have degraded Barnes soil quality, but with unknown extent, distribution, or severity. Evidence of soil degradation documented for a half century warrants that the assumption of productivity be tested. Soil resilience is linked to several dynamic soil properties, and National Cooperative Soil Survey initiatives are now focused on identifying those properties for benchmark soils. Quantification of soil degradation is dependent on a reliable method for broad-scale evaluation. The soil survey community is currently developing rapid and widespread soil property assessment technologies. Improvements in satellite-based remote sensing and image analysis software have stimulated the application of broad-scale resource assessment. Furthermore, these technologies have fostered refinement of land-based surface energy balance algorithms, e.g. the Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC) algorithm for evapotranspiration (ET) mapping. The hypothesis of this study is that ET mapping technology can differentiate soil function on extensive landscapes and identify degraded areas. A recent soil change study in eastern North Dakota resampled legacy Barnes pedons sampled prior to 1960 and found significant decreases in organic carbon. An ancillary study showed that ET estimates from METRIC decreased with Barnes erosion class severity. An ET raster map has been developed for three eastern North Dakota counties using METRIC and Landsat 5 imagery. ET pixel candidates on major Barnes soil map units were stratified into tertiles and classified as ranked ET subdivisions. A sampling population of randomly selected points stratified by ET class and county proportion was established. 
Morphologic and chemical data will be recorded at each sampling site to test whether soil properties correlate with ET, and thus whether ET can serve as a non-biased proxy for soil health.
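The stratification and sampling steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's procedure: ET values are a plain list of pixel values, tertile cut points come from `statistics.quantiles` with its default method, strata (for example ET class combined with county) are supplied as a dictionary, and the helper names `tertile_classes` and `proportional_sample` are hypothetical.

```python
import random
import statistics

def tertile_classes(et_values):
    """Label each ET value 'low', 'mid' or 'high' using the sample tertiles."""
    t1, t2 = statistics.quantiles(et_values, n=3)  # two cut points
    labels = []
    for v in et_values:
        if v <= t1:
            labels.append("low")
        elif v <= t2:
            labels.append("mid")
        else:
            labels.append("high")
    return labels

def proportional_sample(strata, total, seed=0):
    """Randomly draw about `total` points, allocated in proportion to stratum size.

    strata maps a stratum key (e.g. (et_class, county)) to a list of candidate points.
    Rounding means the realised sample size can differ slightly from `total`.
    """
    rng = random.Random(seed)
    n_all = sum(len(pts) for pts in strata.values())
    sample = {}
    for key, pts in strata.items():
        k = round(total * len(pts) / n_all)
        sample[key] = rng.sample(pts, min(k, len(pts)))
    return sample
```

Proportional allocation keeps each stratum's share of the sample equal to its share of the candidate pixels; the study may have used a different allocation rule.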
Dejkovski, Nick
2016-10-01
This paper reports the audit findings of the waste management practices at 30 construction materials testing (CMT) laboratories (constituting 4.6% of total accredited CMT laboratories at the time of the audit) that operate in four Australian jurisdictions and assesses the organisation's Environmental Management System (EMS) for indicators of progress towards sustainable development (SD). In Australia, waste indicators are 'priority indicators' of environmental performance yet the quality and availability of waste data is poor. National construction and demolition waste (CDW) data estimates are not fully disaggregated and the contribution of CMT waste (classified as CDW) to the national total CDW landfill burden is difficult to quantify. The environmental and human impacts of anthropogenic release of hazardous substances contained in CMT waste into the ecosphere can be measured by construing waste indicators from the EMS. An analytical framework for evaluating the EMS is developed to elucidate CMT waste indicators and assess these indicators against the principle of proportionality. Assessing against this principle allows for: objective evaluations of whether the environmental measures prescribed in the EMS are 'proportionate' to the 'desired' (subjective) level of protection chosen by decision-makers; and benchmarking CMT waste indicators against aspirational CDW targets set by each Australian jurisdiction included in the audit. Construed together, the EMS derived waste indicators and benchmark data provide a composite indicator of environmental performance and progress towards SD. 
The key audit findings indicate: CMT laboratories have a 'poor' environmental performance (and overall progress towards SD) when EMS waste data are converted into indicator scores and assessed against the principle of proportionality; CMT waste recycling targets are lower when benchmarked against jurisdictional CDW waste recovery targets; and no significant difference in the average quantity of waste diversion away from landfill was observed for laboratories with ISO14001 EMS certification compared to non-ISO14001 certified laboratories. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E
2015-09-29
In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that have broad overall performance.
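A common way to score an inferred network against a reference network is precision and recall over the top-ranked predicted edges. The sketch below is a generic illustration, not the package's own evaluation code; the function name `edge_precision_recall` and the (regulator, target, score) input format are assumptions.

```python
def edge_precision_recall(predicted, true_edges, k):
    """Precision and recall of the top-k ranked predicted edges.

    predicted: list of (regulator, target, score) tuples from an inference method.
    true_edges: set of (regulator, target) pairs in the reference network.
    """
    ranked = sorted(predicted, key=lambda e: e[2], reverse=True)[:k]
    hits = sum(1 for reg, tgt, _ in ranked if (reg, tgt) in true_edges)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall
```

Sweeping `k` over all ranks yields a precision-recall curve, from which summary scores such as the area under the curve are usually derived.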
Chemotherapy Extravasation: Establishing a National Benchmark for Incidence Among Cancer Centers.
Jackson-Rose, Jeannette; Del Monte, Judith; Groman, Adrienne; Dial, Linda S; Atwell, Leah; Graham, Judy; O'Neil Semler, Rosemary; O'Sullivan, Maryellen; Truini-Pittman, Lisa; Cunningham, Terri A; Roman-Fischetti, Lisa; Costantinou, Eileen; Rimkus, Chris; Banavage, Adrienne J; Dietz, Barbara; Colussi, Carol J; Catania, Kimberly; Wasko, Michelle; Schreffler, Kevin A; West, Colleen; Siefert, Mary Lou; Rice, Robert David
2017-08-01
Given the high-risk nature and nurse sensitivity of chemotherapy infusion and extravasation prevention, as well as the absence of an industry benchmark, a group of nurses studied oncology-specific nursing-sensitive indicators. The purpose was to establish a benchmark for the incidence of chemotherapy extravasation with vesicants, irritants, and irritants with vesicant potential. Infusions with actual or suspected extravasations of vesicant and irritant chemotherapies were evaluated. Extravasation events were reviewed by type of agent, occurrence by drug category, route of administration, level of harm, follow-up, and patient referrals to surgical consultation. A total of 739,812 infusions were evaluated, with 673 extravasation events identified. Incidence for all extravasation events was 0.09%.
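The reported incidence follows directly from the two counts in the abstract, assuming incidence is defined as events per infusion (a definition the abstract implies but does not state explicitly):

```python
def incidence_percent(events, infusions):
    """Incidence of events per infusion, expressed as a percentage."""
    return 100.0 * events / infusions

rate = incidence_percent(673, 739_812)
print(f"{rate:.2f}%")  # -> 0.09%
```

The unrounded value is about 0.091%, i.e. roughly 9 extravasations per 10,000 infusions.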
Land Ice Verification and Validation Kit
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-07-15
To address a pressing need to better understand the behavior and complex interaction of ice sheets within the global Earth system, significant development of continental-scale, dynamical ice-sheet models is underway. The associated verification and validation process of these models is being coordinated through a new, robust, Python-based extensible software package, the Land Ice Verification and Validation toolkit (LIVV). This release provides robust and automated verification and a performance evaluation on LCF platforms. The performance V&V involves a comprehensive comparison of model performance relative to expected behavior on a given computing platform. LIVV operates on a set of benchmark and test data, and provides comparisons for a suite of community-prioritized tests, including configuration and parameter variations, bit-for-bit evaluation, and plots of tests where differences occur.
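A bit-for-bit evaluation asks whether a test run reproduces a reference run exactly, down to the stored bits. The sketch below illustrates the idea on plain Python float sequences; it is not LIVV's API, and the function name `bit_for_bit_diffs` is hypothetical.

```python
import struct

def bit_for_bit_diffs(reference, test):
    """Return the indices where two float sequences differ at the bit level."""
    if len(reference) != len(test):
        raise ValueError("arrays differ in length")
    diffs = []
    for i, (a, b) in enumerate(zip(reference, test)):
        # Compare the IEEE-754 double encodings rather than using ==, so that
        # 0.0 and -0.0 count as different and identical NaNs count as equal.
        if struct.pack("<d", a) != struct.pack("<d", b):
            diffs.append(i)
    return diffs
```

The returned indices mark where a difference plot would be needed; an empty list means the runs are bit-for-bit identical.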
Long, Brandon R.; Rinaldo, Steven G.; Gallagher, Kevin G.; ...
2016-11-09
Coin-cells are often the test format of choice for laboratories engaged in battery research and development, as they provide a convenient platform for rapid testing of new materials on a small scale. However, obtaining reliable, reproducible data in the coin-cell format is inherently difficult, particularly in the full-cell configuration. In addition, statistical evaluation to prove the consistency and reliability of such data is often neglected. Herein we report on several studies aimed at formalizing physical process parameters and coin-cell construction related to full cells. Statistical analysis and performance benchmarking approaches are advocated as a means to more confidently track changes in cell performance. Finally, we show that trends in the electrochemical data obtained from coin-cells can be reliable and informative when standardized approaches are implemented in a consistent manner.
Fuzzy Sarsa with Focussed Replacing Eligibility Traces for Robust and Accurate Control
NASA Astrophysics Data System (ADS)
Kamdem, Sylvain; Ohki, Hidehiro; Sueda, Naomichi
Several methods of reinforcement learning in continuous state and action spaces that utilize fuzzy logic have been proposed in recent years. This paper introduces Fuzzy Sarsa(λ), an on-policy algorithm for fuzzy learning that relies on a novel way of computing replacing eligibility traces to accelerate the policy evaluation. It is tested against several temporal difference learning algorithms: Sarsa(λ), Fuzzy Q(λ), an earlier fuzzy version of Sarsa and an actor-critic algorithm. We perform detailed evaluations on two benchmark problems: a maze domain and the cart pole. Results of various tests highlight the strengths and weaknesses of these algorithms and show that Fuzzy Sarsa(λ) outperforms all other algorithms tested for a larger granularity of design and under noisy conditions. It is a highly competitive method of learning in realistic noisy domains where a denser fuzzy design over the state space is needed for more precise control.
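For readers unfamiliar with replacing eligibility traces, the sketch below shows a single update of ordinary tabular Sarsa(λ) with replacing traces (the classic non-fuzzy variant, in which a visited pair's trace is reset to 1 rather than incremented). It is not the paper's focussed fuzzy version, whose details are not given here. The function name, the dictionary layout of `Q` and `E`, and the default step sizes are illustrative assumptions; terminal-state handling is omitted.

```python
from collections import defaultdict

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.99, lam=0.9, actions=None):
    """One tabular Sarsa(lambda) update with replacing eligibility traces.

    Q and E are defaultdict(float) keyed by (state, action). If `actions` is
    given, traces of the other actions in state s are cleared, a common
    variant of replacing traces.
    """
    # On-policy TD error uses the action actually chosen in s_next.
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    if actions is not None:
        for b in actions:
            E[(s, b)] = 0.0
    E[(s, a)] = 1.0  # replacing trace: reset to 1 instead of adding 1
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam  # decay every trace after the update
    return delta
```

Because traces are capped at 1, a state-action pair revisited many times in quick succession does not receive an inflated share of the credit, which is the usual motivation for replacing rather than accumulating traces.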
Open Rotor - Analysis of Diagnostic Data
NASA Technical Reports Server (NTRS)
Envia, Edmane
2011-01-01
NASA is researching open rotor propulsion as part of its technology research and development plan for addressing the subsonic transport aircraft noise, emission and fuel burn goals. The low-speed wind tunnel test for investigating the aerodynamic and acoustic performance of a benchmark blade set at the approach and takeoff conditions has recently concluded. A high-speed wind tunnel diagnostic test campaign has begun to investigate the performance of this benchmark open rotor blade set at the cruise condition. Databases from both speed regimes will comprise a comprehensive collection of benchmark open rotor data for use in assessing/validating aerodynamic and noise prediction tools (component & system level) as well as providing insights into the physics of open rotors to help guide the development of quieter open rotors.
JASMIN: Japanese-American study of muon interactions and neutron detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakashima, Hiroshi; /JAEA, Ibaraki; Mokhov, N.V.
Experimental studies of shielding and radiation effects at Fermi National Accelerator Laboratory (FNAL) have been carried out under collaboration between FNAL and Japan, aiming at benchmarking of simulation codes and study of irradiation effects for upgrade and design of new high-energy accelerator facilities. The purposes of this collaboration are (1) acquisition of shielding data in a proton beam energy domain above 100 GeV; (2) further evaluation of predictive accuracy of the PHITS and MARS codes; (3) modification of physics models and data in these codes if needed; (4) establishment of irradiation field for radiation effect tests; and (5) development of a code module for improved description of radiation effects. A series of experiments has been performed at the Pbar target station and NuMI facility, using irradiation of targets with 120 GeV protons for antiproton and neutrino production, as well as the M-test beam line (M-test) for measuring nuclear data and detector responses. Various nuclear and shielding data have been measured by activation methods with chemical separation techniques as well as by other detectors such as a Bonner ball counter. Analyses of the experimental data are in progress for benchmarking the PHITS and MARS15 codes. In this presentation recent activities and results are reviewed.
Issues in Benchmarking Human Reliability Analysis Methods: A Literature Review
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ronald L. Boring; Stacey M. L. Hendrickson; John A. Forester
There is a diversity of human reliability analysis (HRA) methods available for use in assessing human performance within probabilistic risk assessments (PRA). Due to the significant differences in the methods, including the scope, approach, and underlying models, there is a need for an empirical comparison investigating the validity and reliability of the methods. To accomplish this empirical comparison, a benchmarking study comparing and evaluating HRA methods in assessing operator performance in simulator experiments is currently underway. In order to account for as many effects as possible in the construction of this benchmarking study, a literature review was conducted, reviewing past benchmarking studies in the areas of psychology and risk assessment. A number of lessons learned through these studies are presented in order to aid in the design of future HRA benchmarking endeavors.
Attacks, applications, and evaluation of known watermarking algorithms with Checkmark
NASA Astrophysics Data System (ADS)
Meerwald, Peter; Pereira, Shelby
2002-04-01
The Checkmark benchmarking tool was introduced to provide a framework for application-oriented evaluation of watermarking schemes. In this article we introduce new attacks and applications into the existing Checkmark framework. In addition to describing new attacks and applications, we also compare the performance of some well-known watermarking algorithms (proposed by Bruyndonckx, Cox, Fridrich, Dugad, Kim, Wang, Xia, Xie, Zhu and Pereira) with respect to the Checkmark benchmark. In particular, we consider the non-geometric application, which contains tests that do not change the geometry of the image. This attack constraint is artificial, but nonetheless important for research purposes, since a number of algorithms may be interesting but would score poorly with respect to specific applications simply because geometric compensation has not been incorporated. We note, however, that with the help of image registration, even research algorithms that do not have counter-measures against geometric distortion (such as a template or reference watermark) can be evaluated. In the first version of the Checkmark benchmarking program, application-oriented evaluation was introduced, along with many new attacks not already considered in the literature. A second goal of this paper is to introduce new attacks and new applications into the Checkmark framework. In particular, we introduce the following new applications: video frame watermarking, medical imaging and watermarking of logos. Video frame watermarking includes low compression attacks and distortions which warp the edges of the video, as well as general projective transformations which may result from someone filming the screen at a cinema. With respect to medical imaging, only small distortions are considered, and furthermore it is essential that no distortions are present at embedding. Finally, for logos, we consider images of small sizes and particularly compression, scaling, aspect ratio and other small distortions. 
The challenge of watermarking logos is essentially that of watermarking a small and typically simple image. With respect to new attacks, we consider: subsampling followed by interpolation, dithering and thresholding which both yield a binary image.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marck, Steven C. van der, E-mail: vandermarck@nrg.eu
Recent releases of three major world nuclear reaction data libraries, ENDF/B-VII.1, JENDL-4.0, and JEFF-3.1.1, have been tested extensively using benchmark calculations. The calculations were performed with the latest release of the continuous energy Monte Carlo neutronics code MCNP, i.e. MCNP6. Three types of benchmarks were used, viz. criticality safety benchmarks, (fusion) shielding benchmarks, and reference systems for which the effective delayed neutron fraction is reported. For criticality safety, more than 2000 benchmarks from the International Handbook of Criticality Safety Benchmark Experiments were used. Benchmarks from all categories were used, ranging from low-enriched uranium, compound fuel, thermal spectrum ones (LEU-COMP-THERM), to mixed uranium-plutonium, metallic fuel, fast spectrum ones (MIX-MET-FAST). For fusion shielding many benchmarks were based on IAEA specifications for the Oktavian experiments (for Al, Co, Cr, Cu, LiF, Mn, Mo, Si, Ti, W, Zr), Fusion Neutronics Source in Japan (for Be, C, N, O, Fe, Pb), and Pulsed Sphere experiments at Lawrence Livermore National Laboratory (for ⁶Li, ⁷Li, Be, C, N, O, Mg, Al, Ti, Fe, Pb, D2O, H2O, concrete, polyethylene and teflon). The new functionality in MCNP6 to calculate the effective delayed neutron fraction was tested by comparison with more than thirty measurements in widely varying systems. Among these were measurements in the Tank Critical Assembly (TCA in Japan) and IPEN/MB-01 (Brazil), both with a thermal spectrum, two cores in Masurca (France) and three cores in the Fast Critical Assembly (FCA, Japan), all with fast spectra. The performance of the three libraries, in combination with MCNP6, is shown to be good. The results for the LEU-COMP-THERM category are on average very close to the benchmark value. Also for most other categories the results are satisfactory. 
Deviations from the benchmark values do occur in certain benchmark series, or in isolated cases within benchmark series. Such instances can often be related to nuclear data for specific non-fissile elements, such as C, Fe, or Gd. Indications are that the intermediate and mixed spectrum cases are less well described. The results for the shielding benchmarks are generally good, with very similar results for the three libraries in the majority of cases. Nevertheless there are, in certain cases, strong deviations between calculated and benchmark values, such as for Co and Mg. Also, the results show discrepancies at certain energies or angles for e.g. C, N, O, Mo, and W. The functionality of MCNP6 to calculate the effective delayed neutron fraction yields very good results for all three libraries.
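Comparisons of calculated (C) and benchmark (E) multiplication factors like those summarised above are often reported as differences in pcm (1 pcm = 10⁻⁵) together with a count of cases outside the benchmark uncertainty. The sketch below illustrates that bookkeeping under assumptions: the function name is hypothetical, the pcm difference is taken as 10⁵·(C − E)/E, and the 3σ outlier criterion is an illustrative choice, not necessarily the one used in this work.

```python
def c_minus_e_summary(calculated, benchmark, benchmark_sigma, n_sigma=3.0):
    """Summarise calculated vs benchmark k_eff values.

    Returns the mean (C - E)/E in pcm and the number of cases whose deviation
    exceeds n_sigma times the benchmark standard uncertainty.
    """
    diffs_pcm = [1e5 * (c - e) / e for c, e in zip(calculated, benchmark)]
    mean_pcm = sum(diffs_pcm) / len(diffs_pcm)
    outliers = sum(1 for c, e, s in zip(calculated, benchmark, benchmark_sigma)
                   if abs(c - e) > n_sigma * s)
    return mean_pcm, outliers
```

Grouping such summaries by benchmark category (e.g. LEU-COMP-THERM versus MIX-MET-FAST) is one way to expose the category-level trends the abstract describes.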
Test Cases for Flutter of the Benchmark Models Rectangular Wings on the Pitch and Plunge Apparatus
NASA Technical Reports Server (NTRS)
Bennett, Robert M.
2000-01-01
The supercritical airfoil was chosen as a relatively modern airfoil for comparison. The B0012 model was tested first. Three different types of flutter instability boundaries were encountered: a classical flutter boundary, a transonic stall flutter boundary at angle of attack, and a plunge instability near M = 0.9 and at zero angle of attack. This test was made in air and was Transonic Dynamics Tunnel (TDT) Test 468. The BSCW model (for Benchmark SuperCritical Wing) was tested next as TDT Test 470. It was tested using both air and a heavy gas, R-12, as the test medium. The effect of a transition strip on flutter was evaluated in air. The B64A010 model was subsequently tested as TDT Test 493. Some further analysis of the experimental data for the B0012 wing is presented. Transonic calculations using the parameters for the B0012 wing in a two-dimensional typical section flutter analysis are given. These data are supplemented with data from the Benchmark Active Controls Technology (BACT) model given in the next chapter of this document. The BACT model was of the same planform and airfoil as the B0012 model, but with spoilers and a trailing edge control. It was tested in the heavy gas R-12, and was instrumented mostly at the 60 per cent span. The flutter data obtained on PAPA and the static aerodynamic test cases from BACT serve as additional data for the B0012 model. All three types of flutter are included in the BACT test cases. In this report several test cases are selected to illustrate trends for a variety of different conditions, with emphasis on transonic flutter. Cases are selected for classical and stall flutter for the BSCW model, for classical flutter and plunge instability for the B64A010 model, and for classical flutter for the B0012 model. Test cases are also presented for BSCW at static angles of attack. 
Only the mean pressures and the real and imaginary parts of the first harmonic of the pressures are included in the data for the test cases, but digitized time histories have been archived. The data for the test cases are available as separate electronic files. An overview of the model and tests is given, the standard formulary for these data is listed, and some sample results are presented.
A Field-Based Aquatic Life Benchmark for Conductivity in ...
This report adapts the standard U.S. EPA methodology for deriving ambient water quality criteria. Rather than use toxicity test results, the adaptation uses field data to determine the loss of 5% of genera from streams. The method is applied to derive effect benchmarks for dissolved salts as measured by conductivity in Central Appalachian streams using data from West Virginia and Kentucky. This report provides scientific evidence for a conductivity benchmark in a specific region rather than for the entire United States.
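One way to turn field data into a "loss of 5% of genera" benchmark is to summarise each genus by a high-end conductivity at which it is still observed (an XC95-style value), then take the 5th centile of those genus values. The sketch below assumes that reading of the method and that the per-genus values are already computed; the report's exact estimator and percentile rule may differ, and the function names are hypothetical.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a list of numbers."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def hc05(genus_xc95):
    """Conductivity below which only ~5% of genera reach their XC95 value.

    genus_xc95 maps genus name -> XC95 conductivity (e.g. in uS/cm).
    """
    return percentile(list(genus_xc95.values()), 5)
```

Exceeding this value would, under the stated assumptions, be expected to extirpate about 5% of the genera in the region's streams.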
Girard, Raphaële; Aupee, Martine; Erb, Martine; Bettinger, Anne; Jouve, Alice
2012-12-01
The 3 ml volume currently used as the hand hygiene (HH) measure has been explored as the pertinent dose for an indirect indicator of HH compliance. A multicenter study was conducted in order to ascertain the required dose using different products. The average contact duration before drying was measured and compared with references. Effective hand coverage had to include the whole hand and the wrist. Two durations were chosen as points of reference: 30 s, as given by guidelines, and the duration validated by the European standard EN 1500. Each product was to be tested, using standardized procedures, by three nosocomial infection prevention teams, for three different doses (3, 2 and 1.5 ml). Data from 27 products and 1706 tests were analyzed. Depending on the product, the dose needed to ensure a 30-s contact duration in 75% of tests ranged from 2 ml to more than 3 ml, and the dose needed to ensure a contact duration exceeding the EN 1500 times in 75% of tests ranged from 1.5 ml to more than 3 ml. Our interpretation is as follows: if different products are used, the volume utilized does not give an unbiased estimate of HH compliance. Other compliance evaluation methods remain necessary for efficient benchmarking. Copyright © 2012 Ministry of Health, Saudi Arabia. Published by Elsevier Ltd. All rights reserved.
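The dose criterion above ("the dose needed to ensure a 30-s contact duration in 75% of tests") can be expressed as a small search over the tested doses. This is a sketch under assumptions: results are grouped per product as a mapping from dose to measured contact durations, and the function name `minimal_dose` is hypothetical.

```python
def minimal_dose(durations_by_dose, threshold_s=30.0, share=0.75):
    """Smallest tested dose whose contact time reaches `threshold_s` in >= `share` of tests.

    durations_by_dose maps dose (ml) -> list of measured contact durations (s)
    for one product. Returns None when no tested dose qualifies, which
    corresponds to "more than" the largest dose tested.
    """
    for dose in sorted(durations_by_dose):
        durations = durations_by_dose[dose]
        ok = sum(1 for d in durations if d >= threshold_s)
        if durations and ok / len(durations) >= share:
            return dose
    return None
```

Setting `threshold_s` to a product's EN 1500 duration instead of 30 s gives the second criterion in the abstract.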
Improved image alignment method in application to X-ray images and biological images.
Wang, Ching-Wei; Chen, Hsiang-Chou
2013-08-01
Alignment of medical images is a vital component of a large number of applications throughout the clinical track of events, not only within clinical diagnostic settings, but prominently so in the planning, execution and evaluation of surgical and radiotherapeutic procedures. However, registration of medical images is challenging because of variations in data appearance, imaging artifacts and complex data deformation problems. Hence, the aim of this study is to develop a robust image alignment method for medical images. An improved image registration method is proposed and evaluated with two types of medical data, biological microscopic tissue images and dental X-ray images, and compared with five state-of-the-art image registration techniques. The experimental results show that the presented method consistently performs well on both types of medical images, achieving 88.44% and 88.93% averaged registration accuracies for biological tissue images and X-ray images, respectively, and outperforms the benchmark methods. Based on Tukey's honestly significant difference test and Fisher's least significant difference test, the presented method performs significantly better than all existing methods (P ≤ 0.001) for tissue image alignment, and for X-ray image registration, the proposed method performs significantly better than the two benchmark B-spline approaches (P < 0.001). The software implementation of the presented method and the data used in this study are made publicly available for scientific communities to use (http://www-o.ntust.edu.tw/∼cweiwang/ImprovedImageRegistration/).
XWeB: The XML Warehouse Benchmark
NASA Astrophysics Data System (ADS)
Mahboubi, Hadj; Darmont, Jérôme
With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and its associated XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-19
..., estimates biological benchmarks, projects future population conditions, and recommends research and... the Assessment webinars are as follows: 1. Participants will employ assessment models to evaluate stock status, estimate population benchmarks and management criteria, and project future conditions. The...
Ó Conchúir, Shane; Barlow, Kyle A; Pache, Roland A; Ollikainen, Noah; Kundert, Kale; O'Meara, Matthew J; Smith, Colin A; Kortemme, Tanja
2015-01-01
The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.
Edwards, Roger A; Dee, Deborah; Umer, Amna; Perrine, Cria G; Shealy, Katherine R; Grummer-Strawn, Laurence M
2014-02-01
A substantial proportion of US maternity care facilities engage in practices that are not evidence-based and that interfere with breastfeeding. The CDC Survey of Maternity Practices in Infant Nutrition and Care (mPINC) showed significant variation in maternity practices among US states. The purpose of this article is to use benchmarking techniques to identify states within relevant peer groups that were top performers on mPINC survey indicators related to breastfeeding support. We used 11 indicators of breastfeeding-related maternity care from the 2011 mPINC survey and benchmarking techniques to organize and compare hospital-based maternity practices across the 50 states and Washington, DC. We created peer categories for benchmarking first by region (grouping states by West, Midwest, South, and Northeast) and then by size (grouping states by the number of maternity facilities and dividing each region into approximately equal halves based on the number of facilities). Thirty-four states had scores high enough to serve as benchmarks, and 32 states had scores low enough to reflect the lowest score gap from the benchmark on at least 1 indicator. No state served as the benchmark on more than 5 indicators and no state was furthest from the benchmark on more than 7 indicators. The small peer group benchmarks in the South, West, and Midwest were better than the large peer group benchmarks on 91%, 82%, and 36% of the indicators, respectively. In the West large, the Midwest large, the Midwest small, and the South large peer groups, 4-6 benchmarks showed that less than 50% of hospitals have ideal practice in all states. The evaluation presents benchmarks for peer group state comparisons that provide potential and feasible targets for improvement.
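The peer-group comparison described above can be sketched as follows, assuming that higher indicator scores are better, that each peer group's benchmark is its best-scoring state, and that a state's gap is its distance below that benchmark. These are illustrative assumptions rather than the article's stated formulas, and the function name is hypothetical.

```python
def peer_benchmarks(scores, peer_group):
    """Per peer group, the best (highest) score and each member's gap to it.

    scores maps state -> indicator score; peer_group maps state -> group label
    (e.g. "West-small").
    """
    groups = {}
    for state, score in scores.items():
        groups.setdefault(peer_group[state], {})[state] = score
    result = {}
    for group, members in groups.items():
        best = max(members.values())
        result[group] = {
            "benchmark": best,
            "gaps": {s: best - v for s, v in members.items()},
        }
    return result
```

Running this once per indicator gives, for each state, how often it sets its group's benchmark (gap of zero) and where its largest gap lies.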
Barty, Rebecca L; Gagliardi, Kathleen; Owens, Wendy; Lauzon, Deborah; Scheuermann, Sheena; Liu, Yang; Wang, Grace; Pai, Menaka; Heddle, Nancy M
2015-07-01
Benchmarking is a quality improvement tool that compares an organization's performance to that of its peers for selected indicators, to improve practice. Processes to develop evidence-based benchmarks for red blood cell (RBC) outdating in Ontario hospitals, based on RBC hospital disposition data from Canadian Blood Services, have been previously reported. These benchmarks were implemented in 160 hospitals provincewide with a multifaceted approach, which included hospital education, inventory management tools and resources, summaries of best practice recommendations, recognition of high-performing sites, and audit tools on the Transfusion Ontario website (http://transfusionontario.org). In this study we describe the implementation process and the impact of the benchmarking program on RBC outdating. A conceptual framework for continuous quality improvement of a benchmarking program was also developed. The RBC outdating rate for all hospitals trended downward continuously from April 2006 to February 2012, irrespective of hospitals' transfusion rates or their distance from the blood supplier. The highest annual outdating rate was 2.82%, at the beginning of the observation period. Each year brought further reductions, with a nadir outdating rate of 1.02% achieved in 2011. The key elements of the successful benchmarking strategy included dynamic targets, a comprehensive and evidence-based implementation strategy, ongoing information sharing, and a robust data system to track information. The Ontario benchmarking program for RBC outdating resulted in continuous and sustained quality improvement. Our conceptual iterative framework for benchmarking provides a guide for institutions implementing a benchmarking program. © 2015 AABB.
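Tracking an outdating benchmark over time amounts to computing a rate per period and checking its direction. The sketch below assumes the rate is outdated RBC units divided by units received (the abstract does not state the denominator) and uses a simple year-over-year decline check; a formal trend test would be more appropriate for real data. Function names are hypothetical.

```python
def outdating_rates(disposition):
    """Annual RBC outdating rate (%) from {year: (units_outdated, units_received)}.

    Using units received as the denominator is an assumption for illustration.
    """
    return {year: 100.0 * outdated / received
            for year, (outdated, received) in sorted(disposition.items())}

def declines_every_year(rates):
    """True if the rate strictly decreases from each year to the next."""
    values = [rates[y] for y in sorted(rates)]
    return all(later < earlier for earlier, later in zip(values, values[1:]))
```

Comparing each hospital's rate with a dynamic target, rather than a fixed one, is the kind of refinement the benchmarking program above describes.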
The mass storage testing laboratory at GSFC
NASA Technical Reports Server (NTRS)
Venkataraman, Ravi; Williams, Joel; Michaud, David; Gu, Heng; Kalluri, Atri; Hariharan, P. C.; Kobler, Ben; Behnke, Jeanne; Peavey, Bernard
1998-01-01
Industry-wide benchmarks exist for measuring the performance of processors (SPECmarks) and of database systems (Transaction Processing Performance Council). Despite storage having become the dominant item in computing and IT (Information Technology) budgets, no such common benchmark is available in the mass storage field. Vendors and consultants provide services and tools for capacity planning and sizing, but these do not account for the complete set of metrics needed in today's archives. The availability of automated tape libraries, high-capacity RAID systems, and high-bandwidth interconnectivity between processor and peripherals has led to demands for services which traditional file systems cannot provide. File Storage and Management Systems (FSMS), which began to be marketed in the late 1980s, have helped to some extent with large tape libraries, but their use has introduced additional parameters affecting performance. The aim of the Mass Storage Test Laboratory (MSTL) at Goddard Space Flight Center is to develop a test suite that includes not only a comprehensive check list to document a mass storage environment but also benchmark code. Benchmark code is being tested which will provide measurements for both baseline systems, i.e. applications interacting with peripherals through the operating system services, and for combinations involving an FSMS. The benchmarks are written in C and are easily portable. They are initially being aimed at the UNIX Open Systems world. Measurements are being made using a Sun Ultra 170 Sparc with 256MB memory running Solaris 2.5.1 with the following configuration: 4mm tape stacker on SCSI 2 Fast/Wide; 4GB disk device on SCSI 2 Fast/Wide; and Sony Petaserve on Fast/Wide differential SCSI 2.
Benchmarking the Multidimensional Stellar Implicit Code MUSIC
NASA Astrophysics Data System (ADS)
Goffrey, T.; Pratt, J.; Viallet, M.; Baraffe, I.; Popov, M. V.; Walder, R.; Folini, D.; Geroux, C.; Constantino, T.
2017-04-01
We present the results of a numerical benchmark study for the MUltidimensional Stellar Implicit Code (MUSIC) based on widely applicable two- and three-dimensional compressible hydrodynamics problems relevant to stellar interiors. MUSIC is an implicit large eddy simulation code that uses implicit time integration, implemented as a Jacobian-free Newton-Krylov method. A physics-based preconditioning technique, which can be adjusted to target varying physics, is used to improve the performance of the solver. The problems used for this benchmark study include the Rayleigh-Taylor and Kelvin-Helmholtz instabilities and the decay of the Taylor-Green vortex. Additionally, we show a test of hydrostatic equilibrium in a stellar environment dominated by radiative effects; in this setting, the flexibility of the preconditioning technique is demonstrated. This work aims to bridge the gap between the hydrodynamic test problems typically used during the development of numerical methods and the complex flows of stellar interiors. A series of multidimensional tests was performed, and each test case was analysed with a simple scalar diagnostic, with the aim of enabling direct code comparisons. As the tests performed do not have analytic solutions, we verify MUSIC by comparing it with established codes, including ATHENA and the PENCIL code. MUSIC is able both to reproduce the behaviour of established and widely used codes and to match results expected from theoretical predictions. This benchmarking study concludes a series of papers describing the development of the MUSIC code and provides confidence in its future applications.
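The "Jacobian-free" part of a Jacobian-free Newton-Krylov method rests on one observation: a Krylov solver such as GMRES only needs Jacobian-vector products J(u)v, and these can be approximated by a finite difference of the residual, J(u)v ≈ (F(u + εv) − F(u))/ε, without ever forming J. The Python sketch below checks that approximation on a small hypothetical two-equation residual. It is not MUSIC's residual or preconditioner, and the step size ε = 1e-7 is an illustrative choice.

```python
from typing import Callable, List

Vector = List[float]

def residual(u: Vector) -> Vector:
    # Hypothetical nonlinear residual F(u) for a system with two unknowns.
    return [u[0] ** 2 - u[1], u[0] + u[1] ** 2]

def jv_finite_difference(F: Callable[[Vector], Vector], u: Vector, v: Vector,
                         eps: float = 1e-7) -> Vector:
    """Approximate J(u) @ v by (F(u + eps*v) - F(u)) / eps; J is never built."""
    f0 = F(u)
    f1 = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(f1, f0)]

def jv_exact(u: Vector, v: Vector) -> Vector:
    # Analytic Jacobian of residual() applied to v, used only for comparison.
    return [2 * u[0] * v[0] - v[1], v[0] + 2 * u[1] * v[1]]

u, v = [1.5, -0.5], [0.3, 0.7]
approx = jv_finite_difference(residual, u, v)
exact = jv_exact(u, v)  # [0.2, -0.4] up to rounding
```

In a full solver, `jv_finite_difference` would be called once per Krylov iteration inside each Newton step, which is why the memory cost stays close to that of a few residual evaluations.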
Development and efficacy assessments of tea seed oil makeup remover.
Parnsamut, N; Kanlayavattanakul, M; Lourith, N
2017-05-01
The efficacy of tea seed oil in cleaning off foundation and eyeliner was evaluated, and a safe and efficient tea seed oil makeup remover was developed. The in vitro cleansing efficacy of the makeup remover was validated UV-spectrophotometrically, and stability was evaluated with an accelerated stability test. The in vitro and in vivo cleansing efficacy of the removers was compared with that of a benchmark remover containing mainly olive oil. Tea seed oil removed 90.64±4.56% of foundation and 87.62±8.35% of eyeliner. Tea seed oil was incorporated into the stable base with the most appropriate texture. Three tea seed oil removers (50, 55 and 60%) were stable. The 60% tea seed oil remover removed foundation significantly better than the others (94.48±3.37%; P<0.001) and the benchmark (92.32±1.33%), but its removal of eyeliner was not significantly better (87.50±5.15%; P=0.059). The tea seed oil remover caused no skin irritation when examined in 20 human volunteers. In a single-blind, randomized controlled test, the tea seed oil remover received a higher preference score than the benchmark (75.42±8.10 vs. 70.00±7.78%; P=0.974). Safe and efficient tea seed oil makeup removers were thus developed, widening consumers' choice of makeup removers containing bio-oils. The UV-spectrophotometric method for assessing in vitro cleansing efficacy during makeup remover development is feasible and is recommended for the pharmaceutical industry. Copyright © 2016 Académie Nationale de Pharmacie. Published by Elsevier Masson SAS. All rights reserved.
Nguyen, Hoang C; Langland, Amie L; Amara, John P; Dullen, Michael; Kahn, David S; Costanzo, Joseph A
2018-04-30
Biologic manufacturing processes typically employ clarification technologies such as depth filtration to remove insoluble and soluble impurities. Conventional depth filtration media used in these processes contain naturally derived components such as diatomaceous earth and cellulose. These components may introduce performance variability and contribute extractable/leachable components, such as beta-glucans, that could interfere with limulus amebocyte lysate endotoxin assays. A novel, all-synthetic depth filtration medium (Millistak+® HC Pro X0SP) has recently been developed that may improve process consistency, efficiency, and drug substance product quality by reducing soluble process impurities. This new medium is evaluated against commercially available benchmark filters containing naturally derived components (Millistak+® HC X0HC and B1HC). Using model proteins, the synthetic medium demonstrates increased binding capacity for positively charged proteins (72-126 mg g⁻¹ media) compared with conventional media (0.3-8.6 mg g⁻¹ media), and similar values for negatively charged species (1.3-5.6 mg g⁻¹ media). Several CHO-derived monoclonal antibodies (mAbs) or mAb-like molecules are also evaluated. X0SP filtration performance is similar to that of the benchmarks, with improved host cell protein (HCP) reduction (at least 50% in 55% of the cases tested). X0SP filtrates contained more silicon extractables than the benchmarks, but these were readily removed downstream. Finally, the X0SP devices demonstrate suitable lot-to-lot robustness when specific media components are intentionally altered to manufacturing specification limits. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
MHEC Survey Establishes Midwest Property Insurance Benchmarks.
ERIC Educational Resources Information Center
Midwestern Higher Education Commission Risk Management Institute Research Bulletin, 1994
1994-01-01
This publication presents the results of a survey of over 200 midwestern colleges and universities on their property insurance programs and establishes benchmarks to help these institutions evaluate their insurance programs. Findings included the following: (1) 51 percent of respondents currently purchase their property insurance as part of a…
School-Based Cognitive-Behavioral Therapy for Adolescent Depression: A Benchmarking Study
ERIC Educational Resources Information Center
Shirk, Stephen R.; Kaplinski, Heather; Gudmundsen, Gretchen
2009-01-01
The current study evaluated cognitive-behavioral therapy (CBT) for adolescent depression delivered in health clinics and counseling centers in four high schools. Outcomes were benchmarked to results from prior efficacy trials. Fifty adolescents diagnosed with depressive disorders were treated by eight doctoral-level psychologists who followed a…
A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.
Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin
2015-12-01
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), which take a video or a still image as the query or target, respectively. To the best of our knowledge, few datasets and evaluation protocols have been designed to benchmark all three scenarios. To facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, to benchmark the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it can serve as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that our COX Face DB is a good benchmark database for evaluation.
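To make the V2S and S2V settings concrete, the Python sketch below treats a video as a set of frame feature vectors and matches it against still images with a simple nearest-neighbour point-to-set distance (the minimum over frames). This is a generic set-based baseline chosen for illustration; it is not the PSCL method proposed in the paper, and the feature vectors and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Feature = Sequence[float]

def euclidean(a: Feature, b: Feature) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def point_to_set_distance(point: Feature, frames: List[Feature]) -> float:
    # Nearest-neighbour distance from one still-image feature to a video's frame set.
    return min(euclidean(point, f) for f in frames)

def identify_video_to_still(video: List[Feature], stills: Dict[str, Feature]) -> str:
    # V2S: the query is a video; each gallery entry is a single still image.
    return min(stills, key=lambda name: point_to_set_distance(stills[name], video))

def identify_still_to_video(still: Feature, videos: Dict[str, List[Feature]]) -> str:
    # S2V: the query is a still image; each gallery entry is a video.
    return min(videos, key=lambda name: point_to_set_distance(still, videos[name]))
```

For example, a query video whose frames lie near `[5, 5]` is matched to a gallery still stored at `[5, 5]` rather than one at `[0, 0]`. Learned methods such as PSCL replace the raw Euclidean distance with a correlation learned across the still and video domains.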
Mamo, Dereje; Hazel, Elizabeth; Lemma, Israel; Guenther, Tanya; Bekele, Abeba; Demeke, Berhanu
2014-10-01
Program managers require feasible, timely, reliable, and valid measures of integrated community case management (iCCM) implementation to identify problems and assess progress. The global iCCM Task Force developed benchmark indicators to guide implementers in developing or improving monitoring and evaluation (M&E) systems. This study aimed to assess Ethiopia's iCCM M&E system by determining the availability and feasibility of the iCCM benchmark indicators. We conducted a desk review of iCCM policy documents, monitoring tools, survey reports, and other relevant documents, as well as key informant interviews with government and implementing partners involved in iCCM scale-up and M&E. Currently, Ethiopia collects data to inform most (70% [33/47]) iCCM benchmark indicators, and modest extra effort could raise this to 83% (39/47). Eight (17%) are not available in the current system. Most benchmark indicators that track coordination and policy, human resources, service delivery and referral, supervision, and quality assurance are available through routine monitoring systems or periodic surveys. Indicators for supply chain management are less available because of limited consumption data and a weak link with treatment data. Little information is available on iCCM costs. Benchmark indicators can detail the status of iCCM implementation; however, some indicators may not fit country priorities, and others may be difficult to collect. The government of Ethiopia and its partners should review and prioritize the benchmark indicators to determine which should be included in the routine M&E system, especially since iCCM data are being reviewed for addition to the HMIS. Moreover, the reporting burden on Health Extension Workers can be minimized by an integrated reporting approach.
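The indicator counts above can be checked with a few lines of Python. Note that the 39 indicators reachable "with modest extra effort" include the 33 already available, so that group and the 8 unavailable indicators together account for all 47. The variable names are illustrative.

```python
TOTAL_INDICATORS = 47  # iCCM benchmark indicators assessed

def share(n: int, total: int = TOTAL_INDICATORS) -> int:
    """Percentage of indicators, rounded to the nearest whole percent."""
    return round(100 * n / total)

available_now = 33          # data already collected
available_with_effort = 39  # cumulative: includes the 33 above
not_available = 8

for label, n in [("available now", available_now),
                 ("available with modest extra effort", available_with_effort),
                 ("not available", not_available)]:
    print(f"{label}: {n}/{TOTAL_INDICATORS} = {share(n)}%")

# The cumulative "with effort" group and the unavailable group cover every indicator.
assert available_with_effort + not_available == TOTAL_INDICATORS
```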
Buell, G.R.; Grams, S.C.
1985-01-01
Significant temporal trends in monthly pH, specific conductance, total alkalinity, hardness, total nitrite-plus-nitrate nitrogen, and total phosphorus measurements at five stream sites in Georgia were identified using a rank correlation technique, the seasonal Kendall test and slope estimator. These sites include a U.S. Geological Survey Hydrologic Bench-Mark site, Falling Creek near Juliette, and four periodic water-quality monitoring sites. Comparison of raw data trends with streamflow-residual trends and, where applicable, with chemical-discharge trends (instantaneous fluxes) shows that some of these trends are responses to factors other than changing streamflow. Percentages of forested, agricultural, and urban cover within each basin did not change much during the periods of water-quality record, so these non-flow-related trends are not obviously related to changes in land cover or land use. Flow-residual water-quality trends at the Hydrologic Bench-Mark site and at the Chattooga River site probably indicate basin responses to changes in the chemical quality of atmospheric deposition. These two basins are predominantly forested and have received little recent human use. Observed trends at the other three sites probably indicate basin responses to various land uses and water uses associated with agricultural and urban land, or to changes in specific uses. (USGS)
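The seasonal Kendall test compares only observations from the same season (for monthly data, the same month in different years), sums the per-season Mann-Kendall statistics, and pairs this with a slope estimator equal to the median of all within-season pairwise slopes. The Python sketch below is a minimal version under stated simplifications: no correction for ties or serial correlation, and every season is assumed to have at least two observations. It is not the USGS implementation used in the study, and the example data are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Series = Dict[int, List[Tuple[float, float]]]  # season -> [(year, value), ...]

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def seasonal_kendall(data: Series) -> Tuple[int, float, float]:
    """Return (S, Z, slope) for the seasonal Kendall test without tie correction.

    S sums the Mann-Kendall statistic over seasons, so only same-season values
    are compared; slope is the median of all within-season pairwise slopes."""
    s_total, var_total, slopes = 0, 0.0, []
    for obs in data.values():
        obs = sorted(obs)  # order each season by year
        n = len(obs)
        for i in range(n - 1):
            for j in range(i + 1, n):
                (t_i, x_i), (t_j, x_j) = obs[i], obs[j]
                s_total += sign(x_j - x_i)
                if t_j != t_i:
                    slopes.append((x_j - x_i) / (t_j - t_i))
        var_total += n * (n - 1) * (2 * n + 5) / 18.0  # no-ties variance of S
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)  # continuity correction
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    return s_total, z, median(slopes)

# Hypothetical example: two seasons whose values rise by 0.1 units per year.
example = {
    1: [(year, 7.0 + 0.1 * (year - 1980)) for year in range(1980, 1990)],
    7: [(year, 6.5 + 0.1 * (year - 1980)) for year in range(1980, 1990)],
}
S, Z, slope = seasonal_kendall(example)
```

A trend would typically be called significant at the 5% level when |Z| exceeds about 1.96. Comparing only within seasons is what keeps regular seasonal cycles from masquerading as trends.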
Vreck, D; Gernaey, K V; Rosen, C; Jeppsson, U
2006-01-01
In this paper, an implementation of the Benchmark Simulation Model No 2 (BSM2) within Matlab-Simulink is presented. BSM2 was developed for plant-wide wastewater treatment plant (WWTP) control strategy evaluation on a long-term basis. It consists of a pre-treatment process, an activated sludge process, and sludge treatment processes. Extended evaluation criteria are proposed for plant-wide control strategy assessment. Default open-loop and closed-loop strategies are also proposed as references against which other control strategies can be compared. Simulations indicate that BSM2 is an appropriate tool for plant-wide control strategy evaluation.
NASA Astrophysics Data System (ADS)
Watanabe, Yukinobu; Kin, Tadahiro; Araki, Shouhei; Nakayama, Shinsuke; Iwamoto, Osamu
2017-09-01
A comprehensive research program on deuteron nuclear data, motivated by the development of accelerator-based neutron sources, is under way. It comprises measurements of neutron and gamma-ray yields and production cross sections, modelling of deuteron-induced reactions and code development, nuclear data evaluation and benchmark testing, and application to medical radioisotope production. The goal of this program is to develop a state-of-the-art deuteron nuclear data library up to 200 MeV that will be useful for the design of future (d,xn) neutron sources. The current status and future plans are reviewed.
Calculation of the Phenix end-of-life test 'Control Rod Withdrawal' with the ERANOS code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tiberi, V.
2012-07-01
The Institute for Radiological Protection and Nuclear Safety (IRSN) acts as technical support to the French public authorities. As such, IRSN is in charge of the safety assessment of operating reactors, reactors under construction, and future projects. In this framework, one current objective of IRSN is to evaluate the ability and accuracy of numerical tools in predicting the consequences of accidents. Neutronic studies contribute to the safety assessment from several points of view, including the core design and its protection system. They are necessary to evaluate core behavior in case of an accident, in order to assess the integrity of the first barrier and the absence of a prompt criticality risk. To reach this objective, one main physical quantity has to be evaluated accurately: the neutronic power distribution in the core during the whole reactor lifetime. The Phenix end-of-life tests, carried out in 2009, aimed to increase experience feedback on sodium-cooled fast reactors. These experiments were performed in the framework of the development of fourth-generation nuclear reactors. Ten tests were carried out: 6 on neutronic and fuel aspects, 2 on thermal hydraulics, and 2 on emergency shutdown. Two of them were chosen for an international exercise on thermal hydraulics and neutronics within an IAEA Coordinated Research Project. Concerning neutronics, the Control Rod Withdrawal test is relevant to safety because it allows evaluation of the capability of calculation tools to compute the radial power distribution in fast reactor core configurations in which the flux field is strongly distorted. IRSN participated in this benchmark with the ERANOS code, developed by CEA for fast reactor studies. This paper presents the results obtained in the framework of the benchmark activity. Relatively good agreement was found with the available measurements, considering the approximations made in the modeling.
The work underlines the importance of burn-up calculations for obtaining a fine mesh of core concentrations when calculating the power distribution. (authors)
Analysis of key technologies for virtual instruments metrology
NASA Astrophysics Data System (ADS)
Liu, Guixiong; Xu, Qingui; Gao, Furong; Guan, Qiuju; Fang, Qiang
2008-12-01
Virtual instruments (VIs) require metrological verification when applied as measuring instruments. Owing to their software-centered architecture, metrological evaluation of VIs covers two aspects: measurement functions and software characteristics. The complexity of software makes metrological testing of VIs difficult. Key approaches and technologies for the metrological evaluation of virtual instruments are investigated and analyzed in this paper. The principal issue is the evaluation of measurement uncertainty. The nature and regularity of measurement uncertainty caused by software and algorithms can be evaluated by modeling, simulation, analysis, testing, and statistics, with the support of the computing power of PCs. Another concern is the evaluation of software features of VIs such as correctness, reliability, stability, security, and real-time performance. Technologies from the software engineering, software testing, and computer security domains can be used for these purposes. For example, a variety of black-box testing, white-box testing, and modeling approaches can be used to evaluate the reliability of modules, components, applications, and the whole VI software. The security of a VI can be assessed by methods such as vulnerability scanning and penetration analysis. To help metrology institutions perform metrological verification of VIs efficiently, an automatic metrological tool for the above validation is essential. Based on technologies of numerical simulation, software testing, and system benchmarking, a framework for such an automatic tool is proposed in this paper. An investigation of existing automatic tools that calculate measurement uncertainty, perform software testing, and assess security demonstrates the feasibility of the proposed framework.
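One common way to evaluate, by simulation, the uncertainty that a VI's algorithm passes on to its result is Monte Carlo propagation in the spirit of GUM Supplement 1. The procedure samples the inputs from their assumed distributions, runs the algorithm on each sample, and summarises the spread of the outputs. The Python sketch below applies this to a hypothetical VI function computing P = V²/R. The Gaussian input distributions, the numerical values, and the function names are assumptions for illustration, not part of the framework proposed in the paper.

```python
import random
import statistics
from typing import Callable, Dict, Tuple

def monte_carlo_uncertainty(model: Callable[..., float],
                            inputs: Dict[str, Tuple[float, float]],
                            n_trials: int = 50_000,
                            seed: int = 1) -> Tuple[float, float]:
    """Propagate input uncertainty through `model` by random sampling.

    inputs maps each argument name to (mean, standard uncertainty); every input
    is sampled as an independent Gaussian (an assumption of this sketch)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trials):
        sample = {name: rng.gauss(mu, u) for name, (mu, u) in inputs.items()}
        outputs.append(model(**sample))
    return statistics.fmean(outputs), statistics.stdev(outputs)

# Hypothetical VI algorithm: power dissipated in a resistor, P = V^2 / R.
def power(V: float, R: float) -> float:
    return V ** 2 / R

mean_p, u_p = monte_carlo_uncertainty(power, {"V": (10.0, 0.05), "R": (100.0, 0.5)})
```

With these inputs the result should be close to 1 W with a standard uncertainty of about 0.011 W, which agrees with first-order propagation, sqrt((2·0.005)² + 0.005²) ≈ 1.1%. The Monte Carlo route becomes more useful when the algorithm is nonlinear or too complex to differentiate.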
A proposed benchmark problem for cargo nuclear threat monitoring
NASA Astrophysics Data System (ADS)
Wesley Holmes, Thomas; Calderon, Adan; Peeples, Cody R.; Gardner, Robin P.
2011-10-01
There is currently a great deal of technical and political effort focused on reducing the risk of potential attacks on the United States involving radiological dispersal devices or nuclear weapons. This paper proposes a benchmark problem for gamma-ray and X-ray cargo monitoring, with results calculated using MCNP5, v1.51. The primary goal is to provide a benchmark problem that will allow researchers in this area to evaluate Monte Carlo models for both speed and accuracy, in both forward and inverse calculational codes and approaches, for nuclear security applications. A previous benchmark problem was developed by one of the authors (RPG) for two similar oil well logging problems (Gardner and Verghese, 1991, [1]). One of those benchmarks has recently been used by at least two researchers in the nuclear threat area to evaluate the speed and accuracy of Monte Carlo codes combined with variance reduction techniques. This apparent need prompted us to design this benchmark problem specifically for nuclear threat researchers. The benchmark consists of a conceptual design and preliminary calculational results using gamma-ray interactions on a system containing three thicknesses of three different shielding materials. A point source is placed inside the three materials: lead, aluminum, and plywood. The first two materials are right circular cylinders, while the third is a cube. The entire system rests on a sufficiently thick lead base to reduce undesired scattering events. The configuration is arranged so that a gamma ray moving outward from the source passes first through the lead cylinder, then through the aluminum cylinder, and finally through the wooden cube before reaching the detector. A 2 in.×4 in.×16 in. box-style NaI(Tl) detector was placed 1 m from the point source, which is located at the center, with the 4 in.×16 in. side facing the system. The two sources used in the benchmark are 137Cs and 235U.
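As a much-simplified illustration of the "speed and accuracy" comparison the benchmark targets, the Python sketch below estimates the uncollided (never-interacting) transmission of photons through three stacked layers with an analog Monte Carlo loop and compares it with the analytic narrow-beam result exp(−Σ μᵢxᵢ). The attenuation coefficients and thicknesses are hypothetical placeholders rather than the benchmark's geometry. Scattered photons that still reach the detector are ignored, which is one reason full transport codes such as MCNP5 are used for the real problem.

```python
import math
import random
from typing import List, Tuple

# Hypothetical (mu in 1/cm, thickness in cm) for the lead, aluminum, plywood layers.
LAYERS: List[Tuple[float, float]] = [(1.2, 1.0), (0.2, 2.0), (0.05, 5.0)]

def analytic_uncollided(layers: List[Tuple[float, float]]) -> float:
    """Narrow-beam transmission exp(-sum(mu_i * x_i)) through stacked layers."""
    return math.exp(-sum(mu * x for mu, x in layers))

def mc_uncollided(layers: List[Tuple[float, float]],
                  n_photons: int = 100_000, seed: int = 7) -> float:
    """Analog Monte Carlo estimate of the same quantity.

    A free path -ln(U)/mu is sampled afresh in each layer (valid because the
    exponential distribution is memoryless); a photon counts only if it crosses
    every layer without interacting."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_photons):
        for mu, x in layers:
            if -math.log(1.0 - rng.random()) / mu < x:
                break  # interacted (absorbed or scattered) inside this layer
        else:
            passed += 1
    return passed / n_photons
```

Here the analytic value is exp(−1.85) ≈ 0.157, and the statistical error of the Monte Carlo estimate shrinks only as 1/sqrt(n_photons), which is the trade-off that variance reduction techniques try to improve.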
Benchmarking on Tsunami Currents with ComMIT
NASA Astrophysics Data System (ADS)
Sharghi vand, N.; Kanoglu, U.
2015-12-01
There were no standards for the validation and verification of tsunami numerical models before the 2004 Indian Ocean tsunami. Even so, a number of numerical models had been used for inundation mapping, evaluation of critical structures, etc., without validation and verification. After 2004, the NOAA Center for Tsunami Research (NCTR) established standards for the validation and verification of tsunami numerical models (Synolakis et al. 2008 Pure Appl. Geophys. 165, 2197-2228), which are to be used in evaluating critical structures such as nuclear power plants against tsunami attack. NCTR presented analytical, experimental, and field benchmark problems aimed at estimating maximum runup, which have been widely accepted by the community. Recently, benchmark problems were suggested by the US National Tsunami Hazard Mitigation Program Mapping & Modeling Benchmarking Workshop: Tsunami Currents, held on February 9-10, 2015 in Portland, Oregon, USA (http://nws.weather.gov/nthmp/index.html). These benchmark problems focused on the validation and verification of tsunami numerical models for tsunami currents. Three of the benchmark problems were current measurements of the 2011 Japan tsunami in Hilo Harbor, Hawaii, USA, and in Tauranga Harbor, New Zealand, and a single long-period wave propagating onto a small-scale experimental model of the town of Seaside, Oregon, USA. These benchmark problems were implemented in the Community Modeling Interface for Tsunamis (ComMIT) (Titov et al. 2011 Pure Appl. Geophys. 168, 2121-2131), a user-friendly interface to the validated and verified Method of Splitting Tsunami (MOST) model (Titov and Synolakis 1995 J. Waterw. Port Coastal Ocean Eng. 121, 308-316), developed by NCTR. The modeling results are compared with the required benchmark data, showing good agreement, and the results are discussed.
Acknowledgment: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 603839 (Project ASTARTE - Assessment, Strategy and Risk Reduction for Tsunamis in Europe)
Phase field benchmark problems for dendritic growth and linear elasticity
Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.; ...
2018-03-26
We present the second set of benchmark problems for phase field models that are being jointly developed by the Center for Hierarchical Materials Design (CHiMaD) and the National Institute of Standards and Technology (NIST) along with input from other members of the phase field community. As the integrated computational materials engineering (ICME) approach to materials design has gained traction, there is an increasing need for quantitative phase field results. New algorithms and numerical implementations increase computational capabilities, necessitating standard problems to evaluate their impact on simulated microstructure evolution as well as their computational performance. We propose one benchmark problem for solidification and dendritic growth in a single-component system, and one problem for linear elasticity via the shape evolution of an elastically constrained precipitate. We demonstrate the utility and sensitivity of the benchmark problems by comparing the results of 1) dendritic growth simulations performed with different time integrators and 2) elastically constrained precipitate simulations with different precipitate sizes, initial conditions, and elastic moduli. As a result, these numerical benchmark problems will provide a consistent basis for evaluating different algorithms, both existing and those to be developed in the future, for accuracy and computational efficiency when applied to simulate physics often incorporated in phase field models.
Phase field benchmark problems for dendritic growth and linear elasticity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jokisaari, Andrea M.; Voorhees, P. W.; Guyer, Jonathan E.
Development and validation of phytotoxicity tests with emergent and submerged aquatic plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hughes, J.S.; Powell, R.L.; Nelson, M.K.
1995-12-31
Toxicity testing procedures have recently been developed for assessing contaminant effects on emergent and submerged aquatic macrophytes commonly found in freshwater wetlands. These tests have potential application in risk assessments for contaminated wetlands as well as for new chemical substances. The objective of this study was to evaluate these methods, modify them if necessary, and validate them in a contract laboratory setting using two benchmark chemicals. Oryza sativa (domestic rice) was used as a surrogate emergent vascular plant, while Ceratophyllum demersum (coontail) and Myriophyllum heterophyllum (variable-leaf milfoil) were the representative submerged vascular plants. After culturing techniques and testing conditions were evaluated, toxicity tests were conducted using boron and metribuzin. The test procedure for the emergent plants involves a two-week pre-exposure period followed by a two-week aqueous exposure. Five types of sediment, including both natural and artificial sediments, were evaluated for use with rice. Fresh weight and chlorophyll a content were the selected test endpoints. The submerged plants were exposed for two weeks, and the response variables evaluated included length, weight (fresh and dry), and root number. The sensitivity of these tests was comparable to the results obtained for the same two chemicals using the green alga Selenastrum capricornutum and the duckweed Lemna gibba, except that rice was less sensitive to metribuzin than the other species.
Integral Full Core Multi-Physics PWR Benchmark with Measured Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Forget, Benoit; Smith, Kord; Kumar, Shikhar
In recent years, the importance of modeling and simulation has been highlighted extensively in the DOE research portfolio, with concrete examples in nuclear engineering in the CASL and NEAMS programs. These research efforts and similar efforts worldwide aim at developing high-fidelity multi-physics analysis tools for simulating current and next-generation nuclear power reactors. As with all analysis tools, verification and validation are essential to ensure proper functioning of the software and methods employed. The current approach relies mainly on the validation of single-physics phenomena (e.g., critical experiments, flow loops), and there is a lack of relevant multiphysics benchmark measurements, which are necessary to validate the high-fidelity methods being developed today. This work introduces a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, and core loading and reloading patterns. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validating coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from 58 instrumented assemblies. The benchmark description is now available online and has been used by many groups. However, much work remains to be done on the quantification of uncertainties and modeling sensitivities. This work aims to address these deficiencies and make this benchmark a true non-proprietary international benchmark for the validation of high-fidelity tools.
This report details the BEAVRS uncertainty quantification for the first two cycles of operation and serves as the final report of the project.
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.
Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
The openEHR specifications are designed to support the implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high-complexity procedure information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the use of ORBDA for evaluating the insert throughput and query latency of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.
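The two metrics mentioned above, insert throughput and query latency, can be sketched with Python's built-in `sqlite3` module. SQLite is a stand-in chosen only because it ships with Python; it is not one of the NoSQL systems evaluated with ORBDA, and the table layout, function name, and JSON documents are hypothetical simplifications rather than real openEHR compositions.

```python
import json
import sqlite3
import statistics
import time
from typing import Tuple

def benchmark(n_docs: int = 2000, n_queries: int = 200) -> Tuple[float, float, int]:
    """Return (inserts per second, median query latency in s, rows in last query)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE composition (id INTEGER PRIMARY KEY, ehr_id TEXT, doc TEXT)")
    conn.execute("CREATE INDEX idx_ehr ON composition (ehr_id)")
    # Hypothetical documents spread over 100 EHRs.
    docs = [(f"ehr-{i % 100}", json.dumps({"archetype": "hypothetical.v1", "value": i}))
            for i in range(n_docs)]

    start = time.perf_counter()
    with conn:  # one transaction, committed on exit
        conn.executemany("INSERT INTO composition (ehr_id, doc) VALUES (?, ?)", docs)
    insert_rate = n_docs / (time.perf_counter() - start)

    latencies, rows = [], []
    for q in range(n_queries):
        t0 = time.perf_counter()
        rows = conn.execute("SELECT doc FROM composition WHERE ehr_id = ?",
                            (f"ehr-{q % 100}",)).fetchall()
        latencies.append(time.perf_counter() - t0)
    conn.close()
    return insert_rate, statistics.median(latencies), len(rows)
```

Reporting the median rather than the mean latency keeps a few slow outlier queries from dominating the figure; a fuller benchmark would also report tail percentiles.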
ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers
Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio
2018-01-01
PMID:29293556
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeHart, Mark D.; Mausolff, Zander; Weems, Zach
2016-08-01
One goal of the MAMMOTH M&S project is to validate the analysis capabilities within MAMMOTH. Historical data have shown limited value for validation of full three-dimensional (3D) multi-physics methods. Initial analysis considered the TREAT startup minimum critical core and one of the startup transient tests. At present, validation is focusing on measurements taken during the M8CAL test calibration series. These exercises will be valuable in a preliminary assessment of the ability of MAMMOTH to perform coupled multi-physics calculations; calculations performed to date are being used to validate the neutron transport solver Rattlesnake and the fuel performance code BISON. Other validation projects outside of TREAT are available for single-physics benchmarking. Because the transient solution capability of Rattlesnake is one of the key attributes that make it unique for TREAT transient simulations, validating the Rattlesnake transient solution against other time-dependent kinetics benchmarks has considerable value. The Nuclear Energy Agency (NEA) of the Organization for Economic Cooperation and Development (OECD) has recently developed a computational benchmark for transient simulations. This benchmark considers both two-dimensional (2D) and 3D configurations, for a total of 26 different transients. All are negative reactivity insertions, typically returning to the critical state after some time.
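To illustrate what a negative reactivity insertion does to core power, here is a sketch using one-delayed-group point kinetics. This is a deliberately simplified, swapped-in model: the OECD benchmark and Rattlesnake solve spatial transport, not point kinetics, and the parameter values (beta, lambda, generation time) are illustrative assumptions rather than TREAT or benchmark data.

```python
def point_kinetics_step_response(rho, beta=0.0065, lam=0.08, gen_time=1e-4,
                                 t_end=0.1, dt=1e-5):
    """Relative power P(t_end)/P(0) after a step reactivity rho at t = 0.

    One-delayed-group point kinetics integrated with explicit Euler.
    All default parameters are illustrative assumptions.
    """
    p = 1.0
    c = beta / (gen_time * lam) * p  # precursor level in equilibrium with P = 1
    for _ in range(int(round(t_end / dt))):
        dp = ((rho - beta) / gen_time) * p + lam * c
        dc = (beta / gen_time) * p - lam * c
        p += dt * dp
        c += dt * dc
    return p


def prompt_jump(rho, beta=0.0065):
    # Prompt-jump approximation for a reactivity step: P+/P0 = beta / (beta - rho)
    return beta / (beta - rho)
```

For `rho = -0.005` the power falls within a few milliseconds to about 0.56 of its initial level, close to the prompt-jump estimate of about 0.565, and then decays slowly as the delayed-neutron precursors die away. The time step is kept small because the prompt mode decays on a timescale of roughly gen_time / (beta - rho).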
Simon, Melissa A; Tom, Laura S; Nonzee, Narissa J; Murphy, Kara R; Endress, Richard; Dong, XinQi; Feinglass, Joe
2015-05-01
The DuPage Patient Navigation Collaborative evaluated the Patient Navigation Research Program (PNRP) model for uninsured women receiving free breast or cervical cancer screening through the Illinois Breast and Cervical Cancer Program in DuPage County, Illinois. We used medical records review and patient surveys of 477 women to compare median follow-up times with external Illinois Breast and Cervical Cancer Program and Chicago PNRP benchmarks of performance. We examined the extent to which we mitigated community-defined timeliness risk factors for delayed follow-up, with a focus on Spanish-speaking participants. Median follow-up time (29.0 days for breast and 56.5 days for cervical screening abnormalities) compared favorably to external benchmarks. Spanish-speaking patients had lower health literacy, lower patient activation, and more health care system distrust than did English-speaking patients, but despite the prevalence of timeliness risk factors, we observed no differences in likelihood of delayed (> 60 days) follow-up by language. Our successful replication and scaling of the PNRP navigation model to DuPage County illustrates a promising approach for future navigator research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nordborg, C.
A new, improved version of the OECD Nuclear Energy Agency (NEA) co-ordinated Joint Evaluated Fission and Fusion (JEFF) data library, JEFF-3.1, was released in May 2005. It comprises a general purpose library and the following five special purpose libraries: activation; thermal scattering law; radioactive decay; fission yield; and proton library. The objective of the previous version of the library (JEFF-2.2) was to achieve improved performance for existing reactors and fuel cycles. In addition to this objective, the JEFF-3.1 library aims to provide users with data for a wider range of applications. These include innovative reactor concepts, transmutation of radioactive waste, fusion, and various other energy and non-energy related industrial applications. Initial benchmark testing has confirmed the expected very good performance of the JEFF-3.1 library. Additional benchmarking of the libraries is underway, both for the general purpose and for the special purpose libraries. A new three-year mandate to continue developing the JEFF library was recently granted by the NEA. For the next version of the library, JEFF-3.2, it is foreseen to put more effort into fission product and minor actinide evaluations, as well as the inclusion of more covariance data. (authors)
Tom, Laura S.; Nonzee, Narissa J.; Murphy, Kara R.; Endress, Richard; Dong, XinQi; Feinglass, Joe
2015-01-01
Objectives. The DuPage Patient Navigation Collaborative evaluated the Patient Navigation Research Program (PNRP) model for uninsured women receiving free breast or cervical cancer screening through the Illinois Breast and Cervical Cancer Program in DuPage County, Illinois. Methods. We used medical records review and patient surveys of 477 women to compare median follow-up times with external Illinois Breast and Cervical Cancer Program and Chicago PNRP benchmarks of performance. We examined the extent to which we mitigated community-defined timeliness risk factors for delayed follow-up, with a focus on Spanish-speaking participants. Results. Median follow-up time (29.0 days for breast and 56.5 days for cervical screening abnormalities) compared favorably to external benchmarks. Spanish-speaking patients had lower health literacy, lower patient activation, and more health care system distrust than did English-speaking patients, but despite the prevalence of timeliness risk factors, we observed no differences in likelihood of delayed (> 60 days) follow-up by language. Conclusions. Our successful replication and scaling of the PNRP navigation model to DuPage County illustrates a promising approach for future navigator research. PMID:25713942
Risthaus, Tobias; Grimme, Stefan
2013-03-12
A new test set (S12L) containing 12 supramolecular noncovalently bound complexes is presented and used to evaluate seven different methods to account for dispersion in DFT (DFT-D3, DFT-D2, DFT-NL, XDM, dDsC, TS-vdW, M06-L) at different basis set levels against experimental, back-corrected reference energies. This allows conclusions about the performance of each method in an explorative research setting on "real-life" problems. Most DFT methods show satisfactory performance but, because of the size of the complexes, almost always require an explicit correction for the nonadditive Axilrod-Teller-Muto three-body dispersion interaction to get accurate results. The necessity of using a method capable of accounting for dispersion is clearly demonstrated by the fact that the two-body dispersion contributions are on the order of 20-150% of the total interaction energy. MP2 and some variants thereof are shown to be insufficient for this, while a few tested D3-corrected semiempirical MO methods perform reasonably well. Overall, we suggest the use of this benchmark set as a "sanity check" against overfitting to too small molecular cases.
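The three-body correction mentioned above has the classic Axilrod-Teller-Muto form E = C9 (1 + 3 cos θa cos θb cos θc) / (r_ab r_bc r_ca)^3, where θ are the interior angles of the triangle formed by the three atoms. The sketch below evaluates this undamped term for one atom triple. It assumes the C9 coefficient is simply given and positive (the textbook physical convention); it is not the damped, coefficient-estimating implementation used in DFT-D3, whose sign and damping conventions differ.

```python
import math


def _angle_cos(p, q, r):
    """Cosine of the interior angle at vertex p of the triangle (p, q, r)."""
    u = [qi - pi for qi, pi in zip(q, p)]
    v = [ri - pi for ri, pi in zip(r, p)]
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def atm_energy(a, b, c, c9):
    """Undamped Axilrod-Teller-Muto energy for atoms at Cartesian positions a, b, c.

    E = C9 * (1 + 3 cos(theta_a) cos(theta_b) cos(theta_c)) / (r_ab r_bc r_ca)**3
    Units follow whatever units are used for c9 and the coordinates.
    """
    r_ab = math.dist(a, b)
    r_bc = math.dist(b, c)
    r_ca = math.dist(c, a)
    cos_prod = _angle_cos(a, b, c) * _angle_cos(b, c, a) * _angle_cos(c, a, b)
    return c9 * (1.0 + 3.0 * cos_prod) / (r_ab * r_bc * r_ca) ** 3
```

With a positive C9 the term is repulsive for an equilateral triangle (angular factor 1.375) and attractive for three collinear atoms (angular factor -2), which is why it cannot be neglected in large, densely packed complexes.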
SNOMED CT module-driven clinical archetype management.
Allones, J L; Taboada, M; Martinez, D; Lozano, R; Sobrido, M J
2013-06-01
To explore semantic search to improve management and user navigation in clinical archetype repositories. In order to support semantic searches across archetypes, an automated method based on SNOMED CT modularization is implemented to transform clinical archetypes into SNOMED CT extracts. Concurrently, query terms are converted into SNOMED CT concepts using the search engine Lucene. Retrieval is then carried out by matching query concepts with the corresponding SNOMED CT segments. A test collection of 16 clinical archetypes, including over 250 terms, and a subset of 55 clinical terms from two medical dictionaries, MediLexicon and MedlinePlus, were used to test our method. The keyword-based service supported by the openEHR repository served as the baseline for evaluating the performance improvement. In total, our approach reached 97.4% precision and 69.1% recall, providing a substantial improvement in recall (more than 70%) compared to the baseline. Exploiting medical domain knowledge from ontologies such as SNOMED CT may overcome some limitations of keyword-based systems and thus improve the search experience of repository users. An automated approach based on ontology segmentation is an efficient and feasible way of supporting modeling, management and user navigation in clinical archetype repositories. Copyright © 2013 Elsevier Inc. All rights reserved.
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
NASA Astrophysics Data System (ADS)
Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin
2015-01-01
A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
Walters, Russel M; Gandolfi, Lisa; Mack, M Catherine; Fevola, Michael; Martin, Katharine; Hamilton, Mathew T; Hilberer, Allison; Barnes, Nicole; Wilt, Nathan; Nash, Jennifer R; Raabe, Hans A; Costin, Gertrude-Emilia
2016-12-01
The personal care industry is focused on developing safe, more efficacious, and increasingly mild products, which routinely undergo preclinical and clinical testing before becoming available for consumer use on skin. In vitro systems based on skin reconstructed equivalents are now established for the preclinical assessment of product irritation potential and as alternative testing methods to the classic Draize rabbit skin irritation test. We have used the 3-D EpiDerm™ model system to evaluate tissue viability and primary cytokine interleukin-1α release as a way to evaluate the potential dermal irritation of 224 non-ionic, amphoteric and/or anionic surfactant-containing formulations, or individual raw materials. As part of our testing programme, two representative benchmark materials with known clinical skin irritation potential were qualified through repeated testing, for use as references for the skin irritation evaluation of formulations containing new surfactant ingredients. We have established a correlation between the in vitro screening approach and clinical testing, and are continually expanding our database to enhance this correlation. This testing programme integrates the efforts of global manufacturers of personal care products that focus on the development of increasingly mild formulations to be applied to the skin, without the use of animal testing. 2016 FRAME.
Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.
Kerepesi, Csaba; Grolmusz, Vince
2016-05-01
DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need to culture them. These technologies usually return short (100-300 base pairs long) DNA reads, which are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis tools (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their ability to assign quantitative phylogenetic information to the data, describing the frequency of appearance of microorganisms of the same taxa in the sample. The difficulty of the task arises from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assigns higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software tool to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of "taxon counting" in a metagenomic sample: we took the same number of copies of three full bacterial genomes of different lengths, broke them up randomly into short reads with an average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not intended to serve as a mock metagenome, but if a tool fails on this simple task, it will surely fail on most real metagenomes. We applied the three tools to the benchmark. The ideal quantitative solution would assign the same proportion to each of the three bacterial taxa.
We found that AMPHORA2/AmphoraNet gave the most accurate results and that the other two tools underperformed: they assigned each short read quite reliably to its respective taxon, thereby producing the typical genome length bias. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
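The genome length bias can be reproduced with simple arithmetic. The sketch below assumes three hypothetical genomes of 2, 4 and 6 Mbp, equal copy numbers, and an idealised shredding into non-overlapping 150 bp reads (so read counts are their expected values rather than random draws). Counting reads per taxon then reports abundances proportional to genome length, while dividing each count by its genome length before normalising recovers the true equal proportions. This is only an illustration of the effect, not how AmphoraNet, MG-RAST or MEGAN5 work internally.

```python
def expected_read_counts(genome_lengths, copies=1, read_len=150):
    """Reads per taxon when each genome copy is cut into read_len bp pieces."""
    return {t: copies * (length // read_len) for t, length in genome_lengths.items()}


def read_count_fractions(counts):
    # Naive "taxon counting" by reads: longer genomes get larger shares.
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def length_normalised_fractions(counts, genome_lengths):
    # Divide by genome length first, so each genome copy counts once.
    per_copy = {t: counts[t] / genome_lengths[t] for t in counts}
    total = sum(per_copy.values())
    return {t: v / total for t, v in per_copy.items()}


lengths = {"A": 2_000_000, "B": 4_000_000, "C": 6_000_000}  # hypothetical genomes
counts = expected_read_counts(lengths)
```

Here `read_count_fractions(counts)` gives roughly 1/6, 1/3 and 1/2, the genome length bias, whereas `length_normalised_fractions(counts, lengths)` gives roughly 1/3 for each taxon, the "ideal quantitative solution" described in the abstract.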
A chemical EOR benchmark study of different reservoir simulators
NASA Astrophysics Data System (ADS)
Goudarzi, Ali; Delshad, Mojdeh; Sepehrnoori, Kamy
2016-09-01
Interest in chemical EOR processes has intensified in recent years due to advancements in chemical formulations and injection techniques. Polymer (P), surfactant/polymer (SP), and alkaline/surfactant/polymer (ASP) injection are techniques for improving sweep and displacement efficiencies with the aim of increasing oil production in both secondary and tertiary floods. There has recently been great interest in chemical flooding for a range of challenging situations. These include high temperature reservoirs, formations with extreme salinity and hardness, naturally fractured carbonates, and sandstone reservoirs with heavy and viscous crude oils. More oil reservoirs are reaching maturity, where secondary polymer floods and tertiary surfactant methods have become increasingly important. This has added to the industry's interest in using reservoir simulators as tools for reservoir evaluation and management to minimize costs and increase process efficiency. Reservoir simulators with special features are needed to represent the coupled chemical and physical processes present in chemical EOR. The simulators must first be validated against well-controlled lab and pilot scale experiments to reliably predict full field implementations. The available laboratory scale data include 1) phase behavior and rheological data; and 2) results of secondary and tertiary coreflood experiments for P, SP, and ASP floods under reservoir conditions, i.e. chemical retention, pressure drop, and oil recovery. Data collected from corefloods are used as benchmark tests comparing numerical reservoir simulators with chemical EOR modeling capabilities, such as CMG's STARS, Schlumberger's ECLIPSE-100, and Petroleum Experts' REVEAL. The research simulator UTCHEM from The University of Texas at Austin is also included, since it has been the benchmark for chemical flooding simulation for over 25 years.
The results of this benchmark comparison will be utilized to improve chemical design for field-scale studies using commercial simulators. The benchmark tests illustrate the potential of commercial simulators for chemical flooding projects and provide a comprehensive table of strengths and limitations of each simulator for a given chemical EOR process. Mechanistic simulations of chemical EOR processes will provide predictive capability and can aid in optimization of the field injection projects. The objective of this paper is not to compare the computational efficiency and solution algorithms; it only focuses on the process modeling comparison.
Kinoshita, Manabu; Taniguchi, Mai; Takagaki, Masatoshi; Seike, Nobuhisa; Hashimoto, Naoya; Yoshimine, Toshiki
2015-05-01
Neurosurgical patties are the most frequently used instruments during neurosurgical procedures, and their high performance is required to ensure safe operations. They must offer cushioning, water-absorbing, water-retaining, and non-tissue adherent characteristics. Here, the authors describe a revised neurosurgical patty that is superior in all respects to the conventional patty available in Japan. Patty characteristics were critically and scientifically evaluated using various in vitro assays. Moreover, a novel ex vivo evaluation system focusing on the adherent characteristics of the neurosurgical patty was developed. The proposed assay could provide benchmark data for comparing different neurosurgical patties, offering neurosurgeons objective data on the performance of patties. The newly developed patty was also evaluated in real neurosurgical settings and showed superb performance during various neurosurgical procedures.
Developing integrated benchmarks for DOE performance measurement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barancik, J.I.; Kramer, C.F.; Thode, Jr. H.C.
1992-09-30
The objectives of this task were to describe and evaluate selected existing sources of information on occupational safety and health, with emphasis on hazard and exposure assessment, abatement, training, reporting, and control identifying for exposure and outcome, in preparation for developing DOE performance benchmarks. Existing resources and methodologies were assessed for their potential use as practical performance benchmarks. Strengths and limitations of current data resources were identified. Guidelines were outlined for developing new or improved performance factors, which then could become the basis for selecting performance benchmarks. Databases for non-DOE comparison populations were identified so that DOE performance could be assessed relative to non-DOE occupational and industrial groups. Systems approaches were described which can be used to link hazards and exposure, event occurrence, and adverse outcome factors, as needed to generate valid, reliable, and predictive performance benchmarks. Databases were identified which contain information relevant to one or more performance assessment categories. A list of 72 potential performance benchmarks was prepared to illustrate the kinds of information that can be produced through a benchmark development program. Current information resources which may be used to develop potential performance benchmarks are limited. There is a need to develop an occupational safety and health information and data system in DOE that is capable of incorporating demonstrated and documented performance benchmarks prior to, or concurrent with, the development of hardware and software. A key to the success of this systems approach is rigorous development and demonstration of performance benchmark equivalents to users of such data before system hardware and software commitments are institutionalized.
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++
NASA Technical Reports Server (NTRS)
Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.
1996-01-01
This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, K.
2016-01-22
The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Sterbentz, James W.; Snoj, Luka
PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen critical configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.
Simulation Studies for Inspection of the Benchmark Test with PATRASH
NASA Astrophysics Data System (ADS)
Shimosaki, Y.; Igarashi, S.; Machida, S.; Shirakata, M.; Takayama, K.; Noda, F.; Shigaki, K.
2002-12-01
In order to delineate the halo-formation mechanisms in a typical FODO lattice, a 2-D simulation code, PATRASH (PArticle TRAcking in a Synchrotron for Halo analysis), has been developed. The electric field originating from the space charge is calculated by the Hybrid Tree code method. Benchmark tests using three simulation codes, ACCSIM, PATRASH and SIMPSONS, were carried out, and the results of the three codes were found to agree fairly well with one another. The details of the PATRASH simulation are discussed with some examples.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Der Marck, S. C.
Three nuclear data libraries have been tested extensively using criticality safety benchmark calculations. The three libraries are the new release of the US library ENDF/B-VII.1 (2011), the new release of the Japanese library JENDL-4.0 (2011), and the OECD/NEA library JEFF-3.1 (2006). All calculations were performed with the continuous-energy Monte Carlo code MCNP (version 4C3, as well as version 6-beta1). Around 2000 benchmark cases from the International Handbook of Criticality Safety Benchmark Experiments (ICSBEP) were used. The results were analyzed per ICSBEP category and per element. Overall, the three libraries show similar performance on most criticality safety benchmarks. The largest differences are probably caused by elements such as Be, C, Fe, Zr, W. (authors)
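Analysing results "per ICSBEP category" amounts to grouping calculated-to-experimental (C/E) k-eff ratios and summarising each group. A minimal sketch, assuming results are available as (category, k_calc, k_exp) tuples; the category labels follow the ICSBEP naming style, but the k-eff values in the usage example are invented for illustration, not results from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bias_by_category(results):
    """Mean, population spread and count of (C/E - 1) in pcm per benchmark category.

    results: iterable of (category, k_calc, k_exp) tuples.
    """
    groups = defaultdict(list)
    for category, k_calc, k_exp in results:
        groups[category].append((k_calc / k_exp - 1.0) * 1e5)  # 1 pcm = 1e-5
    return {cat: (mean(v), pstdev(v), len(v)) for cat, v in groups.items()}


# Hypothetical illustrative values only.
example = [
    ("HEU-MET-FAST", 1.0010, 1.0000),
    ("HEU-MET-FAST", 0.9990, 1.0000),
    ("LEU-COMP-THERM", 0.9975, 1.0000),
]
summary = bias_by_category(example)
```

In this toy input the two fast cases cancel to a mean bias of about 0 pcm with a 100 pcm spread, while the single thermal case sits at -250 pcm; reporting the spread alongside the mean is what distinguishes a systematic library bias from case-to-case scatter.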
Short-Term Field Study Programs: A Holistic and Experiential Approach to Learning
ERIC Educational Resources Information Center
Long, Mary M.; Sandler, Dennis M.; Topol, Martin T.
2017-01-01
For business schools, AACSB and Middle States' call for more experiential learning is one reason to provide study abroad programs. Universities must attend to the demand for continuous improvement and employ metrics to benchmark and evaluate their relative standing among peer institutions. One such benchmark is the National Survey of Student…
ARL Physics Web Pages: An Evaluation by Established, Transitional and Emerging Benchmarks.
ERIC Educational Resources Information Center
Duffy, Jane C.
2002-01-01
Provides an overview of characteristics among Association of Research Libraries (ARL) physics Web pages. Examines current academic Web literature and from that develops six benchmarks to measure physics Web pages: ease of navigation; logic of presentation; representation of all forms of information; engagement of the discipline; interactivity of…
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-24
... evaluates potential datasets and recommends which datasets are appropriate for assessment analyses. The... points to datasets incorporated in the original SEDAR benchmark assessment and run the benchmark... Webinar II November 22, 2010; 10 a.m. - 1 p.m.; SEDAR Update Assessment Webinar III Using updated datasets...
Authentic e-Learning in a Multicultural Context: Virtual Benchmarking Cases from Five Countries
ERIC Educational Resources Information Center
Leppisaari, Irja; Herrington, Jan; Vainio, Leena; Im, Yeonwook
2013-01-01
The implementation of authentic learning elements at education institutions in five countries, eight online courses in total, is examined in this paper. The International Virtual Benchmarking Project (2009-2010) applied the elements of authentic learning developed by Herrington and Oliver (2000) as criteria to evaluate authenticity. Twelve…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Disney, R.K.
1994-10-01
The methodology for handling bias and uncertainty when calculational methods are used in criticality safety evaluations (CSEs) is a rapidly evolving technology. The changes in the methodology are driven by a number of factors. One factor responsible for changes in the methodology for handling bias and uncertainty in CSEs within the overview of the US Department of Energy (DOE) is a shift in the overview function from a "site" perception to a more uniform or "national" perception. Other causes for change or improvement in the methodology for handling calculational bias and uncertainty are: (1) an increased demand for benchmark criticals data to expand the area (range) of applicability of existing data, (2) a demand for new data to supplement existing benchmark criticals data, (3) the increased reliance on (or need for) computational benchmarks which supplement (or replace) experimental measurements in critical assemblies, and (4) an increased demand for benchmark data applicable to the expanded range of conditions and configurations encountered in DOE site restoration and remediation.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-01-30
Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed to finely assess their performances in terms of sensitivity and false discovery rate, by measuring the number of true and false-positive (respectively UPS1 or yeast background proteins found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performances of the data processing workflow. 
We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
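As a minimal sketch of the scoring this spike-in design enables, the Python function below counts true positives (reported UPS1 proteins) and false positives (reported background proteins) and derives sensitivity and the observed false discovery proportion. The protein identifiers and the set-based input format are assumptions for illustration; the tools named above each export results in their own formats.

```python
def score_workflow(reported, spiked):
    """Compare proteins a workflow called differential against the spike-in truth.

    reported: set of protein IDs the workflow flagged as differential.
    spiked:   set of spiked-in (UPS1) protein IDs, the known true positives.
    Anything reported but not spiked is counted as a background false positive.
    """
    true_pos = len(reported & spiked)
    false_pos = len(reported - spiked)
    sensitivity = true_pos / len(spiked) if spiked else 0.0
    n_reported = true_pos + false_pos
    # Observed false discovery proportion among the reported proteins.
    fdp = false_pos / n_reported if n_reported else 0.0
    return {"TP": true_pos, "FP": false_pos, "sensitivity": sensitivity, "FDP": fdp}


# Hypothetical IDs: 48 spiked proteins; a workflow finds 40 of them plus 5 yeast proteins.
ups1 = {f"UPS{i}" for i in range(1, 49)}
found = {f"UPS{i}" for i in range(1, 41)} | {f"YEAST{i}" for i in range(1, 6)}
print(score_workflow(found, ups1))
```

Running the same function on each workflow's output gives directly comparable sensitivity/FDP pairs at each spike-in concentration.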
A Better Benchmark Assessment: Multiple-Choice versus Project-Based
ERIC Educational Resources Information Center
Peariso, Jamon F.
2006-01-01
The purpose of this literature review and Ex Post Facto descriptive study was to determine which type of benchmark assessment, multiple-choice or project-based, provides the best indication of general success on the history portion of the CST (California Standards Tests). The result of the study indicates that although the project-based benchmark…
Benchmark testing of DIII-D neutral beam modeling with water flow calorimetry
Rauch, J. M.; Crowley, B. J.; Scoville, J. T.; ...
2016-06-02
Power loading on beamline components in the DIII-D neutral beam system is measured in this paper using water flow calorimetry. The results are used to benchmark beam transport models. Finally, anomalously high heat loads in the magnet region are investigated and a speculative hypothesis as to their origin is presented.
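Water flow calorimetry infers the power deposited on a cooled component from the coolant's mass flow and temperature rise, P = ρ·Q·c_p·(T_out - T_in). The sketch below assumes steady state and approximate room-temperature properties of water; the DIII-D instrumentation and analysis details are not given in the abstract, and the flow and temperatures in the example are hypothetical.

```python
WATER_DENSITY = 1000.0  # kg/m^3, approximate value (assumption)
WATER_CP = 4186.0       # J/(kg*K), approximate specific heat of liquid water (assumption)


def calorimetric_power(flow_m3_s, t_in_c, t_out_c, rho=WATER_DENSITY, cp=WATER_CP):
    """Steady-state heat load in watts: P = rho * Q * cp * (T_out - T_in)."""
    mass_flow = rho * flow_m3_s  # kg/s
    return mass_flow * cp * (t_out_c - t_in_c)


# Hypothetical component: 0.5 L/s of cooling water warming from 20 C to 32 C.
print(calorimetric_power(0.5e-3, 20.0, 32.0))  # about 25 kW
```

Comparing such measured loads, component by component, against the loads a beam transport model predicts is what makes the measurement usable as a benchmark.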
This report adapts the standard U.S. EPA methodology for deriving ambient water quality criteria. Rather than use toxicity test results, the adaptation uses field data to determine the loss of 5% of genera from streams. The method is applied to derive effect benchmarks for disso...
Academic Achievement and Extracurricular School Activities of At-Risk High School Students
ERIC Educational Resources Information Center
Marchetti, Ryan; Wilson, Randal H.; Dunham, Mardis
2016-01-01
This study compared the employment, extracurricular participation, and family structure status of students from low socioeconomic families that achieved state-approved benchmarks on ACT reading and mathematics tests to those that did not achieve the benchmarks. Free and reduced lunch eligibility was used to determine SES. Participants included 211…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suter, G.W., II
1993-01-01
One of the initial stages in ecological risk assessment of hazardous waste sites is the screening of contaminants to determine which, if any, of them are worthy of further consideration; this process is termed contaminant screening. Screening is performed by comparing concentrations in ambient media to benchmark concentrations that are either indicative of a high likelihood of significant effects (upper screening benchmarks) or of a very low likelihood of significant effects (lower screening benchmarks). Exceedance of an upper screening benchmark indicates that the chemical in question is clearly of concern and remedial actions are likely to be needed. Exceedance of a lower screening benchmark indicates that a contaminant is of concern unless other information indicates that the data are unreliable or the comparison is inappropriate. Chemicals with concentrations below the lower benchmark are not of concern if the ambient data are judged to be adequate. This report presents potential screening benchmarks for protection of aquatic life from contaminants in water. Because there is no guidance for screening benchmarks, a set of alternative benchmarks is presented herein. The alternative benchmarks are based on different conceptual approaches to estimating concentrations causing significant effects. For the upper screening benchmark, there are the acute National Ambient Water Quality Criteria (NAWQC) and the Secondary Acute Values (SAV). The SAV concentrations are values estimated with 80% confidence not to exceed the unknown acute NAWQC for those chemicals with no NAWQC. The alternative chronic benchmarks are the chronic NAWQC, the Secondary Chronic Value (SCV), the lowest chronic values for fish and daphnids, the lowest EC20 for fish and daphnids from chronic toxicity tests, the estimated EC20 for a sensitive species, and the concentration estimated to cause a 20% reduction in the recruit abundance of largemouth bass.
It is recommended that ambient chemical concentrations be compared to all of these benchmarks. If NAWQC are exceeded, the chemicals must be contaminants of concern because the NAWQC are applicable or relevant and appropriate requirements (ARARs). If NAWQC are not exceeded, but other benchmarks are, contaminants should be selected on the basis of the number of benchmarks exceeded and the conservatism of the particular benchmark values, as discussed in the text. To the extent that toxicity data are available, this report presents the alternative benchmarks for chemicals that have been detected on the Oak Ridge Reservation. It also presents the data used to calculate the benchmarks and the sources of the data. It compares the benchmarks and discusses their relative conservatism and utility. This report supersedes a prior aquatic benchmarks report (Suter and Mabrey 1994). It adds two new types of benchmarks. It also updates the benchmark values where appropriate, adds some new benchmark values, replaces secondary sources with primary sources, and provides more complete documentation of the sources and derivation of all values.
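The screening logic described above can be sketched as a small decision function. This is an illustrative reading of the abstract, not the report's procedure: it treats every benchmark other than the acute NAWQC and SAV as a lower (chronic) screening benchmark, and all concentrations in the example are hypothetical.

```python
# Benchmark names follow the report; which count as "upper" or as NAWQC is taken
# from the abstract. All numeric values used with this function are hypothetical.
UPPER = {"acute NAWQC", "SAV"}
NAWQC = {"acute NAWQC", "chronic NAWQC"}


def screen(concentration, benchmarks):
    """Classify one chemical; benchmarks maps name -> concentration (same units)."""
    exceeded = sorted(name for name, value in benchmarks.items() if concentration > value)
    if NAWQC.intersection(exceeded):
        status = "contaminant of concern (NAWQC exceeded, ARAR)"
    elif UPPER.intersection(exceeded):
        status = "clearly of concern (upper benchmark exceeded)"
    elif exceeded:
        status = "of concern unless data are unreliable (lower benchmark exceeded)"
    else:
        status = "not of concern if ambient data are adequate"
    # len(exceeded) is the count the report uses to rank non-NAWQC exceedances.
    return status, exceeded


# Hypothetical benchmark set for one chemical, in ug/L.
B = {"acute NAWQC": 100.0, "chronic NAWQC": 10.0, "SAV": 80.0, "SCV": 5.0, "lowest EC20 fish": 8.0}
print(screen(6.0, B))
```

The function only sorts chemicals into the report's categories; judging the conservatism of individual benchmark values still requires the discussion in the text.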
NASA Astrophysics Data System (ADS)
Kokkoris, M.; Dede, S.; Kantre, K.; Lagoyannis, A.; Ntemou, E.; Paneta, V.; Preketes-Sigalas, K.; Provatas, G.; Vlastou, R.; Bogdanović-Radović, I.; Siketić, Z.; Obajdin, N.
2017-08-01
The evaluated proton differential cross sections suitable for the Elastic Backscattering Spectroscopy (EBS) analysis of natSi and 16O, as obtained from SigmaCalc 2.0, have been benchmarked over a wide energy and angular range at two different accelerator laboratories, namely at N.C.S.R. 'Demokritos', Athens, Greece and at Ruđer Bošković Institute (RBI), Zagreb, Croatia, using a variety of high-purity thick targets of known stoichiometry. The results are presented in graphical and tabular forms, while the observed discrepancies, as well as the limits in accuracy of the benchmarking procedure, along with target-related effects, are thoroughly discussed and analysed. In the case of oxygen the agreement between simulated and experimental spectra was generally good, while for silicon serious discrepancies were observed above Ep,lab = 2.5 MeV, suggesting that further tuning of the appropriate nuclear model parameters in the evaluated differential cross-section datasets is required.
The Paucity Problem: Where Have All the Space Reactor Experiments Gone?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.; Marshall, Margaret A.
2016-10-01
The Handbooks of the International Criticality Safety Benchmark Evaluation Project (ICSBEP) and the International Reactor Physics Experiment Evaluation Project (IRPhEP) together contain a plethora of documented and evaluated experiments essential in the validation of nuclear data, neutronics codes, and modeling of various nuclear systems. Unfortunately, only a minute selection of handbook data (twelve evaluations) are of actual experimental facilities and mockups designed specifically for space nuclear research. There is a paucity problem, such that the multitude of space nuclear experimental activities performed in the past several decades have yet to be recovered and made available in such detail that the international community could benefit from these valuable historical research efforts. Those experiments represent extensive investments in infrastructure, expertise, and cost, as well as constitute significantly valuable resources of data supporting past, present, and future research activities. The ICSBEP and IRPhEP were established to identify and verify comprehensive sets of benchmark data; evaluate the data, including quantification of biases and uncertainties; compile the data and calculations in a standardized format; and formally document the effort into a single source of verified benchmark data. See full abstract in attached document.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, C G; Mathews, S
2006-09-07
Current regulatory schemes use generic or industrial-sector-specific benchmarks to evaluate the quality of industrial stormwater discharges. While benchmarks can be a useful tool for facility stormwater managers in evaluating the quality of stormwater runoff, benchmarks typically do not take into account site-specific conditions, such as soil chemistry, atmospheric deposition, seasonal changes in water source, and upstream land use. Failing to account for these factors may lead to unnecessary costs to trace a source of natural variation, or potentially to missing a significant local water quality problem. Site-specific water quality thresholds, established upon the statistical evaluation of historic data, take these factors into account; they are a better tool for the direct evaluation of runoff quality and a more cost-effective trigger to investigate anomalous results. Lawrence Livermore National Laboratory (LLNL), a federal facility, established stormwater monitoring programs to comply with the requirements of the industrial stormwater permit and Department of Energy orders, which require the evaluation of the impact of effluent discharges on the environment. LLNL recognized the need to create a tool to evaluate and manage stormwater quality that would allow analysts to identify trends in stormwater quality and recognize anomalous results so that trace-back and corrective actions could be initiated. LLNL created the site-specific water quality threshold tool to better understand the nature of the stormwater influent and effluent, to establish a technical basis for determining when facility operations might be impacting the quality of stormwater discharges, and to provide "action levels" to initiate follow-up to analytical results. The threshold criteria were based on a statistical analysis of the historic stormwater monitoring data and a review of relevant water quality objectives.
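The abstract does not specify LLNL's statistical method. Purely as an assumption, the sketch below derives a site-specific action level as the historic mean plus k sample standard deviations, optionally capped at a water quality objective, and flags new results above it. The function names and example data are hypothetical.

```python
import statistics


def action_level(history, k=3.0, objective=None):
    """Hypothetical threshold: mean + k * sample SD of historic results,
    optionally capped at a regulatory water quality objective."""
    level = statistics.fmean(history) + k * statistics.stdev(history)
    if objective is not None:
        level = min(level, objective)
    return level


def flag_anomalies(history, new_results, k=3.0, objective=None):
    """Return the new results that exceed the site-specific action level."""
    level = action_level(history, k, objective)
    return [x for x in new_results if x > level]


# Hypothetical historic results for one analyte at one outfall (mg/L).
history = [10, 12, 11, 9, 13, 10, 11, 12]
print(flag_anomalies(history, [12, 15, 30]))
```

If the historic data are strongly skewed, a log transform or a percentile-based threshold may suit better; the choice of k trades missed local problems against needless trace-back investigations.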
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alan Black; Arnis Judzis
2003-01-01
Progress during current reporting year 2002 by quarter--Progress during Q1 2002: (1) In accordance with Task 7.0 (D. No.2 Technical Publications) TerraTek, NETL, and the Industry Contributors successfully presented a paper detailing Phase 1 testing results at the February 2002 IADC/SPE Drilling Conference, a prestigious venue for presenting DOE and private sector drilling technology advances. The full reference is as follows: IADC/SPE 74540 "World's First Benchmarking of Drilling Mud Hammer Performance at Depth Conditions" authored by Gordon A. Tibbitts, TerraTek; Roy C. Long, US Department of Energy; Brian E. Miller, BP America, Inc.; Arnis Judzis, TerraTek; and Alan D. Black, TerraTek. Gordon Tibbitts, TerraTek, presented the well-attended paper in February of 2002. The full text of the Mud Hammer paper was included in the last quarterly report. (2) The Phase 2 project planning meeting (Task 6) was held at ExxonMobil's Houston Greenspoint offices on February 22, 2002. In attendance were representatives from TerraTek, DOE, BP, ExxonMobil, PDVSA, Novatek, and SDS Digger Tools. (3) PDVSA has joined the advisory board to this DOE mud hammer project. PDVSA's commitment of cash and in-kind contributions was reported during the last quarter. (4) Strong Industry support remains for the DOE project. Both Andergauge and Smith Tools have expressed an interest in participating in the "optimization" phase of the program. The potential for increased testing with additional Industry cash support was discussed at the planning meeting in February 2002. Progress during Q2 2002: (1) Presentation material was provided to the DOE/NETL project manager (Dr. John Rogers) for the DOE exhibit at the 2002 Offshore Technology Conference. (2) Two meetings at Smith International and one at Andergauge in Houston were held to investigate their interest in joining the Mud Hammer Performance study.
(3) SDS Digger Tools (Task 3 Benchmarking participant) apparently has not negotiated a commercial deal with Halliburton on the supply of fluid hammers to the oil and gas business. (4) TerraTek is awaiting progress by Novatek (a DOE contractor) on the redesign and development of their next hammer tool. Their delay will require an extension to TerraTek's contracted program. (5) Smith International has sufficient interest in the program to start engineering and chroming of collars for testing at TerraTek. (6) Shell's Brian Tarr has agreed to join the Industry Advisory Group for the DOE project. The addition of Brian Tarr is welcomed as he has numerous years of experience with the Novatek tool and was involved in the early tests in Europe while with Mobil Oil. (7) Conoco's field trial of the Smith fluid hammer for an application in Vietnam was organized and has contributed to the increased interest in their tool. Progress during Q3 2002: (1) Smith International agreed to participate in the DOE Mud Hammer program. (2) Smith International chromed collars for upcoming benchmark tests at TerraTek, now scheduled for 4Q 2002. (3) ConocoPhillips had a field trial of the Smith fluid hammer offshore Vietnam. The hammer functioned properly, though the well encountered hole conditions and reaming problems. ConocoPhillips plans another field trial as a result. (4) DOE/NETL extended the contract for the fluid hammer program to allow Novatek to "optimize" their much-delayed tool to 2003 and to allow Smith International to add "benchmarking" tests in light of SDS Digger Tools' current financial inability to participate. (5) ConocoPhillips joined the Industry Advisors for the mud hammer program. Progress during Q4 2002: (1) Smith International participated in the DOE Mud Hammer program through full scale benchmarking testing during the week of 4 November 2003.
(2) TerraTek acknowledges Smith International, BP America, PDVSA, and ConocoPhillips for cost-sharing the Smith benchmarking tests, allowing extension of the contract to add to the benchmarking testing program. (3) Following the benchmark testing of the Smith International hammer, representatives from DOE/NETL, TerraTek, Smith International and PDVSA met at TerraTek in Salt Lake City to review observations, performance and views on the optimization step for 2003. (4) The December 2002 issue of Journal of Petroleum Technology (Society of Petroleum Engineers) highlighted the DOE fluid hammer testing program and reviewed last year's paper on the benchmark performance of the SDS Digger and Novatek hammers. (5) TerraTek's Sid Green presented a technical review for DOE/NETL personnel in Morgantown on "Impact Rock Breakage" and its importance to improving fluid hammer performance. Much discussion has taken place on the issues surrounding mud hammer performance at depth conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alan Black; Arnis Judzis
2004-10-01
The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark "best in class" diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. As of the report date, TerraTek has concluded all major preparations for the high pressure drilling campaign. Baker Hughes encountered difficulties in providing additional pumping capacity before TerraTek's scheduled relocation to another facility; thus the program was delayed further to accommodate the full testing program.
Peeters, Dominique; Sekeris, Elke; Verschaffel, Lieven; Luwel, Koen
2017-01-01
Some authors argue that age-related improvements in number line estimation (NLE) performance result from changes in strategy use. More specifically, children’s strategy use develops from only using the origin of the number line, to using the origin and the endpoint, to eventually also relying on the midpoint of the number line. Recently, Peeters et al. (unpublished) investigated whether the provision of additional unlabeled benchmarks at 25, 50, and 75% of the number line positively affects third and fifth graders’ NLE performance and benchmark-based strategy use. It was found that only the older children benefited from the presence of these benchmarks at the quartiles of the number line (i.e., 25 and 75%), as they made more use of these benchmarks, leading to more accurate estimates. A possible explanation for this lack of improvement in third graders might be their inability to correctly link the presented benchmarks with their corresponding numerical values. In the present study, we investigated whether labeling these benchmarks with their corresponding numerical values would have a positive effect on younger children’s NLE performance and quartile-based strategy use as well. Third and sixth graders were assigned to one of three conditions: (a) a control condition with an empty number line bounded by 0 at the origin and 1,000 at the endpoint, (b) an unlabeled condition with three additional external benchmarks without numerical labels at 25, 50, and 75% of the number line, and (c) a labeled condition in which these benchmarks were labeled with 250, 500, and 750, respectively. Results indicated that labeling the benchmarks has a positive effect on third graders’ NLE performance and quartile-based strategy use, whereas sixth graders already benefited from the mere provision of unlabeled benchmarks.
These findings imply that children’s benchmark-based strategy use can be stimulated by adding additional externally provided benchmarks on the number line, but that, depending on children’s age and familiarity with the number range, these additional external benchmarks might need to be labeled. PMID:28713302
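Estimation accuracy in NLE studies is commonly scored as percent absolute error (PAE): the distance between estimate and target expressed as a percentage of the line's range. The abstract does not name its accuracy measure, so the sketch below is an assumption; it uses the 0 to 1,000 line from the study design, and the trials in the example are hypothetical.

```python
def percent_absolute_error(estimate, target, lo=0.0, hi=1000.0):
    """PAE: |estimate - target| as a percentage of the number line's range."""
    return abs(estimate - target) / (hi - lo) * 100.0


def mean_pae(trials, lo=0.0, hi=1000.0):
    """Average PAE over (estimate, target) pairs; lower means more accurate."""
    return sum(percent_absolute_error(e, t, lo, hi) for e, t in trials) / len(trials)


# Hypothetical trials from one child: (estimate, target).
print(mean_pae([(300, 250), (700, 750), (500, 500)]))
```

With labeled quartile benchmarks, targets such as 250 or 750 sit directly on a marked point, which is one way such labels could lower PAE.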
TRECVID: the utility of a content-based video retrieval evaluation
NASA Astrophysics Data System (ADS)
Hauptmann, Alexander G.
2006-01-01
TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as from automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks efficient interfaces require few key clicks, but display large numbers of images for visual inspection by the user. The text search finds the right context region in the video in general, but to select specific relevant images we need good interfaces to easily browse the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused us to work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
A method to improve the nutritional quality of foods and beverages based on dietary recommendations.
Nijman, C A J; Zijp, I M; Sierksma, A; Roodenburg, A J C; Leenen, R; van den Kerkhoff, C; Weststrate, J A; Meijer, G W
2007-04-01
The increasing consumer interest in health prompted Unilever to develop a globally applicable method (Nutrition Score) to evaluate and improve the nutritional composition of its foods and beverages portfolio. Based on (inter)national dietary recommendations, generic benchmarks were developed to evaluate foods and beverages on their content of trans fatty acids, saturated fatty acids, sodium and sugars. High intakes of these key nutrients are associated with undesirable health effects. In principle, the developed generic benchmarks can be applied globally for any food and beverage product. Product category-specific benchmarks were developed when it was not feasible to meet generic benchmarks because of technological and/or taste factors. The whole Unilever global foods and beverages portfolio has been evaluated and actions have been taken to improve the nutritional quality. The advantages of this method over other initiatives to assess the nutritional quality of foods are that it is based on the latest nutritional scientific insights and its global applicability. The Nutrition Score is the first simple, transparent and straightforward method that can be applied globally and across all food and beverage categories to evaluate the nutritional composition. It can help food manufacturers to improve the nutritional value of their products. In addition, the Nutrition Score can be a starting point for a powerful health indicator front-of-pack. This can have a significant positive impact on public health, especially when implemented by all food manufacturers.
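The two-tier benchmark idea (generic limits, replaced by category-specific limits where the generic ones are not feasible) can be sketched as a lookup with overrides. Every number and category name below is hypothetical; the abstract does not publish the Nutrition Score thresholds or their units.

```python
# Hypothetical generic limits (illustrative units); NOT the published Nutrition Score values.
GENERIC_LIMITS = {"trans_fat": 0.5, "sat_fat": 5.0, "sodium": 400.0, "sugars": 10.0}

# Hypothetical category-specific limits used where a generic limit is not feasible.
CATEGORY_LIMITS = {"soup": {"sodium": 500.0}}


def limits_for(category):
    """Generic limits, overridden by any category-specific ones."""
    limits = dict(GENERIC_LIMITS)
    limits.update(CATEGORY_LIMITS.get(category, {}))
    return limits


def nutrients_over_benchmark(content, category):
    """Names of nutrients whose content exceeds the applicable benchmark."""
    limits = limits_for(category)
    return sorted(n for n, limit in limits.items() if content.get(n, 0.0) > limit)


product = {"trans_fat": 0.1, "sat_fat": 2.0, "sodium": 450.0, "sugars": 3.0}
print(nutrients_over_benchmark(product, "soup"), nutrients_over_benchmark(product, "snack"))
```

The same product can pass in one category and fail in another, which is exactly the effect of allowing category-specific benchmarks.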
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alan Black; Arnis Judzis
2003-10-01
This document details the progress to date on the OPTIMIZATION OF DEEP DRILLING PERFORMANCE--DEVELOPMENT AND BENCHMARK TESTING OF ADVANCED DIAMOND PRODUCT DRILL BITS AND HP/HT FLUIDS TO SIGNIFICANTLY IMPROVE RATES OF PENETRATION contract for the year starting October 2002 through September 2002. The industry cost-shared program aims to benchmark drilling rates of penetration in selected simulated deep formations and to significantly improve ROP through a team development of aggressive diamond product drill bit--fluid system technologies. Overall the objectives are as follows: Phase 1--Benchmark "best in class" diamond and other product drilling bits and fluids and develop concepts for a next level of deep drilling performance; Phase 2--Develop advanced smart bit--fluid prototypes and test at large scale; and Phase 3--Field trial smart bit--fluid concepts, modify as necessary and commercialize products. Accomplishments to date include the following: 4Q 2002--Project started; Industry Team was assembled; Kick-off meeting was held at DOE Morgantown; 1Q 2003--Engineering meeting was held at Hughes Christensen, The Woodlands, Texas, to prepare preliminary plans for development and testing and review equipment needs; Operators started sending information regarding their needs for deep drilling challenges and priorities for large-scale testing experimental matrix; Aramco joined the Industry Team as DEA 148 objectives paralleled the DOE project; 2Q 2003--Engineering and planning for high pressure drilling at TerraTek commenced; 3Q 2003--Continuation of engineering and design work for high pressure drilling at TerraTek; Baker Hughes INTEQ drilling Fluids and Hughes Christensen commence planning for Phase 1 testing--recommendations for bits and fluids.
Edwards, Roger A.; Dee, Deborah; Umer, Amna; Perrine, Cria G.; Shealy, Katherine R.; Grummer-Strawn, Laurence M.
2015-01-01
Background A substantial proportion of US maternity care facilities engage in practices that are not evidence-based and that interfere with breastfeeding. The CDC Survey of Maternity Practices in Infant Nutrition and Care (mPINC) showed significant variation in maternity practices among US states. Objective The purpose of this article is to use benchmarking techniques to identify states within relevant peer groups that were top performers on mPINC survey indicators related to breastfeeding support. Methods We used 11 indicators of breastfeeding-related maternity care from the 2011 mPINC survey and benchmarking techniques to organize and compare hospital-based maternity practices across the 50 states and Washington, DC. We created peer categories for benchmarking first by region (grouping states by West, Midwest, South, and Northeast) and then by size (grouping states by the number of maternity facilities and dividing each region into approximately equal halves based on the number of facilities). Results Thirty-four states had scores high enough to serve as benchmarks, and 32 states had scores low enough to reflect the lowest score gap from the benchmark on at least 1 indicator. No state served as the benchmark on more than 5 indicators and no state was furthest from the benchmark on more than 7 indicators. The small peer group benchmarks in the South, West, and Midwest were better than the large peer group benchmarks on 91%, 82%, and 36% of the indicators, respectively. In the West large, the Midwest large, the Midwest small, and the South large peer groups, 4–6 benchmarks showed that less than 50% of hospitals have ideal practice in all states. Conclusion The evaluation presents benchmarks for peer group state comparisons that provide potential and feasible targets for improvement. PMID:24394963
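A minimal sketch of peer-group benchmarking as described: within each peer group, the top-scoring state on an indicator serves as the benchmark, and the gap to the lowest-scoring state marks the largest shortfall. It assumes higher scores are better and that every state reports every indicator; the state codes, indicator names, and scores are hypothetical.

```python
from collections import defaultdict


def peer_benchmarks(scores, peer_group):
    """scores: {state: {indicator: score}}, higher is better (assumption).
    peer_group: {state: peer-group label}, e.g. region plus size class.
    Returns {(group, indicator): benchmark state/score and largest gap}."""
    members = defaultdict(list)
    for state, group in peer_group.items():
        members[group].append(state)
    result = {}
    for group, states in members.items():
        for indicator in scores[states[0]]:
            best = max(states, key=lambda s: scores[s][indicator])
            worst = min(states, key=lambda s: scores[s][indicator])
            result[(group, indicator)] = {
                "benchmark_state": best,
                "benchmark": scores[best][indicator],
                "largest_gap_state": worst,
                "largest_gap": scores[best][indicator] - scores[worst][indicator],
            }
    return result


scores = {"A": {"ind1": 80, "ind2": 60}, "B": {"ind1": 70, "ind2": 75},
          "C": {"ind1": 90, "ind2": 50}, "D": {"ind1": 50, "ind2": 40}}
groups = {"A": "West small", "B": "West small", "C": "West small", "D": "West large"}
print(peer_benchmarks(scores, groups)[("West small", "ind1")])
```

Grouping by region and size first means each benchmark comes from comparable states, which is what makes it a feasible target rather than a national maximum.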
Benchmark Calibration Tests Completed for Stirling Convertor Heater Head Life Assessment
NASA Technical Reports Server (NTRS)
Krause, David L.; Halford, Gary R.; Bowman, Randy R.
2005-01-01
A major phase of benchmark testing has been completed at the NASA Glenn Research Center (http://www.nasa.gov/glenn/), where a critical component of the Stirling Radioisotope Generator (SRG) is undergoing extensive experimentation to aid the development of an analytical life-prediction methodology. Two special-purpose test rigs subjected SRG heater-head pressure-vessel test articles to accelerated creep conditions, using the standard design temperatures to stay within the wall material's operating creep-response regime, but increasing wall stresses up to 7 times over the design point. This resulted in well-controlled "ballooning" of the heater-head hot end. The test plan was developed to provide critical input to analytical parameters in a reasonable period of time.
Enhanced Verification Test Suite for Physics Simulation Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamm, J R; Brock, J S; Brandon, S T
2008-10-10
This document discusses problems with which to augment, in quantity and in quality, the existing tri-laboratory suite of verification problems used by Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL), and Sandia National Laboratories (SNL). The purpose of verification analysis is to demonstrate whether the numerical results of the discretization algorithms in physics and engineering simulation codes provide correct solutions of the corresponding continuum equations. The key points of this document are: (1) Verification deals with mathematical correctness of the numerical algorithms in a code, while validation deals with physical correctness of a simulation in a regime of interest. This document is about verification. (2) The current seven-problem Tri-Laboratory Verification Test Suite, which has been used for approximately five years at the DOE WP laboratories, is limited. (3) Both the methodology for and technology used in verification analysis have evolved and been improved since the original test suite was proposed. (4) The proposed test problems are in three basic areas: (a) Hydrodynamics; (b) Transport processes; and (c) Dynamic strength-of-materials. (5) For several of the proposed problems we provide a 'strong sense verification benchmark', consisting of (i) a clear mathematical statement of the problem with sufficient information to run a computer simulation, (ii) an explanation of how the code result and benchmark solution are to be evaluated, and (iii) a description of the acceptance criterion for simulation code results. (6) It is proposed that the set of verification test problems with which any particular code be evaluated include some of the problems described in this document. Analysis of the proposed verification test problems constitutes part of a necessary--but not sufficient--step that builds confidence in physics and engineering simulation codes.
More complicated test cases, including physics models of greater sophistication or other physics regimes (e.g., energetic material response, magneto-hydrodynamics), would represent a scientifically desirable complement to the fundamental test cases discussed in this report. The authors believe that this document can be used to enhance the verification analyses undertaken at the DOE WP Laboratories and, thus, to improve the quality, credibility, and usefulness of the simulation codes that are analyzed with these problems.
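One common way to turn a 'strong sense' benchmark into an acceptance test is to compare the observed order of accuracy, p = ln(e_coarse/e_fine)/ln(r), with the method's formal order. The report's actual criteria are problem-specific and not given in the abstract. As an assumption, the sketch uses a second-order central difference of sin(x), checked against the exact derivative cos(x), as a stand-in for a simulation code; the tolerance is hypothetical.

```python
import math


def central_difference(f, x, h):
    """Second-order accurate approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def observed_order(err_coarse, err_fine, refinement_ratio=2.0):
    """p = ln(e_coarse / e_fine) / ln(r) for grid refinement ratio r."""
    return math.log(err_coarse / err_fine) / math.log(refinement_ratio)


def verify(f, exact_derivative, x, h, formal_order, tol=0.1):
    """Accept if the observed order is within tol of the formal order (hypothetical criterion)."""
    exact = exact_derivative(x)
    e_coarse = abs(central_difference(f, x, h) - exact)
    e_fine = abs(central_difference(f, x, h / 2.0) - exact)
    p = observed_order(e_coarse, e_fine)
    return abs(p - formal_order) <= tol, p


print(verify(math.sin, math.cos, 1.0, 0.1, formal_order=2))
```

The exact solution plays the role of the benchmark solution, the error norm is how code result and benchmark are compared, and the order tolerance is the acceptance criterion, mirroring items (i) to (iii) above.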
DOE Office of Scientific and Technical Information (OSTI.GOV)
Risner, J. M.; Wiarda, D.; Dunn, M. E.
2011-09-30
New coupled neutron-gamma cross-section libraries have been developed for use in light water reactor (LWR) shielding applications, including pressure vessel dosimetry calculations. The libraries, which were generated using Evaluated Nuclear Data File/B Version VII Release 0 (ENDF/B-VII.0), use the same fine-group and broad-group energy structures as the VITAMIN-B6 and BUGLE-96 libraries. The processing methodology used to generate both libraries is based on the methods used to develop VITAMIN-B6 and BUGLE-96 and is consistent with ANSI/ANS 6.1.2. The ENDF data were first processed into the fine-group pseudo-problem-independent VITAMIN-B7 library and then collapsed into the broad-group BUGLE-B7 library. The VITAMIN-B7 library contains data for 391 nuclides. This represents a significant increase compared to the VITAMIN-B6 library, which contained data for 120 nuclides. The BUGLE-B7 library contains data for the same nuclides as BUGLE-96, and maintains the same numeric IDs for those nuclides. The broad-group data include nuclides which are infinitely dilute and group collapsed using a concrete weighting spectrum, as well as nuclides which are self-shielded and group collapsed using weighting spectra representative of important regions of LWRs. The verification and validation of the new libraries includes a set of critical benchmark experiments, a set of regression tests that are used to evaluate multigroup cross-section libraries in the SCALE code system, and three pressure vessel dosimetry benchmarks. Results of these tests confirm that the new libraries are appropriate for use in LWR shielding analyses and meet the requirements of Regulatory Guide 1.190.
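Collapsing fine-group data to broad groups weights each fine-group cross section by a spectrum: σ_G = Σ_g σ_g φ_g / Σ_g φ_g over the fine groups g in broad group G. The sketch below shows only this weighting step, under that standard flux-weighting assumption; production library processing also handles self-shielding, scattering matrices, and much more. The numbers are hypothetical.

```python
def collapse(sigma_fine, flux_fine, broad_groups):
    """Flux-weighted group collapse.

    sigma_fine, flux_fine: per-fine-group cross sections and weighting flux.
    broad_groups: list of (start, stop) fine-group index ranges, one per
    broad group (half-open, like Python slices).
    """
    broad = []
    for start, stop in broad_groups:
        phi = flux_fine[start:stop]
        weighted = sum(s * f for s, f in zip(sigma_fine[start:stop], phi))
        broad.append(weighted / sum(phi))
    return broad


sigma = [10.0, 20.0, 5.0, 5.0]  # hypothetical fine-group cross sections (barns)
print(collapse(sigma, [1.0, 3.0, 2.0, 2.0], [(0, 2), (2, 4)]))
print(collapse(sigma, [3.0, 1.0, 2.0, 2.0], [(0, 2), (2, 4)]))
```

The two calls differ only in the weighting spectrum, which is why the same fine-group data yield different broad-group values for, e.g., a concrete spectrum versus an LWR-region spectrum.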
Sethi, Saurabh; Huang, Robert J; Barakat, Monique T; Banaei, Niaz; Friedland, Shai; Banerjee, Subhas
2017-06-01
Recent outbreaks of duodenoscope-transmitted infections underscore the importance of adequate endoscope reprocessing. Adenosine triphosphate (ATP) bioluminescence testing allows rapid evaluation of endoscopes for bacteriologic/biologic residue. In this prospective study we evaluate the utility of ATP in bacteriologic surveillance and the effects of endoscopy staff education and dual cycles of cleaning and high-level disinfection (HLD) on endoscope reprocessing. ATP bioluminescence was measured after precleaning, manual cleaning, and HLD on rinsates from suction-biopsy channels of all endoscopes and elevator channels of duodenoscopes/linear echoendoscopes after use. ATP bioluminescence was remeasured in duodenoscopes (1) after re-education and competency testing of endoscopy staff and subsequently (2) after 2 cycles of precleaning and manual cleaning and a single cycle of HLD or (3) after 2 cycles of precleaning, manual cleaning, and HLD. The ideal ATP bioluminescence benchmark of <200 relative light units (RLUs) after manual cleaning was achieved from suction-biopsy channel rinsates of all endoscopes, but 9 of 10 duodenoscope elevator channel rinsates failed to meet this benchmark. Re-education reduced RLUs in duodenoscope elevator channel rinsates after precleaning (23,218.0 vs 1340.5 RLUs, P < .01) and HLD (177.0 vs 12.0 RLUs, P < .01). After 2 cycles of manual cleaning/HLD, duodenoscope elevator channel RLUs achieved levels similar to sterile water, with corresponding negative cultures. ATP testing offers a rapid, inexpensive alternative for detection of endoscope microbial residue. Re-education of endoscopy staff and 2 cycles of cleaning and HLD decreased elevator channel RLUs to levels similar to sterile water and may therefore minimize the risk of transmission of infections by duodenoscopes. Copyright © 2017 American Society for Gastrointestinal Endoscopy. Published by Elsevier Inc. All rights reserved.
Sethi, Saurabh; Huang, Robert J.; Barakat, Monique T.; Banaei, Niaz; Friedland, Shai; Banerjee, Subhas
2017-01-01
Background/Aims Recent outbreaks of duodenoscope-transmitted infections underscore the importance of adequate endoscope reprocessing. Adenosine triphosphate (ATP) bioluminescence testing allows rapid evaluation of endoscopes for bacteriological/biological residue. In this prospective study we evaluate the utility of ATP in bacteriological surveillance, and the effects of endoscopy staff education and dual cycles of cleaning and high-level disinfection (HLD) on endoscope reprocessing. Methods ATP bioluminescence was measured after pre-cleaning, manual cleaning and HLD on rinsates from suction-biopsy channels of all endoscopes and elevator channels of duodenoscopes/linear echoendoscopes after use. ATP bioluminescence was re-measured in duodenoscopes (1) after re-education and competency testing of endoscopy staff, and subsequently (2) after 2 cycles of pre-cleaning and manual cleaning and a single cycle of HLD, or (3) after 2 cycles of pre-cleaning, manual cleaning and HLD. Results The ideal ATP bioluminescence benchmark of <200 relative light units (RLUs) after manual cleaning was achieved from suction-biopsy channel rinsates of all endoscopes, but 9 of 10 duodenoscope elevator channel rinsates failed to meet this benchmark. Re-education reduced RLUs in duodenoscope elevator channel rinsates after pre-cleaning (23218.0 vs 1340.5 RLUs, p<0.01) and HLD (177.0 vs 12.0 RLUs, p<0.01). After 2 cycles of manual cleaning/HLD, duodenoscope elevator channel RLUs achieved levels similar to sterile water, with corresponding negative cultures. Conclusions ATP testing offers a rapid, inexpensive alternative for detection of endoscope microbial residue. Re-education of endoscopy staff and 2 cycles of cleaning and HLD decrease elevator channel RLUs to levels similar to sterile water and may therefore minimize the risk of transmission of infections by duodenoscopes. PMID:27818222
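The "<200 RLU after manual cleaning" benchmark in both versions of this abstract is a simple pass/fail threshold. The sketch below shows how rinsate readings could be checked against it and how the before/after reductions quoted in the abstract translate into fold changes. The function names are hypothetical and this is not the study's analysis code; the statistics in the abstract (P values) are not reproduced here.

```python
ATP_BENCHMARK_RLU = 200.0  # "ideal" post-manual-cleaning benchmark quoted in the abstract


def meets_benchmark(rlu: float, threshold: float = ATP_BENCHMARK_RLU) -> bool:
    """True if a rinsate reading is strictly below the benchmark (< 200 RLUs)."""
    return rlu < threshold


def failure_count(readings: list[float], threshold: float = ATP_BENCHMARK_RLU) -> int:
    """Number of rinsate readings that fail the benchmark."""
    return sum(1 for rlu in readings if not meets_benchmark(rlu, threshold))


def fold_reduction(before: float, after: float) -> float:
    """How many times lower the 'after' reading is than the 'before' reading."""
    if after <= 0:
        raise ValueError("'after' reading must be positive")
    return before / after


# Figures quoted in the abstract: elevator-channel RLUs before vs. after re-education.
print(round(fold_reduction(23218.0, 1340.5), 1))  # after pre-cleaning -> 17.3
print(round(fold_reduction(177.0, 12.0), 2))      # after HLD -> 14.75
```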
Performance Evaluation of Pressure Transducers for Water Impacts
NASA Technical Reports Server (NTRS)
Vassilakos, Gregory J.; Stegall, David E.; Treadway, Sean
2012-01-01
The Orion Multi-Purpose Crew Vehicle is being designed for water landings. In order to benchmark the ability of engineering tools to predict water landing loads, test programs are underway for scale-model and full-scale water impacts. These test programs are predicated on the reliable measurement of impact pressure histories. Tests have been performed with a variety of pressure transducers from various manufacturers. Both piezoelectric and piezoresistive devices have been tested. Effects such as thermal shock, pinching of the transducer head, and flushness of the transducer mounting have been studied. Data acquisition issues such as sampling rate and anti-aliasing filtering have also been studied. The responses of the pressure transducers have been compared side-by-side on an impulse test rig and on a 20-inch diameter hemisphere dropped into a pool of water. The results have identified a range of viable configurations for pressure measurement, depending on the objectives of the test program.
Benchmarks of fairness for health care reform: a policy tool for developing countries.
Daniels, N.; Bryant, J.; Castano, R. A.; Dantes, O. G.; Khan, K. S.; Pannarunothai, S.
2000-01-01
Teams of collaborators from Colombia, Mexico, Pakistan, and Thailand have adapted a policy tool originally developed for evaluating health insurance reforms in the United States into "benchmarks of fairness" for assessing health system reform in developing countries. We describe briefly the history of the benchmark approach, the tool itself, and the uses to which it may be put. Fairness is a broad term that includes exposure to risk factors, access to all forms of care, and financing. It also includes efficiency of management and resource allocation, accountability, and patient and provider autonomy. The benchmarks standardize the criteria for fairness. Reforms are then evaluated by scoring them according to the degree to which they improve the situation, i.e., on a scale of -5 to 5, with zero representing the status quo. The object is to promote discussion about fairness across the disciplinary divisions that keep policy analysts and the public from understanding how trade-offs between different effects of reforms can affect the overall fairness of the reform. The benchmarks can be used at both national and provincial or district levels, and we describe plans for such uses in the collaborating sites. A striking feature of the adaptation process is that there was wide agreement on this ethical framework among the collaborating sites despite their large historical, political and cultural differences. PMID:10916911
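The scoring rule above (each criterion scored from -5 to 5, zero meaning the status quo) can be made concrete with a small sketch. The abstract does not say how criterion scores are combined, so the unweighted total used here is an assumption for illustration only; the sketch also lists the criteria a reform makes worse, since exposing such trade-offs is the stated aim of the tool. The criterion names in the example are hypothetical labels drawn from the dimensions the abstract mentions.

```python
def summarize_reform(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Summarize benchmark scores for one reform.

    Each score is on the -5..5 scale described in the abstract, where 0 is the
    status quo, positive values are improvements and negative values are setbacks.
    Returns the unweighted total (a hypothetical aggregation) and the criteria
    that the reform makes worse, so trade-offs stay visible.
    """
    for criterion, score in scores.items():
        if not -5 <= score <= 5:
            raise ValueError(f"score for {criterion!r} is outside -5..5: {score}")
    total = sum(scores.values())
    worsened = sorted(c for c, s in scores.items() if s < 0)
    return total, worsened


# Hypothetical scores for an illustrative reform.
print(summarize_reform({"access": 3, "financing": -2, "accountability": 0}))
# -> (1, ['financing'])
```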
Schaffter, Thomas; Marbach, Daniel; Floreano, Dario
2011-08-15
Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks. Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to the generation of detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5). GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data. Supplementary data are available at Bioinformatics online. dario.floreano@epfl.ch.
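The abstract evaluates inferred networks with precision-recall curves. A minimal sketch of that computation, assuming the predictions come as a ranked list of directed (regulator, target) edges and the benchmark network as a set of true edges, is shown below. It is a generic illustration, not GNW's Java implementation, and the gene names are hypothetical. A ROC curve would additionally need the false-positive rate over all possible non-edges.

```python
Edge = tuple[str, str]  # (regulator, target)


def precision_recall_points(ranked: list[Edge], gold: set[Edge]) -> list[tuple[float, float]]:
    """Precision and recall after each of the top-k predictions.

    ranked -- predicted regulatory edges, most confident first
    gold   -- edges of the benchmark (gold-standard) network
    """
    if not gold:
        raise ValueError("gold-standard network has no edges")
    points = []
    true_pos = 0
    for k, edge in enumerate(ranked, start=1):
        if edge in gold:
            true_pos += 1
        points.append((true_pos / k, true_pos / len(gold)))  # (precision, recall)
    return points


gold = {("G1", "G2"), ("G1", "G3")}
ranked = [("G1", "G2"), ("G2", "G3"), ("G1", "G3")]
for precision, recall in precision_recall_points(ranked, gold):
    print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=1.00 recall=0.50, then 0.50/0.50, then 0.67/1.00
```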
Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mishra, Alok; Li, Lingda; Kong, Martin
Here, the latest OpenMP standard offers automatic device offloading capabilities which facilitate GPU programming. Despite this, there remain many challenges. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for unified memory space. In such systems, CPU and GPU can access each other's memory transparently, that is, the data movement is managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, there is a significant lack of knowledge about how this mechanism will perform, and how programmers should use it. We have modified several benchmark codes in the Rodinia benchmark suite to study the behavior of OpenMP accelerator extensions and have used them to explore the impact of unified memory in an OpenMP context. We also modified the open source LLVM compiler to allow OpenMP programs to exploit unified memory. The results of our evaluation reveal that, while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers from significant overhead when GPU memory is oversubscribed for benchmarks with large amounts of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.
Resonance Parameter Adjustment Based on Integral Experiments
Sobes, Vladimir; Leal, Luiz; Arbanas, Goran; ...
2016-06-02
Our project seeks to allow coupling of differential and integral data evaluation in a continuous-energy framework and to use the generalized linear least-squares (GLLS) methodology in the TSURFER module of the SCALE code package to update the parameters of a resolved resonance region evaluation. Because the GLLS methodology in TSURFER is identical to the mathematical description of a Bayesian update in SAMMY, the SAMINT code was created to use the mathematical machinery of SAMMY to update resolved resonance parameters based on integral data. Traditionally, SAMMY used differential experimental data to adjust nuclear data parameters. Integral experimental data, such as those in the International Criticality Safety Benchmark Experiments Project, remain a tool for validation of completed nuclear data evaluations. SAMINT extracts information from integral benchmarks to aid the nuclear data evaluation process. Later, integral data can be used to resolve any remaining ambiguity between differential data sets, highlight troublesome energy regions, determine key nuclear data parameters for integral benchmark calculations, and improve the nuclear data covariance matrix evaluation. Moreover, SAMINT is not intended to bias nuclear data toward specific integral experiments but should be used to supplement the evaluation of differential experimental data. Using GLLS ensures proper weight is given to the differential data.
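The abstract relies on the equivalence between the GLLS update in TSURFER and the Bayesian update in SAMMY. The sketch below shows the standard linearized GLLS (Bayes) update for the simplest case: a single integral measurement, such as a benchmark k_eff, with sensitivity s_i = dk/dp_i to each resonance parameter. With prior parameters p, prior covariance C, calculated value k_calc, measured value k_exp and measurement variance v, it computes p' = p + C s (k_exp - k_calc) / (s^T C s + v) and C' = C - (C s)(C s)^T / (s^T C s + v). The function name, the single-datum restriction, the assumption that the measurement is uncorrelated with the parameters, and the example numbers are all illustrative assumptions; this is not SAMINT's implementation.

```python
def glls_update(p, C, s, k_calc, k_exp, var_exp):
    """Linearized GLLS / Bayesian update of parameters from one integral datum.

    p       -- prior parameter values (list of n floats)
    C       -- prior parameter covariance (n x n, symmetric, list of lists)
    s       -- sensitivities dk/dp_i of the integral quantity to each parameter
    k_calc  -- integral quantity calculated with the prior parameters
    k_exp   -- measured integral quantity
    var_exp -- variance of the measurement (assumed uncorrelated with p)
    """
    n = len(p)
    Cs = [sum(C[i][j] * s[j] for j in range(n)) for i in range(n)]  # C s
    denom = sum(s[i] * Cs[i] for i in range(n)) + var_exp          # s^T C s + v
    residual = k_exp - k_calc
    p_post = [p[i] + Cs[i] * residual / denom for i in range(n)]
    C_post = [[C[i][j] - Cs[i] * Cs[j] / denom for j in range(n)] for i in range(n)]
    return p_post, C_post


# Hypothetical two-parameter example: the measurement is 1% above the calculation.
p_post, C_post = glls_update(
    p=[1.0, 2.0],
    C=[[0.04, 0.0], [0.0, 0.01]],
    s=[0.5, 0.5],
    k_calc=1.00,
    k_exp=1.01,
    var_exp=1.0e-4,
)
print(p_post)        # both parameters move up, the more uncertain one more
print(C_post[0][0])  # posterior variance is smaller than the prior 0.04
```

Because each shift is scaled by the prior covariance, parameters already tightly constrained by differential data (small entries of C) move little; that is one way to read the abstract's remark that GLLS gives proper weight to the differential data.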
Summary of ORSphere critical and reactor physics measurements
NASA Astrophysics Data System (ADS)
Marshall, Margaret A.; Bess, John D.
2017-09-01
In the early 1970s Dr. John T. Mihalczo (team leader), J.J. Lynn, and J.R. Taylor performed experiments at the Oak Ridge Critical Experiments Facility (ORCEF) with highly enriched uranium (HEU) metal (called Oak Ridge Alloy or ORALLOY) to recreate GODIVA I results with greater accuracy than the experiments performed at Los Alamos National Laboratory in the 1950s. The purpose of the Oak Ridge ORALLOY Sphere (ORSphere) experiments was to estimate the unreflected and unmoderated critical mass of an idealized sphere of uranium metal corrected to a density, purity, and enrichment such that it could be compared with the GODIVA I experiments. This critical configuration has been evaluated. Preliminary results were presented at ND2013. Since then, the evaluation was finalized and judged to be an acceptable benchmark experiment for the International Criticality Safety Benchmark Experiment Project (ICSBEP). Additionally, reactor physics measurements were performed to determine surface button worths, central void worth, delayed neutron fraction, prompt neutron decay constant, fission density and neutron importance. These measurements have been evaluated and found to be acceptable experiments and are discussed in full detail in the International Handbook of Evaluated Reactor Physics Benchmark Experiments. The purpose of this paper is to summarize all the evaluated critical and reactor physics measurements.
ERIC Educational Resources Information Center
Zavadsky, Heather
2014-01-01
The role of state education agencies (SEAs) has shifted significantly from low-profile, compliance activities like managing federal grants to engaging in more complex and politically charged tasks like setting curriculum standards, developing accountability systems, and creating new teacher evaluation systems. The move from compliance-monitoring…
75 FR 24534 - Treatment of Cigarettes and Smokeless Tobacco as Nonmailable Matter
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-05
... photocopy all written comments at USPS Headquarters Library, 475 L'Enfant Plaza, SW., 11th Floor North... benchmarking purposes of cigarette brands or sub-brands among existing adult smokers.'' 18 U.S.C. 1716E(b)(5)(D... of evaluating the product for quality assurance and benchmarking purposes of cigarette brands or sub...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-18
... the public additional time to evaluate the data used to derive a benchmark for conductivity. The... FR 18499). By following the link below, reviewers may download the initial data and EPA's derivative data sets that were used to calculate the conductivity benchmark. These reports were developed by the...
SAT® Subject Area Readiness Indicators: Reading, Writing, and STEM
ERIC Educational Resources Information Center
Wyatt, Jeffrey N.; Remigio, Mylene; Camara, Wayne J.
2012-01-01
In 2011, the College Board developed the SAT College and Career Readiness Benchmark to assist educators and policymakers in their efforts to better evaluate the college readiness of their students. This benchmark was designed to identify the point on the SAT score scale that is indicative of students' having a high likelihood of success in…
NASA Technical Reports Server (NTRS)
Alexander, Dennis R.
1990-01-01
Research was conducted on characteristics of aerosol sprays using a P/DPA and a laser imaging/video processing system on a NASA MOD-1 air assist nozzle being evaluated for use in aircraft icing research. Benchmark tests were performed on monodispersed particles and on the NASA MOD-1 nozzle under identical lab operating conditions. In calibration tests on monodispersed aerosol sprays, the laser imaging/video processing system and the P/DPA showed agreement of ±2.6 micron, with a standard deviation of ±2.6 micron. Benchmark tests were performed on the NASA MOD-1 nozzle on the centerline and radially at 0.5 inch increments to the outer edge of the spray plume at a distance of 2 ft downstream from the nozzle exit. Comparative results at two operating conditions of the nozzle are presented for the two instruments. For the 1st case studied, the deviation in arithmetic mean diameters determined by the two instruments was in a range of 0.1 to 2.8 micron, and the deviation in Sauter mean diameters varied from 0 to 2.2 micron. Severe operating conditions in the 2nd case resulted in the arithmetic mean diameter deviating from 1.4 to 7.1 micron and the deviation in the Sauter mean diameters ranging from 0.4 to 6.7 micron.
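The comparison above is stated in terms of arithmetic mean diameter and Sauter mean diameter. Using their standard definitions, D10 = Σd / N and D32 = Σd³ / Σd², the sketch below computes both from a list of droplet diameters and the deviation between two instruments. The droplet samples are hypothetical; real instruments typically report binned size distributions, in which case each bin's count would serve as a weight.

```python
def arithmetic_mean_diameter(diameters):
    """D10: plain average of droplet diameters."""
    return sum(diameters) / len(diameters)


def sauter_mean_diameter(diameters):
    """D32: total droplet volume over total surface area (up to a constant factor)."""
    return sum(d ** 3 for d in diameters) / sum(d ** 2 for d in diameters)


def deviation(instrument_a, instrument_b, mean=sauter_mean_diameter):
    """Absolute difference between two instruments' mean diameters."""
    return abs(mean(instrument_a) - mean(instrument_b))


# Hypothetical droplet samples in microns from two instruments at one location.
pdpa = [10.0, 20.0, 30.0]
imaging = [12.0, 20.0, 31.0]
print(arithmetic_mean_diameter(pdpa))                                # 20.0
print(round(sauter_mean_diameter(pdpa), 2))                          # 25.71
print(round(deviation(pdpa, imaging, arithmetic_mean_diameter), 2))  # 1.0
```

D32 weights large droplets more heavily than D10, so the two metrics can deviate by different amounts between instruments, as in the two cases reported above.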
Turnkey CAD/CAM selection and evaluation
NASA Technical Reports Server (NTRS)
Moody, T.
1980-01-01
The methodology to be followed in evaluating and selecting a computer system for manufacturing applications is discussed. Mainframes and minicomputers are considered. Benchmark evaluations, demonstrations, and contract negotiations are discussed.
NASA Astrophysics Data System (ADS)
Karner, Donald; Francfort, James
The Advanced Vehicle Testing Activity (AVTA), part of the U.S. Department of Energy's FreedomCAR and Vehicle Technologies Program, has conducted testing of advanced technology vehicles since August 1995 in support of the AVTA goal to provide benchmark data for technology modeling and vehicle development programs. The AVTA has tested full-size electric vehicles, urban electric vehicles, neighborhood electric vehicles, and hydrogen internal combustion engine powered vehicles. Currently, the AVTA is conducting baseline performance, battery benchmark and fleet tests of hybrid electric vehicles (HEV) and plug-in hybrid electric vehicles (PHEV). Testing has included all HEVs produced by major automotive manufacturers and spans over 2.5 million test miles. Testing is currently incorporating PHEVs from four different vehicle converters. The results of all testing are posted on the AVTA web page maintained by the Idaho National Laboratory.
DOE Office of Scientific and Technical Information (OSTI.GOV)
John D. Bess; J. Blair Briggs; Jim Gulliford
2014-10-01
The International Reactor Physics Experiment Evaluation Project (IRPhEP) is a widely recognized world-class program. The work of the IRPhEP is documented in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook). Integral data from the IRPhEP Handbook are used by reactor safety and design, nuclear data, criticality safety, and analytical methods development specialists worldwide to perform necessary validations of their calculational techniques. The IRPhEP Handbook is among the most frequently quoted references in the nuclear industry and is expected to be a valuable resource for future decades.
ERIC Educational Resources Information Center
Stern, Luli; Ahlgren, Andrew
2002-01-01
Project 2061 of the American Association for the Advancement of Science (AAAS) developed and field-tested a procedure for analyzing curriculum materials, including assessments, in terms of contribution to the attainment of benchmarks and standards. Using this procedure, Project 2061 produced a database of reports on nine science middle school…
Benchmarking and testing the "Sea Level Equation"
NASA Astrophysics Data System (ADS)
Spada, G.; Barletta, V. R.; Klemann, V.; van der Wal, W.; James, T. S.; Simon, K.; Riva, R. E. M.; Martinec, Z.; Gasperini, P.; Lund, B.; Wolf, D.; Vermeersen, L. L. A.; King, M. A.
2012-04-01
The study of the process of Glacial Isostatic Adjustment (GIA) and of the consequent sea level variations is gaining an increasingly important role within the geophysical community. Understanding the response of the Earth to the waxing and waning ice sheets is crucial in various contexts, ranging from the interpretation of modern satellite geodetic measurements to the projections of future sea level trends in response to climate change. All the processes accompanying GIA can be described by solving the so-called Sea Level Equation (SLE), an integral equation that accounts for the interactions between the ice sheets, the solid Earth, and the oceans. Modern approaches to the SLE are based on various techniques that range from purely analytical formulations to fully numerical methods. Despite various teams independently investigating GIA, we do not have a suitably large set of agreed numerical results through which the methods may be validated. Following the example of the mantle convection community and our recent successful Benchmark for Post Glacial Rebound codes (Spada et al., 2011, doi: 10.1111/j.1365-246X.2011.04952.x), here we present the results of a benchmark study of independently developed codes designed to solve the SLE. This study has taken place within a collaboration facilitated through the European Cooperation in Science and Technology (COST) Action ES0701. The tests involve predictions of past and current sea level variations, and 3D deformations of the Earth surface. In spite of the significant differences in the numerical methods employed, the test computations performed so far show a satisfactory agreement between the results provided by the participants. The differences found, which can often be attributed to the different numerical algorithms employed within the community, help to constrain the intrinsic errors in model predictions. 
These are of fundamental importance for a correct interpretation of the geodetic variations observed today, and particularly for the evaluation of climate-driven sea level variations.
A cohort study of cervical screening using partial HPV typing and cytology triage.
Schiffman, Mark; Hyun, Noorie; Raine-Bennett, Tina R; Katki, Hormuzd; Fetterman, Barbara; Gage, Julia C; Cheung, Li C; Befano, Brian; Poitras, Nancy; Lorey, Thomas; Castle, Philip E; Wentzensen, Nicolas
2016-12-01
HPV testing is more sensitive than cytology for cervical screening. However, to incorporate HPV tests into screening, risk-stratification ("triage") of HPV-positive women is needed to avoid excessive colposcopy and overtreatment. We prospectively evaluated combinations of partial HPV typing (Onclarity, BD) and cytology triage, and explored whether management could be simplified, based on grouping combinations yielding similar 3-year or 18-month CIN3+ risks. We typed ∼9,000 archived specimens, taken at enrollment (2007-2011) into the NCI-Kaiser Permanente Northern California (KPNC) HPV Persistence and Progression (PaP) cohort. Stratified sampling, with reweighting in the statistical analysis, permitted risk estimation of HPV/cytology combinations for the 700,000+-woman KPNC screening population. Based on 3-year CIN3+ risks, Onclarity results could be combined into five groups (HPV16, else HPV18/45, else HPV31/33/58/52, else HPV51/35/39/68/56/66/68, else HPV negative); cytology results fell into three risk groups ("high-grade," ASC-US/LSIL, NILM). For the resultant 15 HPV group-cytology combinations, 3-year CIN3+ risks ranged 1,000-fold from 60.6% to 0.06%. To guide management, we compared the risks to established "benchmark" risk/management thresholds in this same population (e.g., LSIL predicted 3-year CIN3+ risk of 5.8% in the screening population, providing the benchmark for colposcopic referral). By benchmarking to 3-year risk thresholds (supplemented by 18-month estimates), the widely varying risk strata could be condensed into four action bands (very high risk of CIN3+ mandating consideration of cone biopsy if colposcopy did not find precancer; moderate risk justifying colposcopy; low risk managed by intensified follow-up to permit HPV "clearance"; and very low risk permitting routine screening.) Overall, the results support primary HPV testing, with management of HPV-positive women using partial HPV typing and cytology. © 2016 UICC.
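The abstract condenses 15 HPV-cytology risk strata into four action bands by comparing each stratum's 3-year CIN3+ risk with benchmark thresholds. Only one threshold is stated (LSIL's 5.8% risk as the benchmark for colposcopic referral), so in the sketch below the other two cutoffs are hypothetical placeholders exposed as parameters, not values from the study. It only illustrates the benchmarking logic of mapping a risk estimate to a band.

```python
COLPOSCOPY_BENCHMARK = 0.058  # 3-year CIN3+ risk of LSIL in the KPNC population (from the abstract)


def action_band(risk_3yr: float,
                very_high_cutoff: float = 0.25,  # hypothetical placeholder
                low_cutoff: float = 0.005) -> str:  # hypothetical placeholder
    """Map a 3-year CIN3+ risk (as a fraction) to one of four management bands."""
    if not 0.0 <= risk_3yr <= 1.0:
        raise ValueError("risk must be a fraction between 0 and 1")
    if risk_3yr >= very_high_cutoff:
        return "very high"  # colposcopy; consider cone biopsy if no precancer is found
    if risk_3yr >= COLPOSCOPY_BENCHMARK:
        return "moderate"   # colposcopy
    if risk_3yr >= low_cutoff:
        return "low"        # intensified follow-up
    return "very low"       # routine screening


# The extremes of the 15 strata quoted in the abstract: 60.6% and 0.06%.
print(action_band(0.606), action_band(0.0006))
```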
Cepoiu-Martin, Monica; Bischak, Diane P
2018-02-01
The increase in the incidence of dementia in the aging population and the decrease in the availability of informal caregivers put pressure on continuing care systems to care for a growing number of people with disabilities. Policy changes in the continuing care system need to address this shift in the population structure. One of the most effective tools for assessing policies in complex systems is system dynamics. Nevertheless, this method is underused in continuing care capacity planning. A system dynamics model of the Alberta Continuing Care System was developed using stylized data. Sensitivity analyses and policy evaluations were conducted to demonstrate the use of system dynamics modelling in this area of public health planning. We focused our policy exploration on introducing staff/resident benchmarks in both supportive living and long-term care (LTC). The sensitivity analyses presented in this paper help identify leverage points in the system that need to be acknowledged when policy decisions are made. Our policy explorations showed that the deficits of staff increase dramatically when benchmarks are introduced, as expected, but at the end of the simulation period, the difference in deficits of both nurses and health care aides is similar between the 2 scenarios tested. Modifying the benchmarks in LTC only versus in both supportive living and LTC has similar effects on staff deficits in the long term, under the assumptions of this particular model. The continuing care system dynamics model can be used to test various policy scenarios, allowing decision makers to visualize the effect of a certain policy choice on different system variables and to compare different policy options. Our exploration illustrates the use of system dynamics models for policy making in complex health care systems. © 2017 John Wiley & Sons, Ltd.
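System dynamics models represent quantities as stocks that change through flows and are integrated over time. A minimal stock-and-flow sketch of the staff-deficit idea in this abstract, assuming one resident stock growing at a fixed rate, one staff stock with constant hiring and proportional attrition, and required staff set by a staff/resident benchmark ratio, is shown below. It uses simple Euler integration; all parameter names and values are hypothetical, and the Alberta model itself is far richer.

```python
def simulate_staff_deficit(residents, growth_rate, staff, hiring_per_year,
                           attrition_rate, staff_per_resident, years, dt=0.25):
    """Euler-integrate a two-stock model and return (time, staff deficit) pairs.

    residents          -- initial resident stock
    growth_rate        -- fractional growth of residents per year
    staff              -- initial staff stock
    hiring_per_year    -- constant inflow of staff
    attrition_rate     -- fraction of staff leaving per year
    staff_per_resident -- benchmark ratio that sets required staff
    """
    history = []
    steps = round(years / dt)
    for step in range(steps + 1):
        required = residents * staff_per_resident
        history.append((step * dt, required - staff))
        # Flows change the stocks over one time step.
        residents += growth_rate * residents * dt
        staff += (hiring_per_year - attrition_rate * staff) * dt
    return history


# Hypothetical comparison of two benchmark ratios over 10 years.
for ratio in (0.5, 0.6):
    final_time, final_deficit = simulate_staff_deficit(1000, 0.03, 450, 60, 0.12, ratio, years=10)[-1]
    print(ratio, round(final_deficit, 1))
```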
Nicolucci, Antonio; Rossi, Maria C; Pellegrini, Fabio; Lucisano, Giuseppe; Pintaudi, Basilio; Gentile, Sandro; Marra, Giampiero; Skovlund, Soren E; Vespasiani, Giacomo
2014-01-01
In the context of the DAWN-2 initiatives, the BENCH-D Study aims to test a model of regional benchmarking to improve not only the quality of diabetes care, but also patient-centred outcomes. As part of the AMD-Annals quality improvement program, 32 diabetes clinics in 4 Italian regions extracted clinical data from electronic databases for measuring process and outcome quality indicators. A random sample of patients with type 2 diabetes filled in a questionnaire including validated instruments to assess patient-centred indicators: SF-12 Health Survey, WHO-5 Well-Being Index, Diabetes Empowerment Scale, Problem Areas in Diabetes, Health Care Climate Questionnaire, Patients Assessment of Chronic Illness Care, Barriers to Medications, Patient Support, Diabetes Self-care Activities, and Global Satisfaction for Diabetes Treatment. Data were discussed with participants in regional meetings. Main problems, obstacles and solutions were identified through a standardized process, and a regional mandate was produced to drive the priority actions. Overall, clinical indicators on 78,854 patients have been measured; additionally, 2,390 patients filled in the questionnaire. The regional mandates were officially launched in March 2012. Clinical and patient-centred indicators will be evaluated again after 18 months. A final assessment of clinical indicators will take place after 30 months. In the context of the BENCH-D study, a set of instruments has been validated to measure patient well-being and satisfaction with the care. In the four regional meetings, different priorities were identified, reflecting different organizational resources of the different areas. In all the regions, a major challenge was represented by the need for skills and instruments to address psychosocial issues of people with diabetes. The BENCH-D study allows a field testing of benchmarking activities focused on clinical and patient-centred indicators.
Benchmarks of programming languages for special purposes in the space station
NASA Technical Reports Server (NTRS)
Knoebel, Arthur
1986-01-01
Although Ada is likely to be chosen as the principal programming language for the Space Station, certain needs, such as expert systems and robotics, may be better developed in special languages. The languages, LISP and Prolog, are studied and some benchmarks derived. The mathematical foundations for these languages are reviewed. Likely areas of the space station are sought out where automation and robotics might be applicable. Benchmarks are designed which are functional, mathematical, relational, and expert in nature. The coding will depend on the particular versions of the languages which become available for testing.
RERTR-12 Post-irradiation Examination Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rice, Francine; Williams, Walter; Robinson, Adam
2015-02-01
The following report contains the results and conclusions for the post-irradiation examinations performed on RERTR-12 Insertion 2 experiment plates. These exams include eddy-current testing to measure oxide growth; neutron radiography for evaluating the condition of the fuel prior to sectioning and determination of fuel relocation and geometry changes; gamma scanning to provide relative measurements for burnup and indication of fuel- and fission-product relocation; profilometry to measure dimensional changes of the fuel plate; analytical chemistry to benchmark the physics burnup calculations; metallography to examine the microstructural changes in the fuel, interlayer and cladding; and microhardness testing to determine the material-property changes of the fuel and cladding.
FloPSy - Search-Based Floating Point Constraint Solving for Symbolic Execution
NASA Astrophysics Data System (ADS)
Lakhotia, Kiran; Tillmann, Nikolai; Harman, Mark; de Halleux, Jonathan
Recently there has been an upsurge of interest in both Search-Based Software Testing (SBST) and Dynamic Symbolic Execution (DSE). Each of these two approaches has complementary strengths and weaknesses, making it a natural choice to explore the degree to which the strengths of one can be exploited to offset the weaknesses of the other. This paper introduces an augmented version of DSE that uses an SBST-based approach to handling floating point computations, which are known to be problematic for vanilla DSE. The approach has been implemented as a plug-in for the Microsoft Pex DSE testing tool. The paper presents results from both standard evaluation benchmarks and two open source programs.
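The core idea of solving floating-point path constraints by search can be sketched as follows: express the constraint as a branch distance (for a == b, the distance |a - b|, which is zero exactly when the constraint holds) and minimize it. The sketch uses an AVM-style local search (alternating variable method, a common SBST choice): one variable at a time is nudged up or down, the step doubles after an improving move and halves after a failed one. This is a generic illustration under those assumptions, not FloPSy's or Pex's code; the step schedule and limits are arbitrary choices, and such a coordinate-wise search can stall on constraints that tightly couple several variables.

```python
from typing import Callable, Sequence


def avm_search(fitness: Callable[[list[float]], float], start: Sequence[float],
               initial_step: float = 1.0, min_step: float = 1e-12,
               max_evals: int = 100_000) -> tuple[list[float], float]:
    """Minimize a branch-distance fitness with an AVM-style local search.

    Each variable is adjusted in turn: try +step and -step, double the step
    after an improving move (pattern move), halve it after a failed one.
    Stops when fitness reaches 0, no variable improves, or the budget is spent.
    """
    x = list(start)
    best = fitness(x)
    evals = 1
    improved = True
    while best > 0 and improved and evals < max_evals:
        improved = False
        for i in range(len(x)):
            step = initial_step
            while step >= min_step and best > 0 and evals < max_evals:
                for direction in (1.0, -1.0):
                    candidate = x.copy()
                    candidate[i] += direction * step
                    value = fitness(candidate)
                    evals += 1
                    if value < best:
                        x, best, improved = candidate, value, True
                        step *= 2.0  # accelerate in a promising direction
                        break
                else:
                    step /= 2.0  # neither direction helped: refine the step
    return x, best


# Path condition "x * x == 2.0" expressed as a branch distance |x*x - 2|.
solution, distance = avm_search(lambda v: abs(v[0] * v[0] - 2.0), [0.0])
print(solution[0], distance)
```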
NASA Astrophysics Data System (ADS)
Garnier, Valérie; Honnorat, Marc; Benshila, Rachid; Boutet, Martial; Cambon, Gildas; Chanut, Jérome; Couvelard, Xavier; Debreu, Laurent; Ducousso, Nicolas; Duhaut, Thomas; Dumas, Franck; Flavoni, Simona; Gouillon, Flavien; Lathuilière, Cyril; Le Boyer, Arnaud; Le Sommer, Julien; Lyard, Florent; Marsaleix, Patrick; Marchesiello, Patrick; Soufflet, Yves
2016-04-01
The COMODO group (http://www.comodo-ocean.fr) gathers developers of global and limited-area ocean models (NEMO, ROMS_AGRIF, S, MARS, HYCOM, S-TUGO) with the aim of addressing well-identified numerical issues. To evaluate existing models, to improve numerical approaches, methods and concepts (such as effective resolution), to assess the behavior of numerical models in complex hydrodynamical regimes, and to propose guidelines for the development of future ocean models, the group proposes a benchmark suite that covers both idealized test cases dedicated to targeted properties of numerical schemes and more complex test cases allowing evaluation of the coherence of the model kernel. The benchmark suite is built to study separately, then together, the main components of an ocean model: the continuity and momentum equations, the advection-diffusion of tracers, the vertical coordinate design, and the time-stepping algorithms. The test cases are chosen for their simplicity of implementation (analytic initial conditions), for their capacity to focus on one or a few schemes or parts of the kernel, for the availability of analytical solutions or accurate diagnostics, and lastly for their ability to simulate a key oceanic process in a controlled environment. Idealized test cases make it possible to verify properties of numerical schemes (advection-diffusion of tracers, upwelling, lock exchange, baroclinic vortex, adiabatic motion along bathymetry) and to bring to light numerical issues that remain undetected in realistic configurations (trajectory of a barotropic vortex, current-topography interaction). As the complexity of the simulated dynamics grows (internal waves, unstable baroclinic jet), sharing the same experimental designs across different existing models is useful for measuring model sensitivity to numerical choices (Soufflet et al., 2016). Lastly, test cases help in understanding the influence of the submesoscale on the dynamics (Couvelard et al., 2015). 
Such a benchmark suite is a useful test bed for continued research on numerical approaches, as well as an efficient tool for maintaining any ocean code and assuring users that the model has been validated over a certain range of hydrodynamical regimes. Thanks to a common netCDF format, the suite is complemented by a Python library that encompasses all the tools and metrics used to assess the efficiency of the numerical methods.
References
- Couvelard X., F. Dumas, V. Garnier, A.L. Ponte, C. Talandier, A.M. Treguier (2015). Mixed layer formation and restratification in presence of mesoscale and submesoscale turbulence. Ocean Modelling, Vol. 96, Part 2, pp. 243-253. doi:10.1016/j.ocemod.2015.10.004
- Soufflet Y., P. Marchesiello, F. Lemarié, J. Jouanno, X. Capet, L. Debreu, R. Benshila (2016). On effective resolution in ocean models. Ocean Modelling, in press. doi:10.1016/j.ocemod.2015.12.004
Jet printing of convex and concave polymer micro-lenses.
Blattmann, M; Ocker, M; Zappe, H; Seifert, A
2015-09-21
We describe a novel approach for fabricating customized convex and concave micro-lenses using substrates with a sophisticated pinning architecture and a drop-on-demand jet printer. The polymeric lens material deposited on the wafer is cured by UV irradiation, yielding lenses with high-quality surfaces. The surface shape and roughness of the cured polymer lenses are characterized by white-light interferometry. Their optical quality is demonstrated by imaging a USAF 1951 test chart. The evaluated modulation transfer function is compared with Zemax simulations as a benchmark for the fabricated lenses.
Biomass fuels update. TVA's biomass fuels program
NASA Astrophysics Data System (ADS)
1982-02-01
Equipment was installed and tests were conducted on the ethanol-from-hardwood project. Ways of locating hardwoods, improving forest management, and reducing the cost of harvesting woody biomass were assessed. Substantial underutilized cropland exists in the Valley, and a questionnaire survey was administered to supplement available cropland data. Potential liquid fuel yields and production management practices for alternative starch, sugar, and vegetable oil crops were determined to obtain benchmark data and to evaluate alcohol production from alternative agricultural feedstocks. Workshops were conducted to provide information on alcohol production.
Validation of a three-dimensional viscous analysis of axisymmetric supersonic inlet flow fields
NASA Technical Reports Server (NTRS)
Benson, T. J.; Anderson, B. H.
1983-01-01
A three-dimensional viscous marching analysis for supersonic inlets was developed. To verify this analysis, several benchmark axisymmetric test configurations were studied and compared with experimental data. Detailed two-dimensional results for shock-boundary layer interactions are presented for flows with and without boundary layer bleed. Three-dimensional calculations of a cone at angle of attack and a full inlet at angle of attack are also discussed and evaluated. The results demonstrate the code's ability to predict complex flow fields and establish guidelines for future calculations using similar codes.
Impact of quality circles for improvement of asthma care: results of a randomized controlled trial
Schneider, Antonius; Wensing, Michel; Biessecker, Kathrin; Quinzler, Renate; Kaufmann-Kolle, Petra; Szecsenyi, Joachim
2008-01-01
Rationale and aims: Quality circles (QCs) are well established as a means of aiding doctors. New quality improvement strategies include benchmarking activities. The aim of this paper was to evaluate the efficacy of QCs for asthma care working either with general feedback or with an open benchmark. Methods: Twelve QCs, involving 96 general practitioners, were organized in a randomized controlled trial. Six worked with traditional anonymous feedback and six with an open benchmark; both had guided discussion from a trained moderator. Forty-three primary care practices agreed to give out questionnaires to patients to evaluate the efficacy of QCs. Results: A total of 256 patients participated in the survey, of whom 185 (72.3%) responded to the follow-up 1 year later. Use of inhaled steroids at baseline was high (69%) and self-management low (asthma education 27%, individual emergency plan 8%, and peak flow meter at home 21%). Guideline adherence in drug treatment increased (P = 0.19), and asthma steps improved (P = 0.02). Delivery of individual emergency plans increased (P = 0.008), and unscheduled emergency visits decreased (P = 0.064). There was no change in asthma education and peak flow meter usage. High medication guideline adherence was associated with reduced emergency visits (OR 0.24; 95% CI 0.07–0.89). Use of theophylline was associated with hospitalization (OR 7.1; 95% CI 1.5–34.3) and emergency visits (OR 4.9; 95% CI 1.6–14.7). There was no difference between traditional and benchmarking QCs. Conclusions: Quality circles working with individualized feedback are effective at improving asthma care. The trial may have been underpowered to detect specific benchmarking effects. Further research is necessary to evaluate strategies for improving the self-management of asthma patients. PMID:18093108
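The associations above are reported as odds ratios (OR) with 95% confidence intervals. The abstract does not say how they were estimated (an adjusted regression model is likely), so the following is only a minimal illustration of what an OR and its interval mean: the unadjusted OR from a 2x2 table, with a Woolf (log-scale Wald) interval. The function name and counts are hypothetical, not data from the trial.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    Hypothetical 2x2 layout:
                      outcome   no outcome
        exposed          a          b
        unexposed        c          d
    """
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts for illustration only; the OR here is 0.25.
print(odds_ratio_ci(a=10, b=20, c=30, d=15))
```

An OR below 1 with an interval that excludes 1, as in the guideline-adherence result, indicates lower odds of the outcome in the exposed group; the interval is symmetric on the log scale, not on the OR scale.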
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bess, John D.
2014-03-01
PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen critical configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
John D. Bess
2013-03-01
PROTEUS is a zero-power research reactor based on a cylindrical graphite annulus with a central cylindrical cavity. The graphite annulus remains basically the same for all experimental programs, but the contents of the central cavity are changed according to the type of reactor being investigated. Through most of its service history, PROTEUS has represented light-water reactors, but from 1992 to 1996 PROTEUS was configured as a pebble-bed reactor (PBR) critical facility and designated as HTR-PROTEUS. The nomenclature was used to indicate that this series consisted of High Temperature Reactor experiments performed in the PROTEUS assembly. During this period, seventeen critical configurations were assembled and various reactor physics experiments were conducted. These experiments included measurements of criticality, differential and integral control rod and safety rod worths, kinetics, reaction rates, water ingress effects, and small sample reactivity effects (Ref. 3). HTR-PROTEUS was constructed, and the experimental program was conducted, for the purpose of providing experimental benchmark data for assessment of reactor physics computer codes. Considerable effort was devoted to benchmark calculations as a part of the HTR-PROTEUS program. References 1 and 2 provide detailed data for use in constructing models for codes to be assessed. Reference 3 is a comprehensive summary of the HTR-PROTEUS experiments and the associated benchmark program. This document draws freely from these references. Only Cores 9 and 10 are evaluated in this benchmark report due to similarities in their construction. The other core configurations of the HTR-PROTEUS program are evaluated in their respective reports as outlined in Section 1.0. Cores 9 and 10 were evaluated and determined to be acceptable benchmark experiments.
Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.
2010-01-01
Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test (300) sets. A reference standard was developed, and each report was annotated by experts with regard to the patient's personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution, and classify the reports with regard to cancer history. Two methods for extracting information, Dynamic-Window and ConText, were evaluated. Hx's classification responses using each of the two methods were measured against the reference standard. The average Cohen's weighted kappa served as the human benchmark in evaluating the system. Results: Hx achieved a high overall accuracy of 96.2% with each method. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, respectively, comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; in both cases the methods were comparable to the human benchmarks of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application's performance in classifying a mesothelioma patient's personal and family history of cancer from clinical reports.
To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution, and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text clinical records for a variety of research or healthcare delivery organizations. PMID:21031012
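The evaluation above rests on two standard agreement measures: the F-measure (the harmonic mean of precision and recall) for the system, and Cohen's kappa for the human annotators. A minimal sketch with hypothetical counts (not data from the study) shows how each is computed. The study averaged a weighted kappa across annotators; this sketch computes the unweighted form, which coincides with the usual linearly or quadratically weighted kappa when there are only two categories.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(table: list[list[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] counts items that rater A put in category i and rater B in j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the raters' marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only.
print(round(f_measure(tp=45, fp=5, fn=3), 3))          # -> 0.918
print(round(cohens_kappa([[40, 5], [3, 52]]), 3))      # -> 0.838
```

Kappa corrects raw agreement (here 0.92) for the agreement expected by chance (here 0.507), which is why it is lower than the raw proportion.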
Validation of the WIMSD4M cross-section generation code with benchmark results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deen, J.R.; Woodruff, W.L.; Leal, L.E.
1995-01-01
The WIMSD4 code has been adopted for cross-section generation in support of the Reduced Enrichment for Research and Test Reactors (RERTR) program at Argonne National Laboratory (ANL). Subsequently, the code has undergone several updates, and significant improvements have been achieved. The capability of generating group-collapsed micro- or macroscopic cross sections from the ENDF/B-V library and the more recent evaluation, ENDF/B-VI, in the ISOTXS format makes the modified version of the WIMSD4 code, WIMSD4M, very attractive, not only for the RERTR program but also for the reactor physics community. The intent of the present paper is to validate the WIMSD4M cross-section libraries for reactor modeling of fresh-water-moderated cores. The results of calculations performed with multigroup cross-section data generated with the WIMSD4M code will be compared against experimental results. These results correspond to calculations carried out with thermal reactor benchmarks of the Oak Ridge National Laboratory (ORNL) unreflected HEU critical spheres, the TRX LEU critical experiments, and calculations of a modified Los Alamos HEU D₂O-moderated benchmark critical system. The benchmark calculations were performed with the discrete-ordinates transport code TWODANT, using WIMSD4M cross-section data. Transport calculations using the XSDRNPM module of the SCALE code system are also included. In addition to transport calculations, diffusion calculations with the DIF3D code were also carried out, since the DIF3D code is used in the RERTR program for reactor analysis and design. For completeness, Monte Carlo results of calculations performed with the VIM and MCNP codes are also presented.
Validation of the WIMSD4M cross-section generation code with benchmark results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leal, L.C.; Deen, J.R.; Woodruff, W.L.
1995-02-01
The WIMSD4 code has been adopted for cross-section generation in support of the Reduced Enrichment for Research and Test Reactors (RERTR) program at Argonne National Laboratory (ANL). Subsequently, the code has undergone several updates, and significant improvements have been achieved. The capability of generating group-collapsed micro- or macroscopic cross sections from the ENDF/B-V library and the more recent evaluation, ENDF/B-VI, in the ISOTXS format makes the modified version of the WIMSD4 code, WIMSD4M, very attractive, not only for the RERTR program but also for the reactor physics community. The intent of the present paper is to validate the procedure to generate cross-section libraries for reactor analyses and calculations utilizing the WIMSD4M code. To do so, the results of calculations performed with group cross-section data generated with the WIMSD4M code will be compared against experimental results. These results correspond to calculations carried out with thermal reactor benchmarks of the Oak Ridge National Laboratory (ORNL) unreflected critical spheres, the TRX critical experiments, and calculations of a modified Los Alamos highly enriched, heavy-water-moderated benchmark critical system. The benchmark calculations were performed with the discrete-ordinates transport code TWODANT, using WIMSD4M cross-section data. Transport calculations using the XSDRNPM module of the SCALE code system are also included. In addition to transport calculations, diffusion calculations with the DIF3D code were also carried out, since the DIF3D code is used in the RERTR program for reactor analysis and design. For completeness, Monte Carlo results of calculations performed with the VIM and MCNP codes are also presented.
An automated benchmarking platform for MHC class II binding prediction methods.
Andreatta, Massimo; Trolle, Thomas; Yan, Zhen; Greenbaum, Jason A; Peters, Bjoern; Nielsen, Morten
2018-05-01
Computational methods for the prediction of peptide-MHC binding have become an integral and essential component of candidate selection in experimental T cell epitope discovery studies. The sheer number of published prediction methods, together with often discordant reports on their performance, poses a considerable quandary for the experimentalist who needs to choose the best tool for their research. With the goal of providing an unbiased, transparent evaluation of the state of the art in the field, we created an automated platform to benchmark peptide-MHC class II binding prediction tools. The platform evaluates the absolute and relative predictive performance of all participating tools on data newly entered into the Immune Epitope Database (IEDB) before they are made public, thereby providing a frequent, unbiased assessment of available prediction tools. The benchmark runs on a weekly basis, is fully automated, and displays up-to-date results on a publicly accessible website. The initial benchmark described here included six commonly used prediction servers, but other tools are encouraged to join through a simple sign-up procedure. Performance evaluation on 59 data sets comprising over 10 000 binding affinity measurements suggested that NetMHCIIpan is currently the most accurate tool, followed by NN-align and the IEDB consensus method. Weekly reports on the participating methods can be found online at: http://tools.iedb.org/auto_bench/mhcii/weekly/. Contact: mniel@bioinformatics.dtu.dk. Supplementary data are available at Bioinformatics online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mardirossian, Narbe; Head-Gordon, Martin
Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.
Mardirossian, Narbe; Head-Gordon, Martin
2016-11-09
Benchmark datasets of non-covalent interactions are essential for assessing the performance of density functionals and other quantum chemistry approaches. In a recent blind test, Taylor et al. benchmarked 14 methods on a new dataset consisting of 10 dimer potential energy curves calculated using coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) at the complete basis set (CBS) limit (80 data points in total). The dataset is particularly interesting because compressed, near-equilibrium, and stretched regions of the potential energy surface are extensively sampled.
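Benchmarks of this kind typically score each method by how far its interaction energies deviate from the reference values. The summary statistics used in the blind test are not stated here, so the following is a hedged sketch: it computes the root-mean-square deviation (RMSD), mean signed error (MSE), and maximum absolute error of a method against CCSD(T)/CBS references. The function name, the energies, and their units (kcal/mol) are hypothetical.

```python
import math


def error_stats(method: list[float], reference: list[float]) -> dict[str, float]:
    """RMSD, mean signed error (MSE), and max absolute error vs. reference energies."""
    if len(method) != len(reference):
        raise ValueError("method and reference must have the same number of points")
    errors = [m - r for m, r in zip(method, reference)]
    n = len(errors)
    return {
        "RMSD": math.sqrt(sum(e * e for e in errors) / n),
        "MSE": sum(errors) / n,  # sign shows systematic over- or underbinding
        "MaxAE": max(abs(e) for e in errors),
    }


# Hypothetical interaction energies (kcal/mol) at four points along one dimer curve.
reference = [-1.20, -2.35, -1.80, -0.60]
method = [-1.05, -2.50, -1.95, -0.55]
print(error_stats(method, reference))
```

Reporting statistics separately for the compressed, near-equilibrium, and stretched points would exploit the sampling that makes this dataset interesting.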
NASA Astrophysics Data System (ADS)
Kaskhedikar, Apoorva Prakash
According to the U.S. Energy Information Administration, commercial buildings represent about 40% of United States energy consumption, with office buildings consuming a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvements. Energy benchmarking offers an initial assessment of building energy performance without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, in which a relationship between energy use intensity (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to the medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the parameters with the greatest influence on building energy use intensity. Significant correlations between EUIs and CBECS variables were then identified. Other than floor area, important variables included the number of workers, location, the number of PCs, and the main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model widely used by building owners and designers, namely ENERGY STAR's Portfolio Manager. This tool relies on standard linear regression, which can only handle continuous variables. The proposed model uses data mining techniques and was found to perform slightly better than Portfolio Manager.
The broader impact of the proposed benchmarking methodology is that it allows important categorical variables to be identified and then incorporated into a local, as opposed to a global, model framework for EUI pertinent to the building type. The ability to identify and rank the important variables is of great importance in the practical implementation of benchmarking tools, which rely on query-based building and HVAC variable filters specified by the user.
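The two quantities named above can be sketched briefly. EUI is annual energy use normalized by floor area. The coefficient of variation is assumed here to mean CV(RMSE), the root-mean-square error of predicted EUIs divided by the mean observed EUI, a common convention in building energy modeling; the thesis may define it differently. Lower values indicate a better-fitting benchmarking model. Function names, units, and numbers are hypothetical.

```python
import math


def eui(annual_energy_kbtu: float, floor_area_ft2: float) -> float:
    """Energy use intensity: annual energy use per unit floor area (kBtu/ft^2)."""
    return annual_energy_kbtu / floor_area_ft2


def cv_rmse(observed: list[float], predicted: list[float]) -> float:
    """CV(RMSE): RMSE of the predictions divided by the mean observed value.

    Uses n in the RMSE denominator; some guidelines use n - p instead.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


# Hypothetical peer buildings: observed EUIs vs. predictions from two models.
observed = [80.0, 100.0, 120.0]
print(eui(1_000_000, 10_000))                      # -> 100.0
print(cv_rmse(observed, [90.0, 100.0, 110.0]))     # model A
print(cv_rmse(observed, [85.0, 100.0, 115.0]))     # model B (lower CV, better fit)
```

Because CV(RMSE) is normalized by the mean EUI, it allows models for building types with very different typical EUIs (such as offices and schools) to be compared on one scale.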
Hermans, Michel P; Brotons, Carlos; Elisaf, Moses; Michel, Georges; Muls, Erik; Nobels, Frank
2013-12-01
Micro- and macrovascular complications of type 2 diabetes have an adverse impact on survival, quality of life, and healthcare costs. The OPTIMISE (OPtimal Type 2 dIabetes Management Including benchmarking and Standard trEatment) trial, which compares physicians' individual performance with that of a peer group, evaluates the hypothesis that benchmarking may improve the quality of type 2 diabetes care in the primary care setting. Benchmarking uses assessments of change in three critical quality indicators of vascular risk, namely glycated haemoglobin (HbA1c), low-density lipoprotein-cholesterol (LDL-C), and systolic blood pressure (SBP). This was a randomised, controlled study of 3980 patients with type 2 diabetes. Six European countries participated in the OPTIMISE study (NCT00681850). Quality of care was assessed by the percentage of patients achieving pre-set targets for the three critical quality indicators over 12 months. Physicians were randomly assigned to receive either benchmarked or non-benchmarked feedback. All physicians received feedback on six of their patients' modifiable outcome indicators (HbA1c, fasting glycaemia, total cholesterol, high-density lipoprotein-cholesterol (HDL-C), LDL-C, and triglycerides). Physicians in the benchmarking group additionally received information on the levels of control they achieved for the three critical quality indicators compared with their colleagues. At baseline, the percentage of evaluable patients (N = 3980) achieving pre-set targets was 51.2% for HbA1c (n = 2028/3964), 34.9% for LDL-C (n = 1350/3865), and 27.3% for SBP (n = 911/3337). OPTIMISE confirms that target achievement in the primary care setting is suboptimal for all three critical quality indicators. This represents an unmet but modifiable need to revisit the mechanisms and management of improving care in type 2 diabetes. OPTIMISE will help to assess whether benchmarking is a useful clinical tool for improving outcomes in type 2 diabetes.
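Each baseline percentage is computed over the patients evaluable for that indicator, so the denominators differ from the overall N = 3980 (presumably because not every patient had every measurement). A quick check that the reported figures follow from the reported counts; the helper name is illustrative.

```python
def percent(achieved: int, evaluable: int) -> float:
    """Share of evaluable patients meeting a target, as a percentage (1 decimal)."""
    return round(100 * achieved / evaluable, 1)


# Counts as reported in the abstract: (patients at target, evaluable patients).
baseline = {"HbA1c": (2028, 3964), "LDL-C": (1350, 3865), "SBP": (911, 3337)}
for indicator, (achieved, evaluable) in baseline.items():
    print(f"{indicator}: {percent(achieved, evaluable)}%")
# prints three lines: HbA1c: 51.2%, LDL-C: 34.9%, SBP: 27.3%
```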
Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.
Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao
2017-02-21
In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions has been developed. Regardless of their technical differences, all scoring functions need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in size and quality. In addition, standard metrics for evaluating scoring functions used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect their genuine quality. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-standing efforts to overcome these obstacles, which involve two related projects. In the first project, we created the PDBbind database, the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in the PDB. Data sets provided by PDBbind have been applied in many computational and statistical studies on protein-ligand interactions and other subjects. In particular, PDBbind has become a major data resource for scoring function development. In the second project, we established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so that scoring functions can be tested in a relatively pure context that reflects their quality.
In our latest work on this track, CASF-2013, the performance of a scoring function was quantified in four aspects: "scoring power", "ranking power", "docking power", and "screening power". All four performance tests were conducted on a test set containing 195 high-quality protein-ligand complexes selected from PDBbind. A panel of 20 standard scoring functions was tested as a demonstration. Importantly, CASF is designed to be an open-access benchmark with which scoring functions developed by different researchers can be compared on the same grounds. Indeed, it has become a popular choice for scoring function validation in recent years. Despite the considerable progress made so far, the performance of today's scoring functions still falls short of expectations in many aspects, and there is a constant demand for more advanced scoring functions. Our efforts have helped to overcome some obstacles underlying scoring function development so that researchers in this field can move forward faster. We will continue to improve the PDBbind database and the CASF benchmark to keep them useful community resources.
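CASF is commonly described as quantifying "scoring power" through the linear correlation between a scoring function's predicted scores and experimental binding data across the test set. A minimal sketch of that idea, assuming a Pearson correlation coefficient and hypothetical values; the full CASF-2013 protocol, including the other three tests, is not reproduced here.

```python
import math


def pearson_r(predicted: list[float], experimental: list[float]) -> float:
    """Pearson correlation between predicted scores and experimental affinities."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mp = sum(predicted) / n
    me = sum(experimental) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(predicted, experimental))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_e = sum((e - me) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


# Hypothetical predicted scores and experimental pKd values for five complexes.
predicted = [5.1, 6.3, 7.8, 4.2, 9.0]
experimental = [5.0, 6.0, 8.1, 4.5, 8.7]
print(round(pearson_r(predicted, experimental), 3))
```

Because Pearson's r is invariant to positive linear rescaling, the predicted scores need not be in the same units as the experimental data, which is what lets scoring functions with different energy scales be compared on the same test set.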
NASA Astrophysics Data System (ADS)
Rodriguez, Tony F.; Cushman, David A.
2003-06-01
With the growing commercialization of watermarking techniques in various application scenarios, it has become increasingly important to quantify the performance of watermarking products. Quantifying the relative merits of various products is not only essential for enabling further adoption of the technology by society as a whole, but will also drive the industry to develop testing plans and methodologies that ensure quality and minimize cost (to both vendors and customers). While the research community understands the theoretical need for a publicly available benchmarking system to quantify performance, there has been less discussion of the practical application of such systems. By providing a standard set of acceptance criteria, benchmarking systems can dramatically increase the quality of a particular watermarking solution and validate product performance, provided they are used efficiently and frequently during the design process. In this paper we describe how to leverage specific design-of-experiments techniques to increase the quality of a watermarking scheme, for use with the benchmark tools being developed by the Ad-Hoc Watermark Verification Group. A Taguchi loss function is proposed for an application, and orthogonal arrays are used to isolate optimal levels for a multi-factor experimental situation. Finally, the results are generalized to a population of cover works and validated through an exhaustive test.
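The abstract does not state the specific loss function used. As an illustrative sketch only, the classic nominal-the-best Taguchi loss assigns a cost L(y) = k(y - m)^2 to a response y that deviates from its target m, with k a cost coefficient. Averaged over a population of cover works, the expected loss splits into a variance term and a squared-bias term, k[sigma^2 + (mu - m)^2], so both inconsistency and systematic offset are penalized. The response (a hypothetical detection score) and the values of m and k are assumptions.

```python
def taguchi_loss(y: float, target: float, k: float) -> float:
    """Nominal-the-best quadratic loss for a single response value."""
    return k * (y - target) ** 2


def average_loss(samples: list[float], target: float, k: float) -> float:
    """Mean loss over a population via the variance + squared-bias decomposition.

    Uses the population variance (divide by n), so the result equals the
    plain average of taguchi_loss over the samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return k * (variance + (mean - target) ** 2)


# Hypothetical detection scores for three cover works, target 1.0, k = 2.
scores = [0.9, 1.0, 1.2]
print(average_loss(scores, target=1.0, k=2.0))
```

In a design-of-experiments setting, each row of an orthogonal array would yield one such average loss, and factor levels that minimize it would be preferred.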
Benchmark matrix and guide: Part III.
1992-01-01
The final article in the "Benchmark Matrix and Guide" series developed by Headquarters Air Force Logistics Command completes the discussion of the last three categories that are essential ingredients of a successful total quality management (TQM) program. Detailed behavioral objectives are listed in the areas of recognition, process improvement, and customer focus. These vertical categories are meant to be applied to the levels of the matrix that define the progressive stages of TQM: business as usual, initiation, implementation, expansion, and integration. By charting the horizontal progress level against the vertical TQM category, the quality management professional can evaluate the current state of TQM in any given organization. As each category is completed, new goals can be defined in order to advance to a higher level. The benchmarking process is integral to quality improvement efforts because it focuses on the highest possible standards to evaluate quality programs.
Benchmarking a geostatistical procedure for the homogenisation of annual precipitation series
NASA Astrophysics Data System (ADS)
Caineta, Júlio; Ribeiro, Sara; Henriques, Roberto; Soares, Amílcar; Costa, Ana Cristina
2014-05-01
The European project COST Action ES0601, Advances in homogenisation methods of climate series: an integrated approach (HOME), has brought to attention the importance of establishing reliable homogenisation methods for climate data. To achieve that, a benchmark data set containing monthly and daily temperature and precipitation data was created to serve as a basis for comparing the effectiveness of those methods. Several contributions were submitted and evaluated with a number of performance metrics, validating the results against realistic inhomogeneous data. HOME also led to the development of new homogenisation software packages, which incorporated feedback and lessons learned during the project. Preliminary studies have suggested a geostatistical stochastic approach, which uses Direct Sequential Simulation (DSS), as a promising methodology for the homogenisation of precipitation data series. Based on the spatial and temporal correlation between neighbouring stations, DSS calculates local probability density functions at a candidate station to detect inhomogeneities. The purpose of the current study is to test and compare this geostatistical approach with the methods previously presented in the HOME project, using surrogate precipitation series from the HOME benchmark data set. The benchmark data set contains monthly precipitation surrogate series, from which annual precipitation data series were derived. These annual precipitation series were subjected to exploratory analysis and a thorough variography study. The geostatistical approach was then applied to the data set, based on different scenarios for spatial continuity. Implementing this procedure also led to the development of a computer program that aims to assist in the homogenisation of climate data while minimising user interaction.
Finally, in order to compare the effectiveness of this methodology with the homogenisation methods submitted during the HOME project, the obtained results were evaluated using the same performance metrics. This comparison opens new perspectives for the development of an innovative procedure based on the geostatistical stochastic approach. Acknowledgements: The authors gratefully acknowledge the financial support of "Fundação para a Ciência e Tecnologia" (FCT), Portugal, through the research project PTDC/GEO-MET/4026/2012 ("GSIMCLI - Geostatistical simulation with local distributions for the homogenization and interpolation of climate data").
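The detection step described in this abstract can be sketched as follows, assuming the DSS realisations at the candidate station have already been generated from the neighbouring stations. The function names, the 2.5th to 97.5th percentile interval, and the dictionary data layout are illustrative assumptions, not the GSIMCLI program itself.

```python
def percentile(values, p):
    """Percentile p (0-100) of a non-empty sequence, using linear interpolation."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def detect_inhomogeneities(observed, simulations, p_low=2.5, p_high=97.5):
    """Flag years whose observed value lies outside the simulated local interval.

    observed:    dict mapping year -> observed annual precipitation at the candidate station
    simulations: dict mapping year -> list of simulated values at that station
                 (assumed to be DSS realisations conditioned on neighbouring stations)
    """
    flagged = []
    for year, obs in sorted(observed.items()):
        sims = simulations[year]
        # The empirical local distribution stands in for the local pdf.
        low, high = percentile(sims, p_low), percentile(sims, p_high)
        if not (low <= obs <= high):
            flagged.append(year)
    return flagged
```

For example, with 201 simulated values spread evenly between 700 and 900 mm, a year observed at 1500 mm would be flagged while a year observed at 800 mm would not.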
Model evaluation using a community benchmarking system for land surface models
NASA Astrophysics Data System (ADS)
Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Kluzek, E. B.; Koven, C. D.; Randerson, J. T.
2014-12-01
Evaluation of atmosphere, ocean, sea ice, and land surface models is an important step in identifying deficiencies in Earth system models and developing improved estimates of future change. For the land surface and carbon cycle, the design of an open-source system has been an important objective of the International Land Model Benchmarking (ILAMB) project. Here we evaluated CMIP5 and CLM models using a benchmarking system that enables users to specify models, data sets, and scoring systems so that results can be tailored to specific model intercomparison projects. Our scoring system used information from four different aspects of global datasets, including climatological mean spatial patterns, seasonal cycle dynamics, interannual variability, and long-term trends. Variable-to-variable comparisons enable investigation of the mechanistic underpinnings of model behavior, and allow for some control of biases in model drivers. Graphics modules allow users to evaluate model performance at local, regional, and global scales. Use of modular structures makes it relatively easy for users to add new variables, diagnostic metrics, benchmarking datasets, or model simulations. Diagnostic results are automatically organized into HTML files, so users can conveniently share results with colleagues. We used this system to evaluate atmospheric carbon dioxide, burned area, global biomass and soil carbon stocks, net ecosystem exchange, gross primary production, ecosystem respiration, terrestrial water storage, evapotranspiration, and surface radiation from CMIP5 historical and ESM historical simulations. We found that the multi-model mean often performed better than many of the individual models for most variables. We plan to publicly release a stable version of the software during fall of 2014 that has land surface, carbon cycle, hydrology, radiation and energy cycle components.
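A scoring system that combines several aspects of model-data agreement can be sketched as below. This is a hypothetical illustration, not the ILAMB code: each of the four aspects is assumed to be reduced to one scalar per model, scored with exp(-relative error), and averaged with equal weights. It also shows one simple way a multi-model mean can outscore its members: errors of opposite sign cancel.

```python
import math

# Hypothetical: each aspect of a variable is reduced to one scalar summary per model.
ASPECTS = ("mean_spatial_pattern", "seasonal_cycle",
           "interannual_variability", "long_term_trend")


def aspect_score(model_value, obs_value):
    """Map relative error to (0, 1]: 1 for a perfect match, toward 0 for large errors."""
    rel_err = abs(model_value - obs_value) / abs(obs_value)
    return math.exp(-rel_err)


def overall_score(model, obs):
    """Equal-weight mean of the per-aspect scores."""
    return sum(aspect_score(model[a], obs[a]) for a in ASPECTS) / len(ASPECTS)


def multi_model_mean(models):
    """Average each aspect across models before scoring."""
    return {a: sum(m[a] for m in models) / len(models) for a in ASPECTS}
```

With one model 20% too high and another 20% too low on every aspect, their mean matches the observations exactly and scores 1.0, higher than either model alone.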
Methodology and issues of integral experiments selection for nuclear data validation
NASA Astrophysics Data System (ADS)
Ivanova, Tatiana; Ivanov, Evgeny; Hill, Ian
2017-09-01
Nuclear data validation involves a large suite of Integral Experiments (IEs) for criticality, reactor physics and dosimetry applications [1]. Often benchmarks are taken from international Handbooks [2, 3]. Depending on the application, IEs have different degrees of usefulness in validation, and usually the use of a single benchmark is not advised; indeed, it may lead to erroneous interpretation and results [1]. This work aims at quantifying the importance of benchmarks used in application-dependent cross section validation. The approach is based on the well-known Generalized Linear Least Squares Method (GLLSM), extended to establish biases and uncertainties for given cross sections (within a given energy interval). The statistical treatment results in a vector of weighting factors for the integral benchmarks. These factors characterize the value added by a benchmark for nuclear data validation for the given application. The methodology is illustrated by one example: selecting benchmarks for 239Pu cross section validation. The studies were performed in the framework of Subgroup 39 (Methods and approaches to provide feedback from nuclear and covariance data adjustment for improvement of nuclear data files) established at the Working Party on International Nuclear Data Evaluation Cooperation (WPEC) of the Nuclear Science Committee under the Nuclear Energy Agency (NEA/OECD).
Learning moment-based fast local binary descriptor
NASA Astrophysics Data System (ADS)
Bellarbi, Abdelkader; Zenati, Nadia; Otmane, Samir; Belghit, Hayet
2017-03-01
Recently, binary descriptors have attracted significant attention due to their speed and low memory consumption; however, using intensity differences to calculate the binary descriptive vector is not efficient enough. We propose an approach to binary description called POLAR_MOBIL, in which we perform binary tests between geometrical and statistical information using moments in the patch instead of the classical intensity binary test. In addition, we introduce a learning technique used to select an optimized set of binary tests with low correlation and high variance. This approach offers high distinctiveness against affine transformations and appearance changes. An extensive evaluation on well-known benchmark datasets reveals the robustness and the effectiveness of the proposed descriptor, as well as its good performance in terms of low computation complexity when compared with state-of-the-art real-time local descriptors.
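The idea of replacing intensity comparisons with moment comparisons can be sketched as follows. This is not POLAR_MOBIL: the square sub-regions, the choice of moments (zeroth moment and intensity centroid), and the hand-written test list are illustrative assumptions, and the learned test-selection step is omitted. Descriptors are matched by Hamming distance, which is what makes binary descriptors fast.

```python
def region_moments(patch, r0, c0, size):
    """Zeroth moment and intensity centroid (cx, cy) of a square sub-region."""
    m00 = m10 = m01 = 0.0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            v = patch[r][c]
            m00 += v
            m10 += v * (c - c0)
            m01 += v * (r - r0)
    cx = m10 / m00 if m00 else 0.0
    cy = m01 / m00 if m00 else 0.0
    return (m00, cx, cy)


def describe(patch, tests, size):
    """Binary descriptor: test i sets bit i if moment k of region A < that of region B.

    tests: list of ((row_a, col_a), (row_b, col_b), k) with k indexing region_moments.
    """
    bits = 0
    for i, ((ra, ca), (rb, cb), k) in enumerate(tests):
        if region_moments(patch, ra, ca, size)[k] < region_moments(patch, rb, cb, size)[k]:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")
```

Because each bit depends only on the ordering of two moments, uniformly scaling the patch brightness leaves the descriptor unchanged (Hamming distance 0).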
Hagen, Espen; Ness, Torbjørn V; Khosrowshahi, Amir; Sørensen, Christina; Fyhn, Marianne; Hafting, Torkel; Franke, Felix; Einevoll, Gaute T
2015-04-30
New, silicon-based multielectrodes comprising hundreds or more electrode contacts offer the possibility to record spike trains from thousands of neurons simultaneously. This potential cannot be realized unless accurate, reliable automated methods for spike sorting are developed, in turn requiring benchmarking data sets with known ground-truth spike times. We here present a general simulation tool for computing benchmarking data for evaluation of spike-sorting algorithms entitled ViSAPy (Virtual Spiking Activity in Python). The tool is based on a well-established biophysical forward-modeling scheme and is implemented as a Python package built on top of the neuronal simulator NEURON and the Python tool LFPy. ViSAPy allows for arbitrary combinations of multicompartmental neuron models and geometries of recording multielectrodes. Three example benchmarking data sets are generated, namely tetrode and polytrode data mimicking in vivo cortical recordings and microelectrode array (MEA) recordings of in vitro activity in salamander retinas. The synthesized example benchmarking data mimics salient features of typical experimental recordings, for example, spike waveforms depending on interspike interval. ViSAPy goes beyond existing methods as it includes biologically realistic model noise, synaptic activation by recurrent spiking networks, finite-sized electrode contacts, and allows for inhomogeneous electrical conductivities. ViSAPy is optimized to allow for generation of long time series of benchmarking data, spanning minutes of biological time, by parallel execution on multi-core computers. ViSAPy is an open-ended tool as it can be generalized to produce benchmarking data for arbitrary recording-electrode geometries and with various levels of complexity. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
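Ground-truth spike times make it possible to score a sorter's output directly. The sketch below is an assumed evaluation step, not part of ViSAPy: it greedily matches detected spikes to true spikes within a tolerance window and counts hits, false alarms, and misses for a single unit.

```python
def match_spikes(true_times, detected_times, tol):
    """Greedy one-to-one matching of detected spikes to ground-truth spikes.

    Both inputs are sorted spike times in the same units as tol.
    Returns (true_positives, false_positives, false_negatives).
    """
    i = j = tp = 0
    while i < len(true_times) and j < len(detected_times):
        dt = detected_times[j] - true_times[i]
        if abs(dt) <= tol:
            tp += 1          # detection within the window: count as a hit
            i += 1
            j += 1
        elif dt < 0:
            j += 1           # detection too early to match any remaining true spike
        else:
            i += 1           # true spike with no detection within the window: a miss
    fp = len(detected_times) - tp
    fn = len(true_times) - tp
    return tp, fp, fn
```

The two-pointer scan works because both lists are sorted: an unmatched early detection cannot match any later true spike, and an unmatched true spike cannot be matched by any later detection.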
ERIC Educational Resources Information Center
Snow, Amie B.; Morris, Darrell; Perney, Jan
2018-01-01
We examined which of two instruments (Text Reading and Comprehension inventory [TRC] or a traditional informal reading inventory [IRI]) provides the more valid assessment of a primary-grade student's reading instructional level. The TRC is currently the required benchmark reading assessment for students in grades K-3 in the state of North…
PSO algorithm enhanced with Lozi Chaotic Map - Tuning experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pluhacek, Michal; Senkerik, Roman; Zelinka, Ivan
2015-03-10
This paper investigates the effect of tuning the control parameters of the Lozi chaotic map employed as a chaotic pseudo-random number generator for the particle swarm optimization (PSO) algorithm. Three different benchmark functions are selected from the IEEE CEC 2013 competition benchmark set. The Lozi map is extensively tuned and the performance of PSO is evaluated.
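A minimal sketch of a Lozi-driven PSO is shown below, assuming the map replaces the uniform random numbers r1 and r2 in the velocity update. The map is x' = 1 - a|x| + b*y, y' = x; the default a = 1.7 and b = 0.5, the burn-in normalisation to [0, 1], the starting point, the inertia and acceleration constants, and the sphere test function are all assumptions, not the settings or CEC 2013 functions used in the paper. Tuning would amount to varying a and b and re-running the optimiser.

```python
class LoziRNG:
    """Chaotic pseudo-random source in [0, 1] from the Lozi map."""

    def __init__(self, a=1.7, b=0.5, x0=0.1, y0=0.1, burn_in=1000):
        self.a, self.b, self.x, self.y = a, b, x0, y0
        # Estimate the attractor's x-range to normalise outputs (an assumption).
        samples = [self._step() for _ in range(burn_in)]
        self.lo, self.hi = min(samples), max(samples)

    def _step(self):
        # x' = 1 - a|x| + b*y,  y' = x  (right-hand side uses the old x and y)
        self.x, self.y = 1.0 - self.a * abs(self.x) + self.b * self.y, self.x
        return self.x

    def random(self):
        v = (self._step() - self.lo) / (self.hi - self.lo)
        return min(1.0, max(0.0, v))  # clamp values outside the burn-in range


def sphere(pos):
    return sum(v * v for v in pos)


def pso(f, dim, rng, n_particles=20, iters=200, w=0.729, c1=1.49445, c2=1.49445, bound=5.0):
    """Global-best PSO minimising f; rng supplies every random number."""
    pos = [[(2 * rng.random() - 1) * bound for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()  # chaotic instead of uniform
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Because the Lozi sequence is deterministic, two runs with the same map parameters start from identical swarms, which makes comparisons between parameter settings reproducible.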
Benchmarks for single-phase flow in fractured porous media
NASA Astrophysics Data System (ADS)
Flemisch, Bernd; Berre, Inga; Boon, Wietse; Fumagalli, Alessio; Schwenck, Nicolas; Scotti, Anna; Stefansson, Ivar; Tatomir, Alexandru
2018-01-01
This paper presents several test cases intended to be benchmarks for numerical schemes for single-phase fluid flow in fractured porous media. A number of solution strategies are compared, including a vertex and two cell-centred finite volume methods, a non-conforming embedded discrete fracture model, a primal and a dual extended finite element formulation, and a mortar discrete fracture model. The proposed benchmarks test the schemes with increasing difficulty in terms of network geometry, e.g. intersecting fractures, and physical parameters, e.g. low and high fracture-matrix permeability ratio as well as heterogeneous fracture permeabilities. For each problem, the results presented are the number of unknowns, the approximation errors in the porous matrix and in the fractures with respect to a reference solution, and the sparsity and condition number of the discretized linear system. All data and meshes used in this study are publicly available for further comparisons.
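Two of the reported quantities, sparsity and condition number of the discretized system, can be illustrated on a toy problem. The sketch below assumes a 1D cell-centred two-point flux (TPFA) discretization with harmonic-mean face transmissibilities and Dirichlet conditions at both ends; it is far simpler than the paper's 2D/3D fractured test cases, and it uses the 1-norm condition number computed from an explicit inverse, which is only sensible for small matrices.

```python
def assemble_tpfa(perm, length=1.0):
    """Dense TPFA matrix for -(k p')' = q on [0, length] with Dirichlet ends."""
    n = len(perm)
    dx = length / n
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Harmonic mean of neighbouring permeabilities gives the face transmissibility.
        t = 2.0 * perm[i] * perm[i + 1] / (perm[i] + perm[i + 1]) / dx ** 2
        a[i][i] += t
        a[i + 1][i + 1] += t
        a[i][i + 1] -= t
        a[i + 1][i] -= t
    # Boundary faces are half a cell away from the cell centres.
    a[0][0] += 2.0 * perm[0] / dx ** 2
    a[-1][-1] += 2.0 * perm[-1] / dx ** 2
    return a


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def norm1(a):
    """Maximum absolute column sum."""
    n = len(a)
    return max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))


def cond1(a):
    return norm1(a) * norm1(inverse(a))


def nonzeros(a):
    return sum(1 for row in a for v in row if v != 0.0)
```

In 1D the TPFA matrix is tridiagonal, so it has 3n - 2 nonzeros; strong permeability contrasts, such as a conductive fracture or a barrier, change the entries and therefore the condition number.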
Design and development of a community carbon cycle benchmarking system for CMIP5 models
NASA Astrophysics Data System (ADS)
Mu, M.; Hoffman, F. M.; Lawrence, D. M.; Riley, W. J.; Keppel-Aleks, G.; Randerson, J. T.
2013-12-01
Benchmarking has been widely used to assess the ability of atmosphere, ocean, sea ice, and land surface models to capture the spatial and temporal variability of observations during the historical period. For the carbon cycle and terrestrial ecosystems, the design and development of an open-source community platform has been an important goal as part of the International Land Model Benchmarking (ILAMB) project. Here we designed and developed a software system that enables the user to specify the models, benchmarks, and scoring systems so that results can be tailored to specific model intercomparison projects. We used this system to evaluate the performance of CMIP5 Earth system models (ESMs). Our scoring system used information from four different aspects of climate, including the climatological mean spatial pattern of gridded surface variables, seasonal cycle dynamics, the amplitude of interannual variability, and long-term decadal trends. We used this system to evaluate burned area, global biomass stocks, net ecosystem exchange, gross primary production, and ecosystem respiration from CMIP5 historical simulations. Initial results indicated that the multi-model mean often performed better than many of the individual models for most of the observational constraints.
NASA Astrophysics Data System (ADS)
Tragazikis, I. K.; Exarchos, D. A.; Dalla, P. T.; Matikas, T. E.
2016-04-01
This paper deals with the use of complementary nondestructive methods for the evaluation of damage in engineering materials. Digital image correlation (DIC) is a useful tool for accurate, noncontact strain measurement in engineering materials. DIC is a 2D, full-field optical analysis technique that uses gray-value digital images to measure deformation, vibration and strain in a vast variety of materials. In addition, this technique can be applied to testing areas ranging from very small to large and can be used for various tests such as tensile, torsion and bending under static or dynamic loading. In this study, DIC results are benchmarked against other nondestructive techniques, such as acoustic emission for damage localization and fracture mode evaluation, and IR thermography for stress field visualization and assessment. The combined use of these three nondestructive methods enables the characterization and classification of damage in materials and structures.
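The core of DIC is tracking small gray-value subsets between a reference and a deformed image. The sketch below assumes integer-pixel displacements and uses zero-normalised cross-correlation (ZNCC) as the matching criterion; practical DIC codes add subpixel interpolation and subset shape functions, which are omitted here.

```python
import math


def zncc(f, g):
    """Zero-normalised cross-correlation of two equally sized flattened subsets."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    df = [v - mf for v in f]
    dg = [v - mg for v in g]
    num = sum(a * b for a, b in zip(df, dg))
    den = math.sqrt(sum(a * a for a in df) * sum(b * b for b in dg))
    return num / den if den else 0.0


def subset(img, r0, c0, size):
    """Flatten a size x size window whose top-left corner is (r0, c0)."""
    return [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]


def track_subset(ref, cur, r0, c0, size, search):
    """Integer displacement (dr, dc) maximising ZNCC within +/- search pixels."""
    template = subset(ref, r0, c0, size)
    best_score, best_disp = -2.0, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = r0 + dr, c0 + dc
            if 0 <= r and 0 <= c and r + size <= len(cur) and c + size <= len(cur[0]):
                score = zncc(template, subset(cur, r, c, size))
                if score > best_score:
                    best_score, best_disp = score, (dr, dc)
    return best_disp, best_score
```

ZNCC is insensitive to uniform changes in brightness and contrast between the two images, which is why it is a common choice for subset matching; repeating the search over a grid of subsets yields a displacement field from which strains are differentiated.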
The Medical Library Association Benchmarking Network: development and implementation.
Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C; Smith, Bernie Todd
2006-04-01
This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program.
The Medical Library Association Benchmarking Network: development and implementation*
Dudden, Rosalind Farnam; Corcoran, Kate; Kaplan, Janice; Magouirk, Jeff; Rand, Debra C.; Smith, Bernie Todd
2006-01-01
Objective: This article explores the development and implementation of the Medical Library Association (MLA) Benchmarking Network from the initial idea and test survey, to the implementation of a national survey in 2002, to the establishment of a continuing program in 2004. Started as a program for hospital libraries, it has expanded to include other nonacademic health sciences libraries. Methods: The activities and timelines of MLA's Benchmarking Network task forces and editorial board from 1998 to 2004 are described. Results: The Benchmarking Network task forces successfully developed an extensive questionnaire with parameters of size and measures of library activity and published a report of the data collected by September 2002. The data were available to all MLA members in the form of aggregate tables. Utilization of Web-based technologies proved feasible for data intake and interactive display. A companion article analyzes and presents some of the data. MLA has continued to develop the Benchmarking Network with the completion of a second survey in 2004. Conclusions: The Benchmarking Network has provided many small libraries with comparative data to present to their administrators. It is a challenge for the future to convince all MLA members to participate in this valuable program. PMID:16636702
New Reactor Physics Benchmark Data in the March 2012 Edition of the IRPhEP Handbook
DOE Office of Scientific and Technical Information (OSTI.GOV)
John D. Bess; J. Blair Briggs; Jim Gulliford
2012-11-01
The International Reactor Physics Experiment Evaluation Project (IRPhEP) was established to preserve integral reactor physics experimental data, including separate or special effects data for nuclear energy and technology applications. Numerous experiments that have been performed worldwide represent a large investment of infrastructure, expertise, and cost, and are valuable resources of data for present and future research. These valuable assets provide the basis for recording, development, and validation of methods. If the experimental data are lost, the high cost to repeat many of these measurements may be prohibitive. The purpose of the IRPhEP is to provide an extensively peer-reviewed set of reactor physics-related integral data that can be used by reactor designers and safety analysts to validate the analytical tools used to design next-generation reactors and establish the safety basis for operation of these reactors. Contributors from around the world collaborate in the evaluation and review of selected benchmark experiments for inclusion in the International Handbook of Evaluated Reactor Physics Benchmark Experiments (IRPhEP Handbook) [1]. Several new evaluations have been prepared for inclusion in the March 2012 edition of the IRPhEP Handbook.
Risk-based criteria to support validation of detection methods for drinking water and air.
DOE Office of Scientific and Technical Information (OSTI.GOV)
MacDonell, M.; Bhattacharyya, M.; Finster, M.
2009-02-18
This report was prepared to support the validation of analytical methods for threat contaminants under the U.S. Environmental Protection Agency (EPA) National Homeland Security Research Center (NHSRC) program. It is designed to serve as a resource for certain applications of benchmark and fate information for homeland security threat contaminants. The report identifies risk-based criteria from existing health benchmarks for drinking water and air for potential use as validation targets. The focus is on benchmarks for chronic public exposures. The priority sources are standard EPA concentration limits for drinking water and air, along with oral and inhalation toxicity values. Many contaminants identified as homeland security threats to drinking water or air would convert to other chemicals within minutes to hours of being released. For this reason, a fate analysis has been performed to identify potential transformation products and removal half-lives in air and water so appropriate forms can be targeted for detection over time. The risk-based criteria presented in this report to frame method validation are expected to be lower than actual operational targets based on realistic exposures following a release. Note that many target criteria provided in this report are taken from available benchmarks without assessing the underlying toxicological details. That is, although the relevance of the chemical form and analogues is evaluated, the toxicological interpretations and extrapolations conducted by the authoring organizations are not. It is also important to emphasize that such targets in the current analysis are not health-based advisory levels to guide homeland security responses. This integrated evaluation of chronic public benchmarks and contaminant fate has identified more than 200 risk-based criteria as method validation targets across numerous contaminants and fate products in drinking water and air combined.
The gap in directly applicable values is considerable across the full set of threat contaminants, so preliminary indicators were developed from other well-documented benchmarks to serve as a starting point for validation efforts. By this approach, at least preliminary context is available for water or air, and sometimes both, for all chemicals on the NHSRC list that was provided for this evaluation. This means that a number of concentrations presented in this report represent indirect measures derived from related benchmarks or surrogate chemicals, as described within the many results tables provided in this report.
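Removal half-lives translate directly into how long a released contaminant stays above a detection target. The sketch below assumes simple first-order removal, C(t) = C0 * 2^(-t / t_half); the report's fate analysis may involve more complex transformation pathways, and the function names are hypothetical.

```python
import math


def remaining_fraction(t, half_life):
    """Fraction of the initial concentration left after time t (first-order removal)."""
    return 0.5 ** (t / half_life)


def time_to_reach(c0, target, half_life):
    """Time for the concentration to fall from c0 to target (same time units as half_life)."""
    if target >= c0:
        return 0.0
    return half_life * math.log2(c0 / target)
```

For example, a release at 100 units with a 3-hour half-life falls to 12.5 units after 9 hours (three half-lives), after which detection would have to target the transformation products instead.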
ERIC Educational Resources Information Center
Raska, David
2014-01-01
This research explores and tests the effect of an innovative performance feedback practice--feedback supplemented with web-based peer benchmarking--through a lens of social cognitive framework for self-regulated learning. The results suggest that providing performance feedback with references to exemplary peer output is positively associated with…
Kiechle, Frederick L; Arcenas, Rodney C; Rogers, Linda C
2014-01-01
Benchmarks and metrics related to laboratory test utilization are based on evidence-based medical literature that may suffer from a positive publication bias. Guidelines are only as good as the data reviewed to create them. Disruptive technologies require time for appropriate use to be established before utilization review will be meaningful. Metrics include monitoring the use of obsolete tests and the inappropriate use of lab tests. Test utilization by clients in a hospital outreach program can be used to monitor the impact of new clients on lab workload. A multi-disciplinary laboratory utilization committee is the most effective tool for modifying bad habits and for reviewing new tests, either approving them for the lab formulary or sending them out to a reference lab. Copyright © 2013 Elsevier B.V. All rights reserved.
Experimental Creep Life Assessment for the Advanced Stirling Convertor Heater Head
NASA Technical Reports Server (NTRS)
Krause, David L.; Kalluri, Sreeramesh; Shah, Ashwin R.; Korovaichuk, Igor
2010-01-01
The United States Department of Energy is planning to develop the Advanced Stirling Radioisotope Generator (ASRG) for the National Aeronautics and Space Administration (NASA) for potential use on future space missions. The ASRG provides substantial efficiency and specific power improvements over radioisotope power systems of heritage designs. The ASRG would use General Purpose Heat Source modules as energy sources and the free-piston Advanced Stirling Convertor (ASC) to convert heat into electrical energy. Lockheed Martin Corporation of Valley Forge, Pennsylvania, is integrating the ASRG systems, and Sunpower, Inc., of Athens, Ohio, is designing and building the ASC. NASA Glenn Research Center of Cleveland, Ohio, manages the Sunpower contract and provides technology development in several areas for the ASC. One area is reliability assessment for the ASC heater head, a critical pressure vessel within which heat is converted into mechanical oscillation of a displacer piston. For high system efficiency, the ASC heater head operates at very high temperature (850 °C) and is therefore fabricated from an advanced heat-resistant nickel-based superalloy, Microcast MarM-247. Since use of MarM-247 in a thin-walled pressure vessel is atypical, much effort is required to ensure that the system will operate reliably for its design life of 17 years. One life-limiting structural response for this application is creep; creep deformation is the accumulation of time-dependent inelastic strain under sustained loading. If allowed to progress, the deformation eventually results in creep rupture. Since creep material properties are not available in the open literature, a detailed creep life assessment effort for the ASC heater head is underway. This paper presents an overview of that creep life assessment approach, including the reliability-based creep criteria developed from coupon testing, and the associated heater head deterministic and probabilistic analyses.
The approach also includes direct benchmark experimental creep assessment. This element provides high-fidelity creep testing of prototypical heater head test articles to investigate the relevant material issues and multiaxial stress state. Benchmark testing provides required data to evaluate the complex life assessment methodology and to validate that analysis. Results from current benchmark heater head tests and newly developed experimental methods are presented. In the concluding remarks, the test results are shown to compare favorably with the creep strain predictions and are the first experimental evidence for a robust ASC heater head creep life.
Hot Cell Installation and Demonstration of the Severe Accident Test Station
DOE Office of Scientific and Technical Information (OSTI.GOV)
Linton, Kory D.; Burns, Zachary M.; Terrani, Kurt A.
A Severe Accident Test Station (SATS) capable of examining the oxidation kinetics and accident response of irradiated fuel and cladding materials for design basis accident (DBA) and beyond design basis accident (BDBA) scenarios has been successfully installed and demonstrated in the Irradiated Fuels Examination Laboratory (IFEL), a hot cell facility at Oak Ridge National Laboratory. The two test station modules provide various temperature profiles, steam, and the thermal shock conditions necessary for integral loss of coolant accident (LOCA) testing, defueled oxidation quench testing and high temperature BDBA testing. The installation of the SATS system restores the domestic capability to examine postulated and extended LOCA conditions on spent fuel and cladding and provides a platform for evaluation of advanced fuel and accident tolerant fuel (ATF) cladding concepts. This document reports on the successful in-cell demonstration testing of unirradiated Zircaloy-4. It also contains descriptions of the integral test facility capabilities, installation activities, and out-of-cell benchmark testing to calibrate and optimize the system.
NASA Astrophysics Data System (ADS)
Michel, Dominik; Hirschi, Martin; Jimenez, Carlos; McCabe, Mathew; Miralles, Diego; Wood, Eric; Seneviratne, Sonia
2014-05-01
Research on climate variations and the development of predictive capabilities largely rely on globally available reference data series of the different components of the energy and water cycles. Several efforts have aimed at producing large-scale and long-term reference data sets of these components, e.g. based on in situ observations and remote sensing, in order to allow for diagnostic analyses of the drivers of temporal variations in the climate system. Evapotranspiration (ET) is an essential component of the energy and water cycle, which cannot be monitored directly on a global scale by remote sensing techniques. In recent years, several global multi-year ET data sets have been derived from remote sensing-based estimates, observation-driven land surface model simulations or atmospheric reanalyses. The LandFlux-EVAL initiative presented an ensemble-evaluation of these data sets over the time periods 1989-1995 and 1989-2005 (Mueller et al. 2013). Currently, a multi-decadal global reference heat flux data set for ET at the land surface is being developed within the LandFlux initiative of the Global Energy and Water Cycle Experiment (GEWEX). This LandFlux v0 ET data set comprises four ET algorithms forced with a common radiation and surface meteorology. In order to estimate the agreement of this LandFlux v0 ET data with existing data sets, it is compared to the recently available LandFlux-EVAL synthesis benchmark product. Additional evaluation of the LandFlux v0 ET data set is based on a comparison to in situ observations of a weighing lysimeter from the hydrological research site Rietholzbach in Switzerland. These analyses serve as a test bed for similar evaluation procedures that are envisaged for ESA's WACMOS-ET initiative (http://wacmoset.estellus.eu).
Reference: Mueller, B., Hirschi, M., Jimenez, C., Ciais, P., Dirmeyer, P. A., Dolman, A. J., Fisher, J. B., Jung, M., Ludwig, F., Maignan, F., Miralles, D. G., McCabe, M. F., Reichstein, M., Sheffield, J., Wang, K., Wood, E. F., Zhang, Y., and Seneviratne, S. I. (2013). Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis. Hydrology and Earth System Sciences, 17(10): 3707-3720.
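Comparing a gridded ET product against lysimeter observations usually comes down to a few paired-series statistics. The sketch below assumes the two series are already co-located and aligned in time; bias, root-mean-square error, and Pearson correlation are common choices, but the abstract does not state which metrics were used.

```python
import math


def compare(product, reference):
    """Bias, RMSE and Pearson r of a product series against a reference series."""
    n = len(product)
    bias = sum(p - r for p, r in zip(product, reference)) / n
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(product, reference)) / n)
    mp, mr = sum(product) / n, sum(reference) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(product, reference))
    sp = math.sqrt(sum((p - mp) ** 2 for p in product))
    sr = math.sqrt(sum((r - mr) ** 2 for r in reference))
    return {"bias": bias, "rmse": rmse, "r": cov / (sp * sr)}
```

A product that is uniformly offset from the lysimeter has perfect correlation but nonzero bias and RMSE, which is why these metrics are usually reported together.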
Tsatsaronis, George; Balikas, Georgios; Malakasiotis, Prodromos; Partalas, Ioannis; Zschunke, Matthias; Alvers, Michael R; Weissenborn, Dirk; Krithara, Anastasia; Petridis, Sergios; Polychronopoulos, Dimitris; Almirantis, Yannis; Pavlopoulos, John; Baskiotis, Nicolas; Gallinari, Patrick; Artiéres, Thierry; Ngomo, Axel-Cyrille Ngonga; Heino, Norman; Gaussier, Eric; Barrio-Alvers, Liliana; Schroeder, Michael; Androutsopoulos, Ion; Paliouras, Georgios
2015-04-30
This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe; participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer.
In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.
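Semantic indexing in Task 1a is a multi-label problem: each article receives a set of MESH headings. One common flat measure for such tasks is micro-averaged F1, which pools true positives, false positives, and false negatives over all documents before computing precision and recall. The sketch below assumes label sets keyed by document ID and ignores predictions for documents absent from the gold standard; it is not the official BIOASQ evaluation code.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over documents.

    gold, predicted: dict mapping document id -> set of labels (e.g. MESH headings).
    """
    tp = fp = fn = 0
    for doc_id, gold_labels in gold.items():
        pred_labels = predicted.get(doc_id, set())
        tp += len(gold_labels & pred_labels)
        fp += len(pred_labels - gold_labels)
        fn += len(gold_labels - pred_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Pooling counts means documents with many headings weigh more than documents with few, which suits large-scale indexing where frequent headings dominate the workload.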
Reactor Pressure Vessel Fracture Analysis Capabilities in Grizzly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spencer, Benjamin; Backman, Marie; Chakraborty, Pritam
2015-03-01
Efforts have been underway to develop fracture mechanics capabilities in the Grizzly code to enable it to be used to perform deterministic fracture assessments of degraded reactor pressure vessels (RPVs). Development in prior years has resulted in a capability to calculate J-integrals. For this application, these are used to calculate stress intensity factors for cracks to be used in deterministic linear elastic fracture mechanics (LEFM) assessments of fracture in degraded RPVs. The J-integral can only be used to evaluate stress intensity factors for axis-aligned flaws because it can only be used to obtain the stress intensity factor for pure Mode I loading. Off-axis flaws will be subjected to mixed-mode loading. For this reason, work has continued to expand the set of fracture mechanics capabilities to permit it to evaluate off-axis flaws. This report documents the following work to enhance Grizzly’s engineering fracture mechanics capabilities for RPVs:
• Interaction Integral and T-stress: To obtain mixed-mode stress intensity factors, a capability to evaluate interaction integrals for 2D or 3D flaws has been developed. A T-stress evaluation capability has been developed to evaluate the constraint at crack tips in 2D or 3D. Initial verification testing of these capabilities is documented here.
• Benchmarking for axis-aligned flaws: Grizzly’s capabilities to evaluate stress intensity factors for axis-aligned flaws have been benchmarked against calculations for the same conditions in FAVOR.
• Off-axis flaw demonstration: The newly developed interaction integral capabilities are demonstrated in an application to calculate the mixed-mode stress intensity factors for off-axis flaws.
• Other code enhancements: Other enhancements to the thermomechanics capabilities that relate to the solution of the engineering RPV fracture problem are documented here.
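The reason a single J-integral cannot separate the modes, while an interaction integral can, follows from the standard linear elastic relations below, stated for an isotropic material with shear modulus μ and E' = E in plane stress or E' = E/(1 - ν²) in plane strain. These are textbook relations given here for orientation, not equations taken from the Grizzly report. J combines the squared mode contributions, whereas the interaction integral between the actual field (1) and a chosen auxiliary field (2) is linear in each mode:

```latex
J = \frac{K_I^2 + K_{II}^2}{E'} + \frac{K_{III}^2}{2\mu}

I^{(1,2)} = \frac{2}{E'}\left(K_I^{(1)} K_I^{(2)} + K_{II}^{(1)} K_{II}^{(2)}\right)
          + \frac{1}{\mu}\, K_{III}^{(1)} K_{III}^{(2)}

% Choosing a pure Mode I auxiliary field (K_I^{(2)} = 1, K_{II}^{(2)} = K_{III}^{(2)} = 0):
K_I^{(1)} = \frac{E'}{2}\, I^{(1,\mathrm{I})}
```

Repeating the last step with pure Mode II and Mode III auxiliary fields recovers the remaining factors, which is what makes mixed-mode evaluation of off-axis flaws possible.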
Benchmarking an unstructured grid sediment model in an energetic estuary
Lopez, Jesse E.; Baptista, António M.
2016-12-14
A sediment model coupled to the hydrodynamic model SELFE is validated against a benchmark combining a set of idealized tests and an application to a field-data-rich energetic estuary. After sensitivity studies, model results for the idealized tests largely agree with previously reported results from other models, as well as with analytical, semi-analytical, or laboratory results. Results for suspended sediment in an open-channel test with a fixed bottom are sensitive to the turbulence closure and to the treatment of the hydrodynamic bottom boundary. Results for the migration of a trench are very sensitive to critical stress and erosion rate, but largely insensitive to turbulence closure. The model is able to qualitatively represent sediment dynamics associated with estuarine turbidity maxima in an idealized estuary. Applied to the Columbia River estuary, the model qualitatively captures sediment dynamics observed by fixed stations and shipborne profiles. Representation of the vertical structure of suspended sediment degrades when stratification is underpredicted. Across all tests, skill metrics for suspended sediment lag those for hydrodynamics, even when the model qualitatively represents the dynamics. The benchmark is fully documented in an openly available repository to encourage unambiguous comparisons against other models.
Benchmark Dataset for Whole Genome Sequence Compression.
C L, Biji; S Nair, Achuthsankar
2017-01-01
Research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset built with a multistage sampling procedure. Taking the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of applying three established tools to the newly compiled dataset and shows that their strengths and weaknesses become evident only through a comparison based on the scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
Local implementation of the Essence of Care benchmarks.
Jones, Sue
This study aimed to understand clinical practice benchmarking from the perspective of nurses working in a large acute NHS trust, and to determine whether the nurses perceived that their commitment to Essence of Care led to improvements in care, which factors influenced their role in the process, and which organisational factors influenced benchmarking. An ethnographic case study approach was adopted. Six themes emerged from the data. Two organisational issues emerged: leadership, and the values and/or culture of the organisation. The findings suggested that the leadership ability of the Essence of Care link nurses and the value placed on this work by the organisation were key to the success of benchmarking. Based on these findings, a model for successful implementation of the Essence of Care is proposed that lends itself to testing by other organisations.
Evaluation of FSK models for radiative heat transfer under oxyfuel conditions
NASA Astrophysics Data System (ADS)
Clements, Alastair G.; Porter, Rachael; Pranzitelli, Alessandro; Pourkashanian, Mohamed
2015-01-01
Oxyfuel is a promising technology for carbon capture and storage (CCS) applied to combustion processes. Being able to model and optimise oxyfuel combustion would be highly advantageous for the deployment of CCS; however, the increased concentrations of CO2 and H2O under oxyfuel conditions modify several fundamental combustion processes, including radiative heat transfer. This study uses benchmark narrow band radiation models to evaluate the influence of the assumptions in global full-spectrum k-distribution (FSK) models, and whether these models are suitable for modelling radiation in computational fluid dynamics (CFD) calculations of oxyfuel combustion. The statistical narrow band (SNB) and correlated-k (CK) models are used to calculate benchmark data for the radiative source term and heat flux, which are then compared with the results of the FSK models. Both the full-spectrum correlated k (FSCK) and the full-spectrum scaled k (FSSK) models are applied using up-to-date spectral data. The results show that the FSCK and FSSK methods achieve good agreement in the test cases. The FSCK method with a five-point Gauss quadrature scheme is recommended for CFD calculations under oxyfuel conditions; however, inaccuracies may remain in cases with very wide variations in the ratio of CO2 to H2O concentrations.
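In FSK-type models the spectral integration is replaced by an integral over the cumulative k-distribution variable g, which runs from 0 to 1, so a total radiative quantity is approximated by a weighted sum over a few quadrature points. A minimal sketch, assuming a standard Gauss-Legendre rule mapped onto [0, 1] (the abstract does not say which Gauss variant was used), shows how five points and weights can be obtained; the integrand is a hypothetical polynomial stand-in, not an FSCK intensity.

```python
import numpy as np


def gauss_points_unit_interval(n: int = 5):
    """Return Gauss-Legendre nodes g_i in (0, 1) and weights w_i summing to 1."""
    x, w = np.polynomial.legendre.leggauss(n)  # nodes and weights on [-1, 1]
    g = 0.5 * (x + 1.0)                        # affine map onto [0, 1]
    return g, 0.5 * w                          # the map's Jacobian is 1/2


def integrate_over_g(func, n: int = 5) -> float:
    """Approximate the integral of func(g) over g in [0, 1] as sum_i w_i * func(g_i)."""
    g, w = gauss_points_unit_interval(n)
    return float(np.sum(w * func(g)))


# Hypothetical stand-in integrand. In an FSCK calculation, each quadrature
# point would instead require one gray radiative transfer solution.
approx = integrate_over_g(lambda g: g**4)
print(approx)  # close to the exact value 0.2
```

An n-point Gauss-Legendre rule integrates polynomials up to degree 2n - 1 exactly, which is one reason a handful of points can suffice when the integrand varies smoothly in g.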
An open source framework for tracking and state estimation ('Stone Soup')
NASA Astrophysics Data System (ADS)
Thomas, Paul A.; Barr, Jordi; Balaji, Bhashyam; White, Kruger
2017-05-01
The ability to detect and unambiguously follow all moving entities in a state space is important in multiple domains, both in defence (e.g. air surveillance, maritime situational awareness, ground moving target indication) and in the civil sphere (e.g. astronomy, biology, epidemiology, dispersion modelling). However, tracking and state estimation researchers and practitioners have difficulty recreating state-of-the-art algorithms in order to benchmark their own work. Furthermore, system developers need to assess which algorithms meet operational requirements objectively and exhaustively, rather than intuitively or on the basis of personal favourites. We have therefore begun a collaborative initiative to create an open source framework for the production, demonstration and evaluation of tracking and state estimation algorithms. The initiative will develop an MIT-licensed software platform for researchers and practitioners to test, verify and benchmark a variety of multi-sensor and multi-object state estimation algorithms. The initiative is supported by four defence laboratories, which will contribute to the development of the framework. The tracking and state estimation community will derive significant benefits from this work, including access to repositories of verified and validated tracking and state estimation algorithms, a framework for evaluating multiple algorithms, standardised interfaces, and access to challenging data sets. Keywords: Tracking,
NASA Astrophysics Data System (ADS)
Izah Anuar, Nurul; Saptari, Adi
2016-02-01
This paper addresses the types of particle representation (encoding) procedures used in a population-based stochastic optimization technique for solving scheduling problems in the job-shop manufacturing environment. It evaluates and compares the performance of different particle representation procedures in Particle Swarm Optimization (PSO) when solving Job-shop Scheduling Problems (JSP). Particle representation procedures refer to the mapping between a particle's position in PSO and a scheduling solution in JSP; this mapping is needed so that each particle in PSO can represent a schedule in JSP. Three procedures, namely Operation and Particle Position Sequence (OPPS), random keys representation, and a random-key encoding scheme, are used in this study. These procedures were tested in MATLAB on the FT06 and FT10 benchmark problems available in the OR-Library, with the objective of minimizing the makespan. The experimental results show that OPPS gives the best performance on both benchmark problems. The contribution of this paper is to demonstrate to practitioners working on complex scheduling problems that the choice of particle representation procedure can significantly affect the performance of PSO in solving JSP.
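Of the three procedures, the random-keys idea is the easiest to show in a few lines: a particle's continuous position vector is sorted, and the resulting order is read as an operation sequence. A minimal sketch, assuming the common convention that sorted index p belongs to job p mod n_jobs (the paper's exact mapping, and OPPS, are not reproduced here), decodes a position and computes the makespan of the resulting semi-active schedule. The function names and the two-job instance are hypothetical; the instance is not FT06 or FT10.

```python
from typing import List, Sequence, Tuple

# Hypothetical instance format: each job is an ordered list of
# (machine, processing_time) operations; machines are numbered 0..m-1.
Job = List[Tuple[int, int]]


def decode_random_keys(position: Sequence[float], n_jobs: int) -> List[int]:
    """Map a continuous particle position to an operation-based job sequence.

    Sorting the keys yields a permutation of the indices 0..len-1; index p is
    assigned to job p % n_jobs, so each job appears len(position) / n_jobs times.
    """
    order = sorted(range(len(position)), key=lambda p: position[p])
    return [p % n_jobs for p in order]


def makespan(jobs: List[Job], sequence: List[int]) -> int:
    """Start each operation at the earliest time allowed by its job predecessor
    and by the previous operation on its machine (a semi-active schedule)."""
    n_machines = len(jobs[0])
    next_op = [0] * len(jobs)          # next unscheduled operation of each job
    job_ready = [0] * len(jobs)        # completion time of each job's last operation
    machine_ready = [0] * n_machines   # time at which each machine becomes free
    for j in sequence:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(job_ready)


jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)]]  # toy 2-job, 2-machine instance
seq = decode_random_keys([0.1, 0.3, 0.5, 0.9], n_jobs=2)
print(seq, makespan(jobs, seq))  # [0, 1, 0, 1] 6
```

In PSO the velocity and position updates act on the continuous keys, and a decoder of this kind is applied each time a particle's fitness (here, the makespan) is evaluated.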
A homology-based pipeline for global prediction of post-translational modification sites
NASA Astrophysics Data System (ADS)
Chen, Xiang; Shi, Shao-Ping; Xu, Hao-Dong; Suo, Sheng-Bao; Qiu, Jian-Ding
2016-05-01
The pathways of protein post-translational modifications (PTMs) have been shown to play particularly important roles in almost every biological process. Identifying PTM substrates, along with information on the exact sites, is fundamental to fully understanding or controlling biological processes. Alternative computational strategies would help to annotate PTMs in a high-throughput manner. Traditional algorithms are suited to identifying PTMs in common organisms and tissues that have a complete PTM atlas or extensive experimental data, whereas annotating rare PTMs in most organisms remains a clear challenge. To address this, we have developed a novel homology-based pipeline named PTMProber that allows identification of potential modification sites for most proteomes lacking PTM data. The cross-promotion E-value (CPE) is used in the pipeline as a stringent benchmark to evaluate homology to known modification sites. Independent validation tests show that PTMProber achieves over 58.8% recall with high precision under the CPE benchmark. Comparisons with other machine-learning tools show that the PTMProber pipeline performs better on general predictions. We have also developed a web-based tool that integrates this pipeline at http://bioinfo.ncu.edu.cn/PTMProber/index.aspx. In addition to pre-constructed PTM prediction models, the website provides extended functionality that allows users to customize models.
Random Forests for Global and Regional Crop Yield Predictions.
Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung
2016-01-01
Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato, in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found to be highly capable of predicting crop yields and outperformed the MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield for RF models in all test cases, whereas they ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales because of its high accuracy and precision, ease of use, and utility in data analysis. However, RF may lose accuracy when predicting the extreme ends of responses or responses beyond the boundaries of the training data.
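The error statistic quoted above, RMSE expressed as a percentage of the average observed yield, normalises errors so that crops and regions with very different yield levels can be compared. A minimal sketch of that calculation follows; the function names and yield values are hypothetical, and the RF and MLR models that would produce the predictions are not shown.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_percent_of_mean(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE expressed as a percentage of the mean observed value."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


# Hypothetical yields (e.g. t/ha), not data from the study.
obs = [10.0, 10.0, 10.0, 10.0]
pred = [11.0, 9.0, 11.0, 9.0]
print(rmse_percent_of_mean(obs, pred))  # 10.0
```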
Dee, C R; Rankin, J A; Burns, C A
1998-07-01
Journal usage studies, which are useful for budget management and for evaluating collection performance relative to library use, have generally described a single library or subject discipline. The Southern Chapter/Medical Library Association (SC/MLA) study has examined journal usage at the aggregate data level with the long-term goal of developing hospital library benchmarks for journal use. Thirty-six SC/MLA hospital libraries, categorized for the study by size as small, medium, or large, reported current journal title use centrally for a one-year period following standardized data collection procedures. Institutional and aggregate data were analyzed for the average annual frequency of use, average costs per use and non-use, and average percentage of non-used titles. Permutation F-type tests were used to measure differences among the three hospital groups. Averages were reported for each data set analysis. Statistical tests indicated no significant differences between the hospital groups, suggesting that benchmarks can be derived that apply to all types of hospital libraries. The unanticipated lack of commonality among heavily used titles pointed to a need for uniquely tailored collections. Although the small sample size precluded definitive results, the study's findings constituted a baseline of data that can be compared against future studies.
Dee, C R; Rankin, J A; Burns, C A
1998-01-01
BACKGROUND: Journal usage studies, which are useful for budget management and for evaluating collection performance relative to library use, have generally described a single library or subject discipline. The Southern Chapter/Medical Library Association (SC/MLA) study has examined journal usage at the aggregate data level with the long-term goal of developing hospital library benchmarks for journal use. METHODS: Thirty-six SC/MLA hospital libraries, categorized for the study by size as small, medium, or large, reported current journal title use centrally for a one-year period following standardized data collection procedures. Institutional and aggregate data were analyzed for the average annual frequency of use, average costs per use and non-use, and average percentage of non-used titles. Permutation F-type tests were used to measure differences among the three hospital groups. RESULTS: Averages were reported for each data set analysis. Statistical tests indicated no significant differences between the hospital groups, suggesting that benchmarks can be derived that apply to all types of hospital libraries. The unanticipated lack of commonality among heavily used titles pointed to a need for uniquely tailored collections. CONCLUSION: Although the small sample size precluded definitive results, the study's findings constituted a baseline of data that can be compared against future studies. PMID:9681164
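The abstract names permutation F-type tests but does not spell out the procedure. A minimal sketch, assuming the usual approach of computing the one-way ANOVA F statistic and then shuffling observations among the small, medium and large groups while keeping group sizes fixed, is shown below; the function names and input values are hypothetical, not SC/MLA data.

```python
import math
import random
from statistics import mean
from typing import List, Sequence, Tuple


def f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F: between-group mean square divided by within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_f_test(groups: Sequence[Sequence[float]],
                       n_permutations: int = 999,
                       seed: int = 0) -> Tuple[float, float]:
    """Return the observed F and a permutation p-value for 'no group difference'."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled: List[float] = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)              # reassign observations to groups at random
        start, shuffled = 0, []
        for size in sizes:               # keep the original group sizes
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Counting the observed labelling as one permutation keeps p strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-library values for small, medium and large hospital libraries.
small, medium, large = [1.2, 0.9, 1.5, 1.1], [1.0, 1.4, 1.3, 0.8], [1.1, 1.6, 0.9, 1.2]
print(permutation_f_test([small, medium, large]))
```

A permutation test does not rely on the normality assumption behind the parametric F distribution, which is one common reason to prefer it for small samples.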
Lourenço, J; Marques, S; Carvalho, F P; Oliveira, J; Malta, M; Santos, M; Gonçalves, F; Pereira, R; Mendo, S
2017-12-15
Active and abandoned uranium mining sites often create environmentally problematic situations, since they contaminate all environmental matrices (air, soil and water) with stable metals and radionuclides. Because of the cytotoxic, genotoxic and teratogenic properties of these contaminants, exposure to them may cause several harmful effects in living organisms. The Fish Embryo Acute Toxicity (FET) test was employed to evaluate the genotoxic and teratogenic potential of mine liquid effluents and sludge elutriates from a deactivated uranium mine. The aims were: a) to determine the risk of discharging such wastes into the environment; b) to assess how effectively the chemical treatment applied to the uranium mine water, a standard procedure generally applied to liquid effluents from uranium mines and mills, reduces its toxicological potential; and c) to assess the suitability of the FET test for evaluating the toxicity of such wastes, and the added value of including an evaluation of genotoxicity. The FET test showed that both elutriates and effluents are genotoxic and that the mine effluent is teratogenic at low concentrations. Additionally, liquid effluents and sludge elutriates affected other parameters, namely growth and hatching, and water pH alone played an important role in the hatching process. Including genotoxicity evaluation in the FET test was crucial to prevent underestimation of the risks posed by some of the tested effluents/elutriates. Finally, it was concluded that care should be taken when using benchmark values calculated for specific stressors to evaluate the risk posed by uranium mining wastes to freshwater ecosystems, because of the chemical complexity of these wastes. Copyright © 2017 Elsevier B.V. All rights reserved.
Constructing Benchmark Databases and Protocols for Medical Image Analysis: Diabetic Retinopathy
Kauppi, Tomi; Kämäräinen, Joni-Kristian; Kalesnykiene, Valentina; Sorri, Iiris; Uusitalo, Hannu; Kälviäinen, Heikki
2013-01-01
We address performance evaluation practices for developing medical image analysis methods, in particular how to establish and share databases of medical images with verified ground truth and solid evaluation protocols. Such databases support the development of better algorithms, the execution of thorough method comparisons and, consequently, technology transfer from research laboratories to clinical practice. For this purpose, we propose a framework consisting of reusable methods and tools for the laborious task of constructing a benchmark database. We provide a software tool for medical image annotation that helps collect the class label, spatial span, and expert's confidence for lesions, and a method to appropriately combine the manual segmentations from multiple experts. The tool and all necessary functionality for method evaluation are provided as public software packages. As a case study, we used the framework and tools to establish the DiaRetDB1 V2.1 database for benchmarking diabetic retinopathy detection algorithms. The database contains a set of retinal images, ground truth based on information from multiple experts, and a baseline algorithm for the detection of retinopathy lesions. PMID:23956787
Beauchamp, Kyle A; Behr, Julie M; Rustenburg, Ariën S; Bayly, Christopher I; Kroenlein, Kenneth; Chodera, John D
2015-10-08
Atomistic molecular simulations are a powerful way to make quantitative predictions, but the accuracy of these predictions depends entirely on the quality of the force field employed. Although experimental measurements of fundamental physical properties offer a straightforward approach for evaluating force field quality, the bulk of this information has been tied up in formats that are not machine-readable. Compiling benchmark data sets of physical properties from non-machine-readable sources requires substantial human effort and is prone to the accumulation of human errors, hindering the development of reproducible benchmarks of force-field accuracy. Here, we examine the feasibility of benchmarking atomistic force fields against the NIST ThermoML data archive of physicochemical measurements, which aggregates thousands of experimental measurements in a portable, machine-readable, self-annotating IUPAC-standard format. As a proof of concept, we present a detailed benchmark of the generalized Amber small-molecule force field (GAFF) using the AM1-BCC charge model against experimental measurements (specifically, bulk liquid densities and static dielectric constants at ambient pressure) automatically extracted from the archive, and discuss the extent of data available for use in larger-scale (or continuously performed) benchmarks. The results of even this limited initial benchmark highlight a general problem with fixed-charge force fields in representing low-dielectric environments, such as those seen in binding cavities or biological membranes.
Developing a Benchmarking Process in Perfusion: A Report of the Perfusion Downunder Collaboration
Baker, Robert A.; Newland, Richard F.; Fenton, Carmel; McDonald, Michael; Willcox, Timothy W.; Merry, Alan F.
2012-01-01
Abstract: Improving and understanding clinical practice is an appropriate goal for the perfusion community. The Perfusion Downunder Collaboration has established a multi-center, perfusion-focused database aimed at achieving these goals by developing quantitative quality indicators for clinical improvement through benchmarking. Data were collected using the Perfusion Downunder Collaboration database from procedures performed in eight Australian and New Zealand cardiac centers between March 2007 and February 2011. At the Perfusion Downunder Meeting in 2010, it was agreed by consensus to report quality indicators (QI) for glucose level, arterial outlet temperature, and pCO2 management during cardiopulmonary bypass. The values chosen for each QI were: blood glucose ≥4 mmol/L and ≤10 mmol/L; arterial outlet temperature ≤37°C; and arterial blood gas pCO2 ≥35 and ≤45 mmHg. The QI data were used to derive benchmarks using the Achievable Benchmark of Care (ABC™) methodology to identify the incidence of QIs at the best-performing centers. Five thousand four hundred and sixty-five procedures were evaluated to derive QI and benchmark data. The incidence of the blood glucose QI ranged from 37% to 96% of procedures, with a benchmark value of 90%. The arterial outlet temperature QI occurred in 16–98% of procedures, with a benchmark of 94%, while the arterial pCO2 QI occurred in 21–91%, with a benchmark value of 80%. We have derived QIs and benchmark calculations for the management of several key aspects of cardiopulmonary bypass to provide a platform for improving the quality of perfusion practice. PMID:22730861
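The ABC calculation named above can be sketched as follows, assuming the procedure as it is commonly described in the ABC literature: rank providers by an adjusted performance fraction, then pool the top performers until they cover at least 10% of all cases. The abstract does not state these parameters, so the 10% share, the (x + 1)/(n + 2) adjustment as applied here, the function names and the per-center counts are assumptions for illustration, not the collaboration's data.

```python
from typing import List, Tuple


def adjusted_performance_fraction(met: int, total: int) -> float:
    """(x + 1) / (n + 2): pulls the rates of centers with few procedures toward 0.5."""
    return (met + 1) / (total + 2)


def achievable_benchmark(centers: List[Tuple[int, int]], min_share: float = 0.10) -> float:
    """ABC-style benchmark from per-center (procedures meeting the QI, total procedures).

    Centers are ranked by adjusted performance fraction; the best ones are pooled
    until they account for at least `min_share` of all procedures, and the pooled
    QI rate of that subset is returned.
    """
    grand_total = sum(n for _, n in centers)
    ranked = sorted(centers, key=lambda c: adjusted_performance_fraction(*c), reverse=True)
    met_sum = total_sum = 0
    for met, total in ranked:
        met_sum += met
        total_sum += total
        if total_sum >= min_share * grand_total:
            break
    return met_sum / total_sum


# Hypothetical centers: a tiny center with a perfect raw rate ranks below a
# large center at 90% once the adjustment is applied.
print(achievable_benchmark([(1, 1), (90, 100), (40, 100)]))  # 0.9
```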
NASA Technical Reports Server (NTRS)
Bogart, D. D.; Shook, D. F.; Fieno, D.
1973-01-01
Integral tests of evaluated ENDF/B high-energy cross sections have been made by comparing measured and calculated neutron leakage flux spectra from spheres of various materials. An Am-Be (alpha,n) source was used to provide fast neutrons at the center of the test spheres of Be, CH2, Pb, Nb, Mo, Ta, and W. The absolute leakage flux spectra were measured in the energy range 0.5 to 12 MeV using a calibrated NE213 liquid scintillator neutron spectrometer. Absolute calculations of the spectra were made using version 3 ENDF/B cross sections and an S_n discrete ordinates multigroup transport code. Generally excellent agreement was obtained for Be, CH2, Pb, and Mo, and good agreement was observed for Nb, although discrepancies were observed for some energy ranges. Poor comparative results, obtained for Ta and W, are attributed to unsatisfactory nonelastic cross sections. The experimental sphere leakage flux spectra are tabulated and serve as possible benchmarks for these elements against which reevaluated cross sections may be tested.
EPA Corporate GHG Goal Evaluation Model
The EPA Corporate GHG Goal Evaluation Model provides companies with a transparent and publicly available benchmarking resource to help evaluate and establish new or existing GHG goals that go beyond business as usual for their individual sectors.